Extracting event logs via the Process Mining API takes over 15 minutes and frequently times out with 504 Gateway Timeout errors when pulling data for processes with more than 50,000 events. We’re trying to build automated analytics reports that run daily, but the API performance is making this impractical.
The timeout occurs during large data pulls:
GET /api/v1/process-mining/events?processId=12345&limit=10000
Status: 504 Gateway Timeout
Error: "Request exceeded maximum execution time"
We’ve tried adjusting the limit parameter, but even with smaller page sizes the overall extraction time is excessive. The API documentation mentions pagination support and batch processing capabilities, but we’re unclear on the optimal configuration for large dataset extraction. Our timeout configuration seems standard, but perhaps there are specific settings for Process Mining API calls that we’re missing?
Have you looked into using filters to reduce the dataset size before extraction? Instead of pulling all 50,000 events, filter by date range, event type, or process instance status. Most analytics use cases don’t need the complete historical dataset every time. You could implement incremental extraction where you only pull events created or modified since the last successful extraction run.
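To make the incremental idea concrete, here’s a minimal sketch of checkpoint-based extraction. The `modifiedAfter` filter name and the base URL are assumptions for illustration, not confirmed parameters of the Process Mining API; check the actual docs for the supported filter names.

```python
import json
from datetime import datetime, timezone

STATE_FILE = "last_extraction.json"  # local checkpoint file (hypothetical)
BASE_URL = "https://example.com/api/v1/process-mining/events"  # placeholder host

def load_checkpoint():
    """Return the timestamp of the last successful run, or None on the first run."""
    try:
        with open(STATE_FILE) as f:
            return json.load(f)["last_run"]
    except (FileNotFoundError, KeyError):
        return None

def build_incremental_url(process_id, page_limit=2000):
    """Build a filtered query so only events changed since the last run are pulled."""
    params = f"processId={process_id}&limit={page_limit}"
    since = load_checkpoint()
    if since:
        # 'modifiedAfter' is an assumed filter name -- verify against the API docs
        params += f"&modifiedAfter={since}"
    return f"{BASE_URL}?{params}"

def save_checkpoint():
    """Record the current time after a successful extraction run."""
    with open(STATE_FILE, "w") as f:
        json.dump({"last_run": datetime.now(timezone.utc).isoformat()}, f)
```

Run `save_checkpoint()` only after the full extraction succeeds, so a failed run gets retried from the previous checkpoint rather than silently skipping data.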
For the timeout configuration, make sure you’re setting timeouts at multiple levels: the OutSystems HTTP client timeout, the API gateway timeout, and the Process Mining service timeout. They all need to be aligned so an inner layer doesn’t give up before an outer one. I typically set the HTTP client timeout to 120 seconds, the gateway timeout to 150 seconds, and ensure the backend service can complete within those limits. Also consider implementing retry logic with exponential backoff for transient timeout errors.
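The retry side can be as simple as the sketch below. The `fn` callable stands in for whatever HTTP call you’re making; the set of status codes treated as transient is just a reasonable default, not something the API specifies.

```python
import random
import time

def with_retries(fn, max_attempts=5, base_delay=1.0, retryable=(429, 502, 504)):
    """Call fn() -> (status_code, body); on a retryable status, wait
    base_delay * 2**attempt plus a little jitter, then try again."""
    for attempt in range(max_attempts):
        status, body = fn()
        if status not in retryable:
            return status, body
        if attempt < max_attempts - 1:
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)
    # All attempts exhausted: surface the last response to the caller
    return status, body
```

The jitter matters when several daily report jobs start at the same time; without it, retries from parallel jobs re-collide on the same schedule.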
We reduced the limit to 2000 events per page, but now we’re making 25+ sequential API calls to get all the data, and the total time is still around 12-15 minutes. Is there a way to parallelize these requests or use batch processing to speed things up? The sequential pagination approach seems inefficient for our use case.
The 504 timeout with large datasets is common when you’re trying to pull too much data in a single request. Even though you’re using pagination with limit=10000, that’s still a lot of events to process in one call. Try reducing the page size to 1000-2000 events and implement proper pagination with offset or cursor-based navigation. Also check if your gateway timeout is set appropriately for data extraction operations.
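On the parallelization question: if the endpoint uses offset-based pagination and you can get a total event count up front, the page requests are independent and can be fanned out over a small thread pool. This is a sketch with a placeholder `fetch(offset, limit)` callable standing in for the actual API call; keep the worker count modest so you don’t trip rate limits.

```python
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 2000

def fetch_all_parallel(fetch, total_events, max_workers=4):
    """Fetch every page concurrently and reassemble the event list in order.

    fetch(offset, limit) -> list of events; offset-based pagination is assumed.
    Executor.map preserves input order, so pages come back in sequence.
    """
    offsets = range(0, total_events, PAGE_SIZE)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = pool.map(lambda off: fetch(off, PAGE_SIZE), offsets)
    events = []
    for page in pages:
        events.extend(page)
    return events
```

With 50,000 events at 2,000 per page and 4 workers, that’s 25 requests in roughly 7 sequential rounds instead of 25, assuming the backend tolerates the concurrency. Note this only works safely on a stable snapshot; if events are being written during extraction, cursor-based pagination is the more reliable option.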
Process Mining APIs often support bulk export operations separate from the standard pagination endpoints. Check if there’s an /export or /bulk endpoint that’s designed for large dataset extraction. These endpoints typically generate a data file asynchronously and provide a download link, which is much more efficient than paginating through thousands of events. You’d poll for completion status rather than waiting synchronously.
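If such an async export endpoint exists, the client-side flow is: start the job, poll for completion, then download the file. Here’s a sketch with the three HTTP calls abstracted as callables, since the exact endpoint paths and response shapes are assumptions.

```python
import time

def export_and_download(start_export, get_status, download,
                        poll_interval=5.0, timeout=900.0):
    """Run an async bulk export end to end.

    start_export() -> job_id
    get_status(job_id) -> (state, download_url); state is one of
                          "pending", "completed", "failed"
    download(url) -> bytes
    All three are thin wrappers around hypothetical /export endpoints.
    """
    job_id = start_export()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state, url = get_status(job_id)
        if state == "completed":
            return download(url)
        if state == "failed":
            raise RuntimeError(f"export job {job_id} failed")
        time.sleep(poll_interval)
    raise TimeoutError(f"export job {job_id} did not finish within {timeout}s")
```

Because the heavy lifting happens server-side and you only poll a lightweight status endpoint, the 504s disappear entirely; the gateway never has to hold a long-running request open.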