Non-conformance escalation workflow timing out during batch processing

We’re running TrackWise 10.0 and have a scheduled job that processes non-conformance (NC) escalations in batch mode every night. Recently the workflow has started timing out whenever a batch contains more than 50 records. The escalation workflow moves NCs from ‘Open’ to ‘Escalated’ status and sends notifications. Individual escalations work fine, but batch runs fail with timeout errors. I’ve checked the workflow timeout configuration, which is set to 300 seconds. The batch job runs asynchronously but appears to be hitting thread pool limits. We have about 200 NCs that need escalation each night.

Error log:


WorkflowExecutionException: Timeout after 300s
Batch job: NC_ESCALATION_BATCH
Processed: 47/203 records
Thread pool: 10/10 active threads

Have you considered optimizing the workflow itself? If each escalation sends its own notification email, that’s roughly 200 SMTP calls per run. Batch the notifications instead: collect all escalated NCs and send a single summary email per recipient rather than one email per NC. This sharply reduces external I/O. Also check whether the workflow loads related data it doesn’t need; use lazy loading for associated documents and attachments that the escalation logic doesn’t use.
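A minimal sketch of per-recipient batching, assuming each escalated NC can be reduced to an `(nc_id, recipient)` pair. The function names and the `send_email(recipient, subject, body)` callback are hypothetical stand-ins, not TrackWise APIs; the point is that the number of SMTP calls drops from one per NC to one per recipient.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical shape: one (nc_id, recipient) pair per notification the
# workflow would otherwise send individually.
Escalation = Tuple[str, str]


def group_by_recipient(escalations: Iterable[Escalation]) -> Dict[str, List[str]]:
    """Collect NC ids per recipient so each recipient gets one summary."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for nc_id, recipient in escalations:
        grouped[recipient].append(nc_id)
    return dict(grouped)


def send_summaries(
    escalations: Iterable[Escalation],
    send_email: Callable[[str, str, str], None],
) -> int:
    """Send one summary email per recipient; returns the number of emails sent."""
    grouped = group_by_recipient(escalations)
    for recipient, nc_ids in grouped.items():
        subject = f"{len(nc_ids)} non-conformance(s) escalated"
        body = "\n".join(f"- {nc_id}" for nc_id in nc_ids)
        send_email(recipient, subject, body)  # one SMTP call per recipient
    return len(grouped)


# Usage with a fake sender that just records what would be sent.
sent: List[Tuple[str, str, str]] = []
escalations = [
    ("NC-101", "qa.lead@example.com"),
    ("NC-102", "qa.lead@example.com"),
    ("NC-103", "plant.mgr@example.com"),
]
count = send_summaries(escalations, lambda to, subj, body: sent.append((to, subj, body)))
```

Three escalations produce two emails here; with 200 NCs spread over a handful of owners the saving is proportionally larger.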

Your thread pool is maxed out. With only 10 threads and 200 records, every workflow execution is competing for the same resources. Increase the async workflow thread pool size in trackwise.properties. Also consider breaking the batch into smaller chunks: process 50 records per job run instead of all 200 at once. This reduces memory pressure and prevents thread starvation.
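A sketch of the chunking idea with a fixed-size worker pool, using Python’s `concurrent.futures` as a stand-in for the TrackWise job scheduler. This version processes the chunks one after another inside a single run; splitting them across separate scheduled runs would instead pick one chunk per run. `escalate` is a hypothetical callback representing one workflow execution.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def chunks(records: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def process_in_chunks(
    records: Sequence[str],
    escalate: Callable[[str], str],
    chunk_size: int = 50,
    max_workers: int = 10,
) -> List[str]:
    """Escalate records chunk by chunk with a fixed-size worker pool.

    Only one chunk is in flight at a time, so at most `chunk_size` records
    are queued against the pool instead of the whole nightly batch.
    """
    results: List[str] = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for chunk in chunks(records, chunk_size):
            # Consuming the map iterator waits for the whole chunk to finish
            # and re-raises a worker's exception if one failed.
            results.extend(pool.map(escalate, chunk))
    return results


# Usage with a hypothetical stand-in for the real escalation workflow.
records = [f"NC-{i:03d}" for i in range(203)]
result = process_in_chunks(records, lambda nc_id: f"{nc_id}:Escalated")
```

The 203 records from the log split into chunks of 50, 50, 50, 50 and 3.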

Classic resource contention issue. Your database connection pool needs to be at least 2x your workflow thread pool size. If you have 20 workflow threads, set db.pool.maxConnections to at least 40. Each workflow may use multiple connections for queries, updates, and audit logging. Also enable connection pool monitoring to see actual usage patterns during batch processing.
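A toy illustration of the 2x rule and of what "connection pool monitoring" measures. `MonitoredPool` and `min_pool_size` are hypothetical helpers, not part of TrackWise or any driver; the pool hands out placeholder objects and only tracks current and peak checkouts. Whether real escalation workflows hold two connections at once, as this sketch assumes, is something to confirm with the actual monitoring data.

```python
import threading
from contextlib import contextmanager
from typing import Iterator


def min_pool_size(workflow_threads: int, connections_per_workflow: int = 2) -> int:
    """The 2x rule of thumb: threads times connections each workflow may hold."""
    return workflow_threads * connections_per_workflow


class MonitoredPool:
    """Toy pool that counts checkouts; yields placeholder objects, not real connections."""

    def __init__(self, max_connections: int) -> None:
        self.max_connections = max_connections
        self._slots = threading.BoundedSemaphore(max_connections)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak = 0

    @contextmanager
    def connection(self, timeout: float = 5.0) -> Iterator[object]:
        if not self._slots.acquire(timeout=timeout):
            raise TimeoutError("connection pool exhausted")
        with self._lock:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
        try:
            yield object()  # placeholder for a real DB connection
        finally:
            with self._lock:
                self.in_use -= 1
            self._slots.release()


pool = MonitoredPool(min_pool_size(20))  # 40 connections for 20 threads


def workflow() -> None:
    with pool.connection():      # e.g. the escalation query/update
        with pool.connection():  # e.g. an audit-log write on a second connection
            pass


threads = [threading.Thread(target=workflow) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(f"pool size {pool.max_connections}, peak in use {pool.peak}")
```

In this toy, each workflow holds one connection while asking for a second. With fewer than threads × 2 connections, every thread can end up holding one and waiting for another; the acquire timeout turns that stall into a `TimeoutError` instead of a hang, which is roughly what a pool-exhaustion error looks like from the workflow’s side.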

The 300-second timeout is too aggressive for batch operations. Each workflow execution includes database transactions, notification processing, and state validation, and in batch mode that adds up quickly. I’d recommend increasing the timeout to at least 600 seconds for batch workflows. Also check whether your escalation workflow sends any email synchronously; those sends should be asynchronous so the workflow thread isn’t blocked waiting for SMTP responses.
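A quick budget check on the 600-second suggestion, using only the numbers in the error log. It assumes the 300 s timeout applies to the batch as a whole and that records complete at a steady rate; neither is confirmed by the log, so treat the result as an order-of-magnitude estimate.

```python
# Rough timeout budget from the NC_ESCALATION_BATCH log.
# Assumptions: batch-level timeout, constant per-record throughput.


def estimated_runtime_s(total: int, processed: int, elapsed_s: float) -> float:
    """Extrapolate batch wall-clock time from partial progress."""
    return total * (elapsed_s / processed)


def scaled_runtime_s(runtime_s: float, old_threads: int, new_threads: int) -> float:
    """Best case: throughput scales linearly with thread count (rarely true)."""
    return runtime_s * old_threads / new_threads


baseline = estimated_runtime_s(total=203, processed=47, elapsed_s=300)
best_case_20 = scaled_runtime_s(baseline, old_threads=10, new_threads=20)
print(f"10 threads: ~{baseline:.0f} s; 20 threads, best case: ~{best_case_20:.0f} s")
# prints: 10 threads: ~1296 s; 20 threads, best case: ~648 s
```

Under these assumptions, even perfectly linear scaling to 20 threads would slightly overrun a 600 s limit, so reducing per-record cost (notifications, connection hold time) likely matters as much as raising the limits.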

I increased the thread pool to 20 and the timeout to 600 seconds, but now I’m seeing database connection pool exhaustion errors. It seems increasing the threads just moved the bottleneck. Each workflow execution must be holding database connections longer than expected. How should I balance the thread pool size against the database connection pool?
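One way to decouple the two limits, sketched under stated assumptions: gate only the database section of each escalation with a semaphore sized from the connection pool, and send notifications after that section ends. Extra workflow threads then wait for a DB slot instead of exhausting the pool, and slow SMTP calls no longer pin connections. `CONNECTIONS_PER_WORKFLOW`, `max_db_workflows`, and `run_batch` are hypothetical; the DB work itself is left as a placeholder comment.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Assumption: one escalation may hold up to this many DB connections at once
# (query + update + audit log). Measure the real figure before relying on it.
CONNECTIONS_PER_WORKFLOW = 2


def max_db_workflows(db_pool_size: int) -> int:
    """How many escalations can be inside their DB section at the same time."""
    return max(1, db_pool_size // CONNECTIONS_PER_WORKFLOW)


def run_batch(
    nc_ids: List[str],
    workflow_threads: int,
    db_pool_size: int,
    notify: Callable[[str], None],
) -> Dict[str, int]:
    db_slots = threading.BoundedSemaphore(max_db_workflows(db_pool_size))
    lock = threading.Lock()
    stats = {"updated": 0, "active_db": 0, "peak_db": 0}

    def escalate(nc_id: str) -> None:
        # DB section: the only place connections would be held.
        with db_slots:
            with lock:
                stats["active_db"] += 1
                stats["peak_db"] = max(stats["peak_db"], stats["active_db"])
            # ... Open -> Escalated update and audit entry would go here ...
            with lock:
                stats["updated"] += 1
                stats["active_db"] -= 1
        # Notification runs after the connections are back in the pool.
        notify(nc_id)

    with ThreadPoolExecutor(max_workers=workflow_threads) as pool:
        list(pool.map(escalate, nc_ids))  # list() surfaces any worker exception
    return stats


# Usage: 20 workflow threads sharing a hypothetical 30-connection pool,
# which allows at most 15 escalations in their DB section at once.
sent: List[str] = []
stats = run_batch([f"NC-{i}" for i in range(203)], workflow_threads=20,
                  db_pool_size=30, notify=sent.append)
```

The design choice is to size threads for total throughput and size the DB gate for the connection pool, rather than forcing the two to match; shortening the DB section is what actually lowers how long each workflow holds its connections.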