We’re experiencing significant delays in our audit escalation workflows tied to batch job execution. Our audit management module runs scheduled batch jobs to check for overdue audits and trigger escalation workflows, but these jobs are taking 45-60 minutes to complete when they should finish in under 15 minutes.
The issue impacts our compliance SLAs - audits that should escalate within 2 hours of becoming overdue are sometimes delayed by 3-4 hours. We have about 1,200 active audits across multiple sites, and the batch job processes all of them sequentially.
I’m wondering if there are tuning options for batch job frequency, or if we can enable some form of parallel processing to speed this up. Also curious if certain workflow steps might be bottlenecks that we could optimize. Has anyone dealt with similar batch job latency in audit management?
The sequential processing is definitely your bottleneck with 1,200 audits. Veeva introduced parallel batch processing capabilities in 23R2 that you should leverage. Check if your workflow steps are configured for parallel execution - specifically the audit status evaluation and notification generation steps. These can run concurrently across different audit records rather than one-by-one. Also review your workflow entry criteria - you might be processing audits that don’t actually need escalation checks, wasting cycles. Filter more aggressively at the entry point to reduce the processing load. Have you looked at the batch job execution logs to see which specific step is consuming the most time?
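To make the entry-criteria point concrete, here is a minimal, hypothetical sketch of the kind of pre-filtering being suggested. The `status` and `due_date` field names are assumptions for illustration, not actual Vault field names; the real filter would live in the workflow's entry criteria configuration, not user code.

```python
from datetime import datetime, timedelta

def needs_escalation_check(audit, now):
    """Return True only for audits that could actually require escalation.

    Hypothetical record shape: 'status' and 'due_date' are illustrative
    field names, not real Vault fields.
    """
    if audit["status"] in ("Closed", "Cancelled"):
        return False
    # Only audits already past their due date need an escalation check.
    return audit["due_date"] < now

now = datetime(2024, 1, 15, 9, 0)
audits = [
    {"id": "A-1", "status": "In Progress", "due_date": now - timedelta(hours=3)},
    {"id": "A-2", "status": "Closed",      "due_date": now - timedelta(days=1)},
    {"id": "A-3", "status": "In Progress", "due_date": now + timedelta(days=2)},
]
to_process = [a for a in audits if needs_escalation_check(a, now)]
# Only A-1 survives: A-2 is closed, A-3 is not yet overdue.
```

The point is that the batch job should never see the closed or not-yet-due records in the first place.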
Notification bundling is key here. Instead of sending individual notifications for each audit escalation, configure your workflow to collect all escalated audits in a batch and send consolidated notifications. You can do this by adding a delay step that collects audit IDs over a 15-minute window, then triggers a single notification action with all relevant audits listed. This reduces notification overhead by 70-80% in high-volume scenarios. Also ensure your notification templates aren’t making redundant data queries - pre-fetch required audit details at the workflow entry point and pass them as workflow variables.
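A rough sketch of the bundling logic described above, assuming escalation events carry an audit ID, a recipient, and a timestamp (all names are illustrative, not Vault configuration): events inside one 15-minute window are grouped per recipient into a single consolidated send.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=15)

def bundle(escalations, window_start):
    """Group escalations that fall inside one collection window by recipient,
    so each recipient gets one consolidated notification per window."""
    bundles = defaultdict(list)
    for e in escalations:
        if window_start <= e["time"] < window_start + WINDOW:
            bundles[e["recipient"]].append(e["audit_id"])
    return dict(bundles)

start = datetime(2024, 1, 15, 9, 0)
events = [
    {"audit_id": "A-1", "recipient": "qa.manager", "time": start + timedelta(minutes=2)},
    {"audit_id": "A-2", "recipient": "qa.manager", "time": start + timedelta(minutes=11)},
    {"audit_id": "A-3", "recipient": "site.lead",  "time": start + timedelta(minutes=5)},
]
# Two consolidated sends instead of three individual ones.
consolidated = bundle(events, start)
```

In Vault itself this would be expressed through the delay step and a multi-audit notification template rather than code, but the grouping behavior is the same.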
This calls for a three-pronged optimization approach: batch job frequency, parallel processing, and workflow step efficiency.
Batch Job Frequency Tuning:
Adjust your batch job schedule from the default hourly run to every 20-30 minutes. Navigate to Admin > System Settings > Job Scheduler and modify the ‘Audit Escalation Check’ job frequency. Set it to run every 30 minutes during business hours (6 AM - 8 PM) and hourly overnight. This ensures faster detection of overdue audits without overwhelming the system. Monitor for job overlap - if a 30-minute run takes longer than 30 minutes, you’ll create queue buildup. Set a max execution timeout of 25 minutes to prevent this.
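The overlap concern can be illustrated with a small guard sketch: a scheduled run is skipped if the previous run is still inside its execution window. This is illustrative only; Vault's scheduler enforces this through the timeout setting, not user code.

```python
class JobGuard:
    """Skip a scheduled run if the previous run is still executing.

    Illustrative sketch: 'max_seconds' plays the role of the 25-minute
    execution timeout described above. Times are plain epoch seconds.
    """

    def __init__(self, max_seconds):
        self.max_seconds = max_seconds
        self.started_at = None

    def try_start(self, now):
        if self.started_at is not None and now - self.started_at < self.max_seconds:
            return False  # previous run still inside its window: skip this trigger
        self.started_at = now
        return True

    def finish(self):
        self.started_at = None

guard = JobGuard(max_seconds=25 * 60)
first = guard.try_start(0.0)      # starts: nothing is running
second = guard.try_start(600.0)   # skipped: previous run only 10 minutes in
guard.finish()
third = guard.try_start(1800.0)   # starts: previous run has finished
```

Without such a guard, a 30-minute schedule with a 35-minute run produces exactly the queue buildup warned about above.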
Parallel Processing Enablement:
Enable parallel batch processing in 23R2 by configuring the batch job to use multiple worker threads. Go to Admin > Business Admin > Audit Management Settings and enable ‘Parallel Batch Processing’ with 4-6 worker threads (based on your server capacity). This allows simultaneous processing of audit records rather than sequential execution. Configure your workflow to support parallel execution by ensuring workflow steps don’t have dependencies that require sequential processing. Specifically, set the ‘Allow Parallel Execution’ flag on your audit status evaluation and notification steps.
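As a plain-Python analogy for the worker-thread model (not Vault code), a thread pool lets independent per-audit evaluations run concurrently instead of one-by-one. The `evaluate_status` function is a made-up stand-in for the audit status evaluation step.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_status(audit_id):
    """Stand-in for the per-audit status evaluation step.
    Here every third audit is flagged for escalation, purely for illustration."""
    return (audit_id, "escalate" if audit_id % 3 == 0 else "ok")

audit_ids = range(1, 13)  # stand-in for the 1,200 real records

# Four workers evaluate audits concurrently; results arrive in input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(evaluate_status, audit_ids))

escalated = sorted(a for a, verdict in results.items() if verdict == "escalate")
# audits 3, 6, 9, 12 are flagged
```

The key prerequisite mirrors the note above: this only works because each evaluation is independent, i.e. no step depends on another record's result.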
Workflow Step Optimization:
Refactor your escalation workflow to eliminate redundant queries and optimize notification generation:
- Entry Criteria Filtering: Add strict entry criteria to process only audits that genuinely need escalation. Use a calculated field to pre-filter audits based on status and due date before workflow entry. This can reduce your processing set from 1,200 to 150-200 audits that actually require action.
- Data Pre-fetching: At workflow entry, retrieve all required audit details (auditor, manager, site, compliance owner) in a single query and store as workflow variables. Pass these forward through subsequent steps instead of re-querying at each step.
- Notification Bundling: Implement a 15-minute collection window where escalated audit IDs are gathered, then send consolidated notifications rather than individual ones. Create a custom notification template that accepts multiple audit IDs and formats them as a list. This reduces notification processing from 1,200 individual sends to 8-10 bundled sends.
- Asynchronous Notification Dispatch: Configure notification actions to run asynchronously so they don’t block the main workflow execution. Enable ‘Async Notification Processing’ in your workflow step configuration.
- Remove Unnecessary Steps: Audit your workflow for steps that don’t add value. Common culprits include redundant approval steps, excessive logging actions, or status updates that could be combined.
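The pre-fetching item is easy to quantify with a toy sketch: re-querying at every step multiplies query count by the number of steps, while one bulk fetch at entry holds it constant. Everything here (the `fetch` stub, step names) is hypothetical.

```python
query_count = 0

def fetch(audit_id):
    """Stand-in for one audit-detail query; counts how often it is called."""
    global query_count
    query_count += 1
    return {"id": audit_id, "manager": f"mgr-{audit_id}"}

audit_ids = ["A-1", "A-2", "A-3"]
steps = ["evaluate", "notify", "log"]

# Naive pattern: every step re-queries every audit -> steps x audits queries.
query_count = 0
for step in steps:
    for aid in audit_ids:
        details = fetch(aid)
naive_queries = query_count

# Pre-fetch pattern: one query per audit at entry, then pass the data forward.
query_count = 0
prefetched = {aid: fetch(aid) for aid in audit_ids}
for step in steps:
    for aid in audit_ids:
        details = prefetched[aid]  # no query: read the workflow variable
prefetched_queries = query_count
```

With 3 steps and 3 audits the naive pattern issues 9 queries versus 3; at 1,200 audits and more steps the gap is what the answer above is attacking.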
Validation and Monitoring:
After implementing these changes, monitor batch job execution times in Admin > Operations > Job History. Target execution time should drop from 45-60 minutes to 12-18 minutes. Set up alerts if execution time exceeds 20 minutes so you can investigate before SLA impacts occur. Track your escalation SLA compliance rate - you should see overdue escalations drop from 3-4 hour delays to under 45 minutes consistently.
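The alerting rule above is just a threshold check over job history; a sketch of the logic, with an invented record shape (real monitoring would read from Job History, not a Python list):

```python
def runs_needing_alert(history, threshold_minutes=20):
    """Flag batch runs whose execution time breached the alert threshold."""
    return [run["job_id"] for run in history if run["minutes"] > threshold_minutes]

history = [
    {"job_id": "run-101", "minutes": 14},
    {"job_id": "run-102", "minutes": 23},
    {"job_id": "run-103", "minutes": 17},
]
breaches = runs_needing_alert(history)
# run-102 breaches the 20-minute alert threshold
```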
Implement these changes in a staged approach: start with frequency tuning (quick win), then enable parallel processing (moderate complexity), and finally optimize workflow steps (requires testing). This phased approach minimizes risk while delivering incremental improvements.
We had similar issues last year. First thing to check is your batch job schedule configuration. Default frequency might be too conservative for your audit volume. You can adjust the job frequency in Admin > System Settings > Job Scheduler, but be careful not to overlap runs. We moved from hourly to every 30 minutes and saw immediate improvement in escalation timing.
Don’t overlook workflow step optimization itself. Review each step in your escalation workflow and eliminate unnecessary queries or object retrievals. We found that our workflow was re-querying audit details at every step instead of passing data forward. Simple refactoring cut our per-audit processing time from 3.2 seconds to 0.8 seconds. That adds up fast with 1,200 audits. Also check if you have any custom actions that aren’t optimized - those can be major performance drags.
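The per-audit numbers above translate directly into total batch time; a quick check of the arithmetic, using the figures from this reply:

```python
def batch_minutes(per_audit_seconds, audit_count):
    """Total sequential batch time in minutes for a given per-audit cost."""
    return per_audit_seconds * audit_count / 60

before = batch_minutes(3.2, 1200)  # ~64 minutes: consistent with 45-60+ min runs
after = batch_minutes(0.8, 1200)   # ~16 minutes: inside the sub-15-20 min target
```

So the 3.2 s to 0.8 s refactoring alone accounts for roughly a 4x reduction in total batch time, before any parallelism is added.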