We have an automation script that creates non-conformance records from incoming quality events in our manufacturing execution system. The script works fine for individual events, but when multiple non-conformance events are logged in rapid succession during batch imports, duplicate records are created.
Here’s a simplified version of our current logic:
for (QualityEvent event : eventList) {
    NonConformance nc = createNonConformance(event);
    nc.setEventId(event.getId());
    save(nc);
}
The event identifier usage should prevent duplicates, but somehow the same event is generating multiple non-conformance records. I suspect there’s a timing issue with the deduplication logic where the script checks for existing records before the previous save operation completes. This is causing major reporting confusion as our metrics are inflated with duplicate entries.
Has anyone dealt with similar concurrency issues in VVQ 23R2 automation scripts?
Here’s a comprehensive solution that addresses all three focus areas:
Deduplication Logic: Implement a robust deduplication strategy using a combination of database-level unique constraints and application-level checks. First, ensure your non-conformance object has a unique index on the event_id field at the database level. This provides a hard guarantee against duplicates regardless of timing issues.
Event Identifier Usage: Enhance your event identifier strategy to include both the source event ID and a timestamp hash. This creates a compound key that’s truly unique:
String uniqueKey = event.getId() + "_" + generateHash(event.getTimestamp());

if (!recordExists(uniqueKey)) {
    NonConformance nc = createNonConformance(event);
    nc.setUniqueIdentifier(uniqueKey);
    nc.setEventId(event.getId());
    save(nc);  // the original snippet omitted the save, so no record was persisted
}
Automation Script Update: Refactor your script to use a try-catch pattern that gracefully handles duplicate key violations:
for (QualityEvent event : eventList) {
    try {
        String uniqueKey = buildUniqueKey(event);
        if (isNewEvent(uniqueKey)) {
            NonConformance nc = createNC(event, uniqueKey);
            save(nc);
        }
    } catch (DuplicateKeyException e) {
        logDuplicateAttempt(event.getId());
    }
}
Additionally, implement a post-processing cleanup job that runs after batch imports to identify and merge any duplicates that slipped through. This job should compare non-conformance records created within the same time window (e.g., last 5 minutes) and consolidate any that share the same event ID, keeping the first created record and archiving the duplicates.
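The core of such a cleanup job, grouping recent records by event ID and selecting everything except the earliest-created record for archiving, could be sketched like this. The `NcRecord` type and its fields are illustrative stand-ins, not Vault SDK classes:

```java
import java.time.Instant;
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical minimal record type; field names are illustrative, not Vault API.
class NcRecord {
    final String id;
    final String eventId;
    final Instant createdAt;
    NcRecord(String id, String eventId, Instant createdAt) {
        this.id = id; this.eventId = eventId; this.createdAt = createdAt;
    }
}

public class DedupCleanup {
    // Returns the IDs of duplicate records to archive: for each event ID,
    // every record except the earliest-created one survives the skip(1).
    static List<String> duplicatesToArchive(List<NcRecord> recentRecords) {
        return recentRecords.stream()
            .collect(Collectors.groupingBy(r -> r.eventId))
            .values().stream()
            .flatMap(group -> group.stream()
                .sorted(Comparator.comparing((NcRecord r) -> r.createdAt))
                .skip(1))                       // keep the first-created record
            .map(r -> r.id)
            .collect(Collectors.toList());
    }
}
```

In a real job, `recentRecords` would come from a query restricted to the batch-import time window, and the returned IDs would feed the archive step.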
For immediate remediation, run a data cleanup script to identify and remove existing duplicates based on event ID, keeping only the earliest created record for each unique event. Then deploy the updated automation script with proper deduplication logic to prevent future occurrences.
I tried adding a query check, but the duplicates are still appearing occasionally. I think the problem is that the query executes before the previous save commits to the database. Is there a way to force synchronous processing or implement a proper locking mechanism in Veeva Vault automation scripts?
We had the exact same problem. The solution was to add a unique constraint check before the save operation. Query the system for any non-conformance records with the same event ID, and only proceed with creation if the query returns zero results. Also consider adding a small delay between batch operations to reduce the likelihood of concurrent execution.
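The check-then-create pattern described above could be sketched as follows, with an in-memory set standing in for the Vault query on event ID (in a real script this would be a VQL count query). Note that without a database-level unique constraint this check-then-act sequence is still racy under true concurrency, which is why the small delay helps but does not fully eliminate duplicates:

```java
import java.util.HashSet;
import java.util.Set;

public class CheckThenCreate {
    // Stand-in for the record store; a real implementation would query Vault.
    private final Set<String> existingEventIds = new HashSet<>();

    // Returns true if a record was created, false if one already existed.
    public boolean createIfAbsent(String eventId) {
        if (existingEventIds.contains(eventId)) {
            return false;                  // query found an existing record
        }
        existingEventIds.add(eventId);     // stand-in for save(nc)
        return true;
    }
}
```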
You need to implement idempotency in your script. Instead of just checking if a record exists, use a two-phase approach: first, try to create a placeholder record with just the event ID in a locked state, then update it with full details. If the placeholder creation fails due to a duplicate key, you know another process is already handling that event. This pattern works well for high-volume batch processing scenarios where concurrent execution is unavoidable.
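A minimal sketch of this two-phase claim-then-complete pattern, using `ConcurrentHashMap.putIfAbsent` as a stand-in for the atomic placeholder insert (in Vault, the failed duplicate-key insert plays the role of the losing `putIfAbsent`):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PlaceholderIdempotency {
    // Stand-in for the record store; keys are event IDs, values are record
    // states. In Vault the value would be the placeholder record itself.
    private final ConcurrentMap<String, String> records = new ConcurrentHashMap<>();

    // Phase 1: claim the event by inserting a locked placeholder.
    // putIfAbsent is atomic, so exactly one caller wins per event ID.
    public boolean claim(String eventId) {
        return records.putIfAbsent(eventId, "LOCKED") == null;
    }

    // Phase 2: the winning caller fills in full details, releasing the lock.
    public void complete(String eventId, String details) {
        records.replace(eventId, "LOCKED", details);
    }
}
```

Losers of the `claim` call simply skip the event, knowing another process owns it.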
Another approach is to modify your batch import process to deduplicate events before they reach the automation script. Implement a staging area where incoming events are collected and deduplicated based on event ID, then process only unique events through your automation. This shifts the responsibility upstream and ensures your automation script receives clean, unique data to begin with.
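The staging-area deduplication itself can be as simple as an order-preserving set over the incoming event IDs, sketched here under the assumption that events arriving later with the same ID are true duplicates:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class StagingDedup {
    // Deduplicates a batch of event IDs while preserving arrival order,
    // so the downstream automation script sees each event exactly once.
    static List<String> uniqueEvents(List<String> incomingEventIds) {
        return new ArrayList<>(new LinkedHashSet<>(incomingEventIds));
    }
}
```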