We’re experiencing intermittent duplicate key errors when performing batch inserts using the RDS Data API with Aurora PostgreSQL. Our ETL pipeline processes customer orders and the failures are unpredictable - sometimes the same batch succeeds on retry, sometimes it fails again.
Here’s our batch insert code:
INSERT INTO orders (order_id, customer_id, amount, status)
VALUES (:order_id, :customer_id, :amount, 'pending')
We’re using batchExecuteStatement with parameterSets for batching. The order_id is a UUID generated by our application. The error message shows: ERROR: duplicate key value violates unique constraint "orders_pkey". We have a proper unique constraint on order_id in Aurora PostgreSQL, but we’re not sure why duplicates are being attempted. Our ETL job doesn’t have explicit deduplication logic - we assumed the database constraint would be sufficient. Is there a better way to handle batch insert errors with the Data API?
The RDS Data API’s batch execution is atomic per statement but not across retries. If your ETL job retries a failed batch without tracking which records were already inserted, you’ll get duplicates. Are you using transaction IDs to maintain state? The Data API supports transactions via beginTransaction and commitTransaction. You should wrap your batch in a transaction and implement proper rollback logic on failure.
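A rough sketch of that transaction flow, assuming a boto3-style `rds-data` client (the ARN and database arguments are placeholders for your own resources, and error handling is simplified):

```python
# Wrap the batch in an explicit Data API transaction: begin_transaction,
# batch_execute_statement with the transactionId, then commit or roll back.
# `client` is assumed to be a boto3 "rds-data" client.

INSERT_SQL = """INSERT INTO orders (order_id, customer_id, amount, status)
VALUES (:order_id, :customer_id, :amount, 'pending')"""

def insert_batch(client, cluster_arn, secret_arn, database, parameter_sets):
    tx = client.begin_transaction(
        resourceArn=cluster_arn, secretArn=secret_arn, database=database
    )
    tx_id = tx["transactionId"]
    try:
        client.batch_execute_statement(
            resourceArn=cluster_arn,
            secretArn=secret_arn,
            database=database,
            sql=INSERT_SQL,
            parameterSets=parameter_sets,
            transactionId=tx_id,
        )
    except Exception:
        # Roll back so a retry starts from a clean slate instead of
        # re-inserting rows that were half-applied.
        client.rollback_transaction(
            resourceArn=cluster_arn, secretArn=secret_arn, transactionId=tx_id
        )
        raise
    client.commit_transaction(
        resourceArn=cluster_arn, secretArn=secret_arn, transactionId=tx_id
    )
```

With the rollback in place, a failed batch leaves nothing behind, so a retry of the same parameterSets can't collide with a partial earlier attempt.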
The intermittent nature of your failures suggests a retry logic issue rather than a data problem. When using the Data API with batch operations, you must track the state of each batch. Consider implementing a batch_id column in your orders table to track which batches have been processed. This allows you to skip already-processed batches on retry rather than attempting duplicate inserts.
Another consideration - are you running multiple ETL job instances in parallel? If so, you might have race conditions where two instances try to insert the same order_id simultaneously. Even with database constraints, the timing of the Data API calls could cause intermittent failures. You need either distributed locking or a message queue to ensure only one instance processes each order.
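One option for the multi-instance case (an assumption about your setup; a queue with per-key ordering works just as well) is a PostgreSQL transaction-scoped advisory lock keyed on the batch, which releases automatically on commit or rollback:

```python
import hashlib

# pg_advisory_xact_lock takes a signed 64-bit key, so derive a stable key
# from the batch id. Send LOCK_SQL inside the same Data API transaction as
# the inserts; a second ETL instance blocks until the first commits.
LOCK_SQL = "SELECT pg_advisory_xact_lock(:lock_key)"

def lock_key(batch_id: str) -> int:
    """Stable signed 64-bit advisory-lock key for a batch id."""
    digest = hashlib.sha256(batch_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)
```

Because the key is derived deterministically, every instance that sees the same batch_id contends for the same lock.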
We’re not currently using transactions with the Data API. We thought batch execution would handle atomicity automatically. So you’re saying we need to explicitly start a transaction, execute the batch, and then commit? What happens if the transaction times out - does the Data API automatically rollback?
Yes, explicit transaction management is critical here. On timeouts: the Data API rolls back a transaction automatically if no call uses its transaction ID for three minutes, and terminates any transaction after 24 hours - in practice you should commit well before either limit. If a transaction times out or your client dies mid-batch, the rollback is automatic. However, your bigger issue might be UUID generation. If you're generating UUIDs in your application and retrying batches, you need to ensure idempotency. Consider using INSERT … ON CONFLICT DO NOTHING (or DO UPDATE) to handle duplicates gracefully instead of failing.
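For example, here is the idempotent version of the insert from the question, demonstrated with SQLite as a stand-in engine (Aurora PostgreSQL accepts the same ON CONFLICT clause; with the Data API you'd send this SQL through batchExecuteStatement instead):

```python
import sqlite3

# With ON CONFLICT (order_id) DO NOTHING, retrying a partially applied batch
# silently skips rows that already exist instead of raising a duplicate-key
# error, which makes the retry path safe.
UPSERT_SQL = """INSERT INTO orders (order_id, customer_id, amount, status)
VALUES (:order_id, :customer_id, :amount, 'pending')
ON CONFLICT (order_id) DO NOTHING"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT PRIMARY KEY,"
    " customer_id TEXT, amount REAL, status TEXT)"
)
row = {"order_id": "a1b2-uuid", "customer_id": "c-42", "amount": 19.99}
conn.execute(UPSERT_SQL, row)
conn.execute(UPSERT_SQL, row)  # retry of the same row: no error, row skipped
count = conn.execute("SELECT count(*) FROM orders").fetchone()[0]
# count is 1: the duplicate attempt was ignored
```

Use DO UPDATE instead of DO NOTHING if a retried row may carry newer values (e.g. an updated status) that should overwrite the first insert.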