We’re experiencing intermittent duplicate key errors when performing batch inserts using the RDS Data API with Aurora PostgreSQL. Our ETL pipeline processes customer orders and the failures are unpredictable - sometimes the same batch succeeds on retry, sometimes it fails again.
Here’s our batch insert code:
INSERT INTO orders (order_id, customer_id, amount, status)
VALUES (:order_id, :customer_id, :amount, 'pending')
We’re using batchExecuteStatement with parameterSets for batching. The order_id is a UUID generated by our application. The error message shows: ERROR: duplicate key value violates unique constraint "orders_pkey". We have a proper unique constraint on order_id in Aurora PostgreSQL, but we’re not sure why duplicates are being attempted. Our ETL job doesn’t have explicit deduplication logic - we assumed the database constraint would be sufficient. Is there a better way to handle batch insert errors with the Data API?
The RDS Data API’s batch execution is atomic per statement but not across retries. If your ETL job retries a failed batch without tracking which records were already inserted, you’ll get duplicates. Are you using transaction IDs to maintain state? The Data API supports transactions via beginTransaction and commitTransaction. You should wrap your batch in a transaction and implement proper rollback logic on failure.
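A rough sketch of that transaction flow, assuming a boto3-style `rds-data` client (the ARN and database arguments are placeholders for your own resources, and error handling is simplified):

```python
# Wrap the batch in an explicit Data API transaction: begin_transaction,
# batch_execute_statement with the transactionId, then commit or roll back.
# `client` is assumed to be a boto3 "rds-data" client.

INSERT_SQL = """INSERT INTO orders (order_id, customer_id, amount, status)
VALUES (:order_id, :customer_id, :amount, 'pending')"""

def insert_batch(client, cluster_arn, secret_arn, database, parameter_sets):
    tx = client.begin_transaction(
        resourceArn=cluster_arn, secretArn=secret_arn, database=database
    )
    tx_id = tx["transactionId"]
    try:
        client.batch_execute_statement(
            resourceArn=cluster_arn,
            secretArn=secret_arn,
            database=database,
            sql=INSERT_SQL,
            parameterSets=parameter_sets,
            transactionId=tx_id,
        )
    except Exception:
        # Roll back so a retry starts from a clean slate instead of
        # re-inserting rows that were half-applied.
        client.rollback_transaction(
            resourceArn=cluster_arn, secretArn=secret_arn, transactionId=tx_id
        )
        raise
    client.commit_transaction(
        resourceArn=cluster_arn, secretArn=secret_arn, transactionId=tx_id
    )
```

With the rollback in place, a failed batch leaves nothing behind, so a retry of the same parameterSets can't collide with a partial earlier attempt.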
The intermittent nature of your failures suggests a retry logic issue rather than a data problem. When using the Data API with batch operations, you must track the state of each batch. Consider implementing a batch_id column in your orders table to track which batches have been processed. This allows you to skip already-processed batches on retry rather than attempting duplicate inserts.
Another consideration - are you running multiple ETL job instances in parallel? If so, you might have race conditions where two instances try to insert the same order_id simultaneously. Even with database constraints, the timing of the Data API calls could cause intermittent failures. You need either distributed locking or a message queue to ensure only one instance processes each order.
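One option for the multi-instance case (an assumption about your setup; a queue with per-key ordering works just as well) is a PostgreSQL transaction-scoped advisory lock keyed on the batch, which releases automatically on commit or rollback:

```python
import hashlib

# pg_advisory_xact_lock takes a signed 64-bit key, so derive a stable key
# from the batch id. Send LOCK_SQL inside the same Data API transaction as
# the inserts; a second ETL instance blocks until the first commits.
LOCK_SQL = "SELECT pg_advisory_xact_lock(:lock_key)"

def lock_key(batch_id: str) -> int:
    """Stable signed 64-bit advisory-lock key for a batch id."""
    digest = hashlib.sha256(batch_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)
```

Because the key is derived deterministically, every instance that sees the same batch_id contends for the same lock.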
We’re not currently using transactions with the Data API. We thought batch execution would handle atomicity automatically. So you’re saying we need to explicitly start a transaction, execute the batch, and then commit? What happens if the transaction times out - does the Data API automatically rollback?
Yes, explicit transaction management is critical here. On timeouts: the Data API rolls back a transaction automatically if no call uses its transaction ID for three minutes, and terminates any transaction after 24 hours - in practice you should commit well before either limit. If a transaction times out or your client dies mid-batch, the rollback is automatic. However, your bigger issue might be UUID generation. If you're generating UUIDs in your application and retrying batches, you need to ensure idempotency. Consider using INSERT … ON CONFLICT DO NOTHING (or DO UPDATE) to handle duplicates gracefully instead of failing.
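For example, here is the idempotent version of the insert from the question, demonstrated with SQLite as a stand-in engine (Aurora PostgreSQL accepts the same ON CONFLICT clause; with the Data API you'd send this SQL through batchExecuteStatement instead):

```python
import sqlite3

# With ON CONFLICT (order_id) DO NOTHING, retrying a partially applied batch
# silently skips rows that already exist instead of raising a duplicate-key
# error, which makes the retry path safe.
UPSERT_SQL = """INSERT INTO orders (order_id, customer_id, amount, status)
VALUES (:order_id, :customer_id, :amount, 'pending')
ON CONFLICT (order_id) DO NOTHING"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT PRIMARY KEY,"
    " customer_id TEXT, amount REAL, status TEXT)"
)
row = {"order_id": "a1b2-uuid", "customer_id": "c-42", "amount": 19.99}
conn.execute(UPSERT_SQL, row)
conn.execute(UPSERT_SQL, row)  # retry of the same row: no error, row skipped
count = conn.execute("SELECT count(*) FROM orders").fetchone()[0]
# count is 1: the duplicate attempt was ignored
```

Use DO UPDATE instead of DO NOTHING if a retried row may carry newer values (e.g. an updated status) that should overwrite the first insert.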