Device registry bulk import creates duplicate entries when handling concurrent operations

Our device onboarding process uses the Bulk Import API to register thousands of devices daily, but we’re encountering duplicate device entries when multiple import operations run concurrently. This creates data inconsistency across our device registry.

Bulk import request structure:


POST /iot/api/v2/devices/bulk
{
  "devices": [{"deviceId": "DEV-001", "type": "sensor"}],
  "mode": "upsert"
}

Despite using upsert mode, we’re seeing duplicate device records with identical deviceId values but different internal IDs. The Bulk Import API documentation states that upsert should handle duplicate detection, but concurrent operations seem to bypass this logic. We’ve verified that device ID validation is enabled and that pre-import reconciliation processes are in place. The duplicates appear when two or three import jobs overlap in time. How can we configure bulk import upsert to handle concurrent operations without creating duplicates?

Rather than changing isolation levels, which impacts performance, implement idempotent bulk imports with explicit locking. The Bulk Import API supports a lock acquisition parameter that prevents concurrent operations on the same device set. Use the deviceIdPrefix parameter to partition your imports so that different jobs work on non-overlapping device ID ranges.
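To illustrate the partitioning idea, here is a minimal client-side sketch. It does not use the deviceIdPrefix parameter; instead it swaps in a stable hash of the deviceId, which gives non-overlapping device sets regardless of how the IDs are shaped. The function names and the partition count are hypothetical, and the approach only helps if each partition is handled by at most one import job at a time.

```python
import hashlib
from collections import defaultdict


def partition_for(device_id: str, num_partitions: int) -> int:
    """Map a device ID to a stable partition index, independent of its prefix."""
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_devices(devices, num_partitions):
    """Group device records so each deviceId lands in exactly one partition."""
    partitions = defaultdict(list)
    for device in devices:
        index = partition_for(device["deviceId"], num_partitions)
        partitions[index].append(device)
    return dict(partitions)
```

Each partition can then be submitted as its own bulk request by a single job, so two jobs never touch the same deviceId.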

The duplicate device issue stems from insufficient concurrency controls in your bulk import configuration. Here’s the complete solution:

Bulk Import Upsert Configuration: The API’s upsert mode needs explicit concurrency handling:


POST /iot/api/v2/devices/bulk
{
  "devices": [...],
  "mode": "upsert",
  "concurrencyControl": "optimistic",
  "conflictResolution": "merge",
  "validateUnique": true
}

Key parameters:

  • concurrencyControl: optimistic enables version-based conflict detection
  • conflictResolution: merge handles concurrent updates intelligently
  • validateUnique: true enforces pre-insert duplicate checking with proper locking
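For readers unfamiliar with version-based conflict detection, the following is a small in-memory sketch of the general technique, not the API's actual implementation. `DeviceStore`, `VersionConflict`, and `upsert_with_retry` are hypothetical names, and a real store would need to make the version comparison and the write a single atomic step.

```python
class VersionConflict(Exception):
    """Raised when a record changed between the read and the write."""


class DeviceStore:
    def __init__(self):
        self._records = {}  # deviceId -> (version, attributes)

    def read(self, device_id):
        # Version 0 with no attributes means "does not exist yet".
        return self._records.get(device_id, (0, None))

    def write(self, device_id, expected_version, attributes):
        current_version, _ = self._records.get(device_id, (0, None))
        if current_version != expected_version:
            raise VersionConflict(device_id)
        self._records[device_id] = (current_version + 1, attributes)
        return current_version + 1


def upsert_with_retry(store, device_id, changes, max_attempts=3):
    """Merge changes into the current record, retrying on version conflicts."""
    for _ in range(max_attempts):
        version, current = store.read(device_id)
        merged = {**(current or {}), **changes}
        try:
            return store.write(device_id, version, merged)
        except VersionConflict:
            continue  # someone else wrote first; re-read and merge again
    raise VersionConflict(device_id)
```

Because a stale writer fails instead of inserting, two concurrent upserts for the same deviceId end up as one record with two versions rather than two records.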

Duplicate Detection Logic: Implement a two-phase import process:

Phase 1 - Pre-validation:


POST /iot/api/v2/devices/validate
{
  "deviceIds": ["DEV-001", "DEV-002", ...],
  "checkExisting": true
}

This returns which devices already exist, allowing you to separate inserts from updates.

Phase 2 - Targeted import:

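Split your bulk operation into separate insert and update batches based on the validation results. This narrows the race window by avoiding upsert logic entirely for known devices. Two jobs that both classify the same device as new can still collide between validation and import, so the insert path still needs a uniqueness guarantee.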
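The split described above might look like the sketch below. It assumes the validation step has already been reduced to a plain collection of existing deviceIds; the actual response shape of the validate endpoint is not shown in this thread, so that part is left out.

```python
def split_by_existence(devices, existing_ids):
    """Separate a payload into insert and update batches.

    `existing_ids` is assumed to be the set of deviceIds that the
    validation call reported as already registered.
    """
    existing = set(existing_ids)
    seen = set()
    inserts, updates = [], []
    for device in devices:
        device_id = device["deviceId"]
        if device_id in seen:
            continue  # keep only the first occurrence within one payload
        seen.add(device_id)
        if device_id in existing:
            updates.append(device)
        else:
            inserts.append(device)
    return inserts, updates
```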

Device ID Validation: Enable strict validation mode in the bulk import configuration:


bulkImport.validation.strict=true
bulkImport.validation.duplicateCheck=PESSIMISTIC
bulkImport.concurrency.lockTimeout=30000

The PESSIMISTIC duplicate check acquires row-level locks before checking existence, preventing the race condition. The lock timeout ensures operations don’t hang indefinitely.
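As a rough illustration of lock-then-check, here is an in-process sketch. It uses one `threading.Lock` for the whole registry rather than row-level locks, and `LockedRegistry`, `LockTimeout`, and the `internalId` field are hypothetical. The point is only that the existence check and the insert happen while the lock is held, and that acquisition gives up after a timeout.

```python
import threading


class LockTimeout(Exception):
    """Raised when the lock cannot be acquired within the timeout."""


class LockedRegistry:
    def __init__(self, lock_timeout_s=30.0):
        self._lock = threading.Lock()
        self._devices = {}  # deviceId -> record
        self._next_internal_id = 1
        self._lock_timeout_s = lock_timeout_s

    def upsert(self, device):
        # Acquire the lock BEFORE checking existence, with a bounded wait.
        if not self._lock.acquire(timeout=self._lock_timeout_s):
            raise LockTimeout(device["deviceId"])
        try:
            existing = self._devices.get(device["deviceId"])
            if existing is None:
                internal_id = self._next_internal_id
                self._next_internal_id += 1
            else:
                internal_id = existing["internalId"]  # reuse, never re-create
            self._devices[device["deviceId"]] = {**device, "internalId": internal_id}
            return internal_id
        finally:
            self._lock.release()
```

With the check and the write inside the same critical section, concurrent callers for the same deviceId all receive the same internal ID.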

Pre-Import Reconciliation: Implement a reconciliation service that runs before bulk imports:

  1. Query existing devices matching the import payload deviceIds
  2. Classify each device as NEW, EXISTING_UNCHANGED, or EXISTING_MODIFIED
  3. Filter out EXISTING_UNCHANGED devices from the import
  4. Use INSERT for NEW devices and UPDATE for EXISTING_MODIFIED
  5. Submit separate bulk operations for inserts vs updates

This approach eliminates upsert ambiguity and prevents duplicates by making explicit insert/update decisions before API calls.
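The five steps above can be sketched as follows. This assumes the existing devices have already been fetched into a dict keyed by deviceId, and that "unchanged" means every field in the payload matches the stored value; both are assumptions for illustration, as are the function names.

```python
from enum import Enum


class Status(Enum):
    NEW = "NEW"
    EXISTING_UNCHANGED = "EXISTING_UNCHANGED"
    EXISTING_MODIFIED = "EXISTING_MODIFIED"


def classify(device, existing_by_id):
    """Classify one payload device against the already-registered records."""
    current = existing_by_id.get(device["deviceId"])
    if current is None:
        return Status.NEW
    # Compare only the fields the payload carries; stored extras are ignored.
    if all(current.get(key) == value for key, value in device.items()):
        return Status.EXISTING_UNCHANGED
    return Status.EXISTING_MODIFIED


def reconcile(payload, existing_by_id):
    """Return (inserts, updates); unchanged devices are filtered out."""
    inserts, updates = [], []
    for device in payload:
        status = classify(device, existing_by_id)
        if status is Status.NEW:
            inserts.append(device)
        elif status is Status.EXISTING_MODIFIED:
            updates.append(device)
    return inserts, updates
```

The two returned lists would then be submitted as separate bulk operations.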

Concurrent Operation Handling: If you must support truly concurrent bulk imports, implement distributed locking:


bulkImport.distributedLock.enabled=true
bulkImport.distributedLock.provider=REDIS
bulkImport.distributedLock.keyPrefix=device_import

This ensures only one bulk import operation processes any given deviceId at a time, even across multiple application instances.
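To make the per-deviceId locking concrete, here is a sketch of the pattern using an in-memory stand-in for the shared store, so it runs without a Redis server. All class and function names are hypothetical. With Redis, `set_if_absent` would typically correspond to `SET key token NX PX <ttl>`, and `delete_if_owner` to an atomic compare-and-delete script; the stand-in below is single-process only.

```python
import time
import uuid


class InMemoryLockStore:
    """Stand-in for a shared store: set-if-absent with expiry."""

    def __init__(self, clock=time.monotonic):
        self._entries = {}  # key -> (token, expires_at)
        self._clock = clock

    def set_if_absent(self, key, token, ttl_s):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return False  # held by someone else and not yet expired
        self._entries[key] = (token, now + ttl_s)
        return True

    def delete_if_owner(self, key, token):
        entry = self._entries.get(key)
        if entry is not None and entry[0] == token:
            del self._entries[key]
            return True
        return False


def acquire_device_locks(store, device_ids, key_prefix="device_import", ttl_s=30.0):
    """Lock every deviceId in a batch, or none of them."""
    token = uuid.uuid4().hex
    acquired = []
    # Sorted order keeps two overlapping batches from deadlocking each other.
    for device_id in sorted(set(device_ids)):
        key = f"{key_prefix}:{device_id}"
        if store.set_if_absent(key, token, ttl_s):
            acquired.append(key)
        else:
            for held in acquired:
                store.delete_if_owner(held, token)  # roll back partial locks
            return None
    return token, acquired


def release_device_locks(store, token, keys):
    for key in keys:
        store.delete_if_owner(key, token)
```

A job that gets `None` back would wait and retry, or skip the contested batch, rather than importing without the locks.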

Implementation Priority:

  1. Enable strict validation and pessimistic duplicate checking (immediate fix)
  2. Implement pre-import reconciliation service (prevents 95% of duplicates)
  3. Configure distributed locking for remaining edge cases
  4. Add monitoring to detect and alert on any duplicate creation
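For the monitoring step, a duplicate check can be as simple as grouping registry records by deviceId and flagging any that map to more than one internal ID. The sketch below assumes records are dicts with `deviceId` and `internalId` fields; the second field name is a guess at how the internal ID is exposed.

```python
from collections import defaultdict


def find_duplicate_device_ids(records):
    """Return {deviceId: [internal IDs]} for IDs with more than one record."""
    internal_ids = defaultdict(set)
    for record in records:
        internal_ids[record["deviceId"]].add(record["internalId"])
    return {
        device_id: sorted(ids)
        for device_id, ids in internal_ids.items()
        if len(ids) > 1
    }
```

Running this on a schedule and alerting on a non-empty result would surface any duplicates that slip past the other controls.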

After implementing these controls, your bulk import operations will handle concurrency correctly without creating duplicates. The key is moving duplicate detection from application logic to database-level constraints with proper locking, combined with pre-validation that eliminates ambiguous upsert scenarios.

The deviceIdPrefix partitioning sounds promising, but our device IDs don’t follow predictable prefixes; they’re generated by external systems. Is there another way to prevent concurrent operations from conflicting? Maybe some kind of pre-import validation step?