Contact management API duplicate detection runs slowly when processing bulk imports

We’re importing 50K+ contacts daily via the REST API, and duplicate detection is taking 4-6 hours. Our batch processing calls the /contacts endpoint with 500 records per request. The duplicate matching runs synchronously, which blocks our import pipeline.

We’ve tried adjusting batch sizes (200-1000 records) but performance doesn’t improve much. The API response shows it’s running fuzzy matching on name/email/phone for every contact. Is there a way to optimize duplicate detection or use async patterns?

Current approach:

POST /crmRestApi/resources/11.13.18.05/contacts
Payload: [{"FirstName":"John","LastName":"Smith","EmailAddress":"john@example.com"},...]
Header: DuplicateDetection: enabled

Our indexes on Contact objects look standard. Any recommendations for batch processing optimization or tuning duplicate detection rules?

For async processing, use the bulk import API instead of the standard REST endpoint. Submit your batch to /crmRestApi/resources/11.13.18.05/contacts/bulk and you’ll get a job ID back immediately; then poll /jobs/{jobId} for status. This runs duplicate detection in the background.

Also, make sure your Contact object has indexes on the EmailAddress, LastName, and Phone fields; these are critical for duplicate matching performance.
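A minimal sketch of that submit-then-poll pattern, assuming the bulk endpoint and jobId/status fields described above (none of this is verified against official API docs; the HTTP helpers are left to the caller, e.g. wrappers around an authenticated session):

```python
import time

BASE = "/crmRestApi/resources/11.13.18.05"  # version path from the question


def chunked(records, size=500):
    """Split the import list into fixed-size batches."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def import_async(post, get, records, poll_interval=30):
    """Submit each batch to the assumed bulk endpoint, then poll each job.

    `post(path, body)` and `get(path)` are caller-supplied HTTP helpers
    returning parsed JSON dicts.
    """
    job_ids = []
    for batch in chunked(records):
        # "jobId" field name is an assumption based on the answer above.
        resp = post(f"{BASE}/contacts/bulk", batch)
        job_ids.append(resp["jobId"])

    results = {}
    for job_id in job_ids:
        while True:
            # Terminal status names here are assumptions, not documented values.
            status = get(f"{BASE}/jobs/{job_id}")["status"]
            if status in ("SUCCEEDED", "FAILED"):
                results[job_id] = status
                break
            time.sleep(poll_interval)
    return results
```

The key win is that submission returns immediately, so the import pipeline is no longer blocked while duplicate detection runs; the polling loop can also be moved to a separate worker.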

We had similar issues last year. The synchronous duplicate detection is the bottleneck. Have you looked at the matching rules configuration? We reduced our rule complexity and saw a 40% improvement. Also check whether you really need fuzzy matching on all three fields; exact matching is much faster.
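To illustrate why exact matching scales so much better than fuzzy matching: exact keys need one hash lookup per record (O(n)), while fuzzy matching scores candidate pairs (O(n²) in the naive case). A toy sketch, with illustrative field names and a stand-in similarity measure (difflib), not the CRM's actual matching engine:

```python
from difflib import SequenceMatcher

contacts = [
    {"EmailAddress": "john@example.com", "LastName": "Smith"},
    {"EmailAddress": "john@example.com", "LastName": "Smyth"},
    {"EmailAddress": "jane@example.com", "LastName": "Doe"},
]


def exact_duplicates(records):
    """O(n): one hash lookup per record on the normalized email key."""
    seen, dupes = set(), []
    for r in records:
        key = r["EmailAddress"].lower()
        if key in seen:
            dupes.append(r)
        seen.add(key)
    return dupes


def fuzzy_duplicates(records, threshold=0.8):
    """O(n^2): every candidate pair gets a similarity score."""
    dupes = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            score = SequenceMatcher(None, a["LastName"], b["LastName"]).ratio()
            if score >= threshold:
                dupes.append((a["LastName"], b["LastName"], round(score, 2)))
    return dupes
```

At 50K records per day, that pairwise comparison is roughly 1.25 billion scoring operations per run if nothing prunes the candidate set, which is why narrowing fuzzy matching to fewer fields (or exact-matching first, then fuzzy-matching only within small candidate groups) pays off.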