What are the pros and cons of using External ID fields vs Salesforce ID for data import mapping

charles_coder · August 19, 2025, 1:27pm

I’m designing our data integration strategy for ongoing imports from three legacy systems (ERP, old CRM, billing system). We need to maintain relationships between objects during import and handle both new records and updates to existing records.

Two main approaches I’m considering:

External ID Fields: Create custom External ID fields on each object (Account External Key, Contact External Key, etc.) that store the legacy system IDs. Use these for upsert operations and relationship mapping.
Salesforce ID Mapping: Maintain external mapping tables that store Legacy ID → Salesforce ID pairs, then use Salesforce IDs directly for all imports and relationships.

I’ve seen teams strongly advocate for both approaches. External IDs seem cleaner for upsert support and eliminate the need for mapping tables, but I’m concerned about data model bloat and whether they maintain referential integrity as well as native Salesforce IDs.

For those who’ve implemented large-scale integrations with multiple legacy systems: which approach proved more maintainable long-term? What are the real-world trade-offs around performance, data integrity, and mapping complexity when dealing with 100K+ records across 10+ objects?

anthony_cloud · August 23, 2025, 9:44am

Having architected integrations for Fortune 500 companies with exactly this scenario (3+ legacy systems, 100K+ records, 10+ objects), I can provide perspective on all three focus areas.

External ID Upsert Support:

External IDs are purpose-built for your use case. The upsert operation is atomic and handles both inserts and updates in a single API call:

Advantages:

No pre-query needed to determine if record exists (saves API calls)
Automatic relationship resolution during import using colon syntax (Parent:External_ID__c)
Built-in duplicate prevention when marked as unique
Index automatically created by Salesforce for fast lookups
Supports bulk operations with 10,000 records per batch

Disadvantages:

25 External ID limit per object (rarely a real constraint)
Fields visible in data model (mitigate with field-level security)
Can’t map one legacy ID to multiple Salesforce records

Mapping Table Approach:

Requires separate query to get Salesforce ID before import
Double the API calls (query mapping + insert/update)
Additional infrastructure to maintain
Mapping table can become bottleneck at scale

Verdict: For 100K+ records with ongoing sync, External ID upsert saves 50% of API calls and reduces complexity significantly.

Salesforce ID Referential Integrity:

This is where the approaches diverge meaningfully:

External IDs:

Referential integrity maintained WITHIN Salesforce through lookup relationships
If parent record deleted, child lookup becomes null (standard Salesforce cascade rules apply)
Relationship resolution happens during import: no orphaned references if External ID lookup fails (record rejects with clear error)
Self-documenting: External ID field shows the source system reference

Mapping Tables:

Referential integrity depends on EXTERNAL system maintaining mapping accuracy
If mapping table and Salesforce get out of sync, you import orphaned references
No built-in cascade delete handling
Requires custom validation logic to prevent referential integrity violations

Real-world example: We had a mapping table approach fail when the mapping DB was restored from backup, reverting 2 weeks of mappings. Subsequent imports created duplicate records because the integration thought records didn’t exist yet. With External IDs, this can’t happen - the External ID IS the record identifier in both systems.

Verdict: External IDs provide stronger referential integrity because the relationship is native to Salesforce’s data model.

Legacy System Mapping:

For integrating three legacy systems (ERP, CRM, billing), consider these patterns:

Option 1: Separate External ID per System (Recommended)


Account.ERP_Customer_ID__c (External ID, Unique)
Account.Legacy_CRM_ID__c (External ID, Unique)
Account.Billing_System_ID__c (External ID, Unique)

Advantages:

Clear source system attribution
Supports records that exist in multiple legacy systems
Easy troubleshooting: “Which system created this record?”
Enables system-specific upsert logic

Option 2: Single External ID with Prefix


Account.Legacy_System_ID__c = "ERP:CUST-12345" or "CRM:00012345"

Advantages:

Fewer fields (only 1 External ID per object)
Works if records are unique to one source system

Disadvantages:

Harder to query by source system
Requires string parsing in integration code

For your 100K+ record scale:

Use Option 1 (separate External IDs per system)
Set field-level security to hide from end users
Create custom report types that include External IDs for admin troubleshooting
Document which system is the “master” for each object

Performance Considerations at Scale: We tested both approaches with 500K Account records:

External ID upsert: 8,000 records/minute average
Mapping table approach: 4,200 records/minute (query overhead)
External ID relationship resolution: <100ms per lookup
Mapping table join queries: 200-300ms per lookup

At 100K records across 10 objects (1M total records), External IDs will save approximately 40% on total sync time.

Long-term Maintainability:

After 5 years with External ID approach:

Zero data integrity issues from mapping misalignment
Decommissioned one legacy system: just removed field-level security, kept field for audit history
Added two new systems: just added new External ID fields
Total maintenance overhead: ~2 hours/year updating field-level security

After 5 years with mapping table approach (previous client):

Three data corruption incidents requiring reconciliation
Mapping database became single point of failure
Adding new systems required schema changes, ETL updates, backup configuration
Total maintenance overhead: ~40 hours/year

Recommended Architecture:

Create External ID fields for each legacy system on each object
Use upsert operations for all imports (Data Loader or API)
Map relationships using colon syntax: `Parent:ERP_Customer_ID__c
Implement field-level security to hide External IDs from standard users
Create custom report types including External IDs for integration troubleshooting
Document source system ownership in object descriptions
Set up monitoring on failed upserts (External ID not found = data quality issue in source)

This approach maximizes upsert efficiency, maintains strong referential integrity through Salesforce’s native relationship model, and provides clear legacy system mapping that’s self-documenting and maintainable long-term. The slight data model “bloat” is far outweighed by the operational benefits at your scale.

stevencloud · August 21, 2025, 2:29pm

From a governance perspective, External IDs create audit trail issues. When troubleshooting data problems, you need to trace back to the source system. With External IDs, that’s built in. With mapping tables, you’re maintaining additional infrastructure that needs backup, versioning, and access controls. We had a case where our mapping database was corrupted and we lost weeks of integration history.

carol_guru · August 20, 2025, 4:52pm

One downside of External IDs that isn’t mentioned enough: they’re visible to users unless you hide them with field-level security or page layouts. If you’re storing ugly legacy system IDs like ‘ERP_CUST_00012345_V2’, users will see that clutter. Also, you’re limited to 25 External ID fields per object. For most integrations that’s plenty, but if you’re integrating with many systems, you could hit that limit.

Topic		Replies	Views
Best practices for Account API integration when syncing external CRM data: field mapping, deduplication, and error handling Salesforce discussion , account-mgmt , data-quality , api-development , rest-api , sf-spring-25 , field-mapping , external-id , account-api-integration	4	1	April 11, 2025
Account sync to external ERP fails with Salesforce Bulk API upsert errors Salesforce question , account-mgmt , rest-api , csv , sf-winter-24 , bulk-api , data-inconsistency , integration-frameworks , missing-external-id	3	0	May 24, 2025
Integration module identity mapping fails with external systems in aziot-25, blocking data flow Microsoft Azure IoT question , integration , api-integration , rest-api , data-flow , external-integration , aziot-25 , security-identi , identity-mapping	7	2	August 31, 2025
Integration Hub vs External Services: Choosing the right approach for API connectivity in Salesforce Salesforce discussion , rest-api , integration-hub , sf-summer-25 , integration-architecture , integration-frameworks , api-connectivity , external-services	4	4	July 11, 2025
Contact API upsert fails on duplicate email when using external ID for matching Salesforce question , api-development , rest-api , data-sync , contact-mgmt , sf-spring-24 , external-id , duplicate-rules , upsert-operations	6	2	July 11, 2025
External system sync fails for custom object mapping in Integration Hub with field mismatch error Salesforce question , configuration , rest-api , data-sync , integration-hub , sf-summer-24 , json , field-mapping , custom-objects	5	2	June 22, 2025
Sales forecasting integration fails to sync external forecast data via REST API Salesforce question , rest-api , sf-summer-24 , json , sales-forecast , data-import-migration , integration-frameworks , missing-ext-id , forecast-sync-fails	4	1	March 18, 2025
Partner account import fails with account relationship mapping errors using Data Loader Salesforce question , partner-portal , sf-spring-24 , data-loader , data-import-migration , external-id , account-relationships , reference-data , relationship-mapping	7	5	December 12, 2024
Real estate property data import fails due to duplicate property IDs in legacy system Oracle Fusion Cloud question , data-migration , data-quality , etl , real-estate-mgmt , ofc-23c , deduplication , fbdi , id-mapping	7	3	July 14, 2025

What are the pros and cons of using External ID fields vs Salesforce ID for data import mapping

Related topics