Your duplicate SSN errors during migration stem from three overlapping data quality issues that require systematic resolution:
1. Legacy Data Migration Complexity (addressing your historical data challenges):
The core issue is that Workday maintains SSN uniqueness across ALL worker types and statuses - active employees, terminated employees, contingent workers, and even applicant records in the recruiting module. Your deduplication scripts only validated the migration dataset, not the existing Workday tenant data. During migration, you must deduplicate against the target system’s complete SSN inventory.
To build a comprehensive SSN inventory from Workday:
- Get_Workers (Include_Terminated=true, Include_Inactive=true)
- Get_Contingent_Workers (all statuses)
- Get_Job_Applications (to catch applicant SSNs)
- Get_Terminated_Workers (specifically for long-term terminated)
The 200 terminated worker matches you found explain part of your 500 failures. The remaining 300 are likely split between contingent workers (~100-150), applicants with SSNs on file (~50-75), and data quality issues in your source data (~75-100).
2. Deduplication Script Limitations (addressing your validation gaps):
Your scripts validated SSN uniqueness within the migration dataset but missed several critical scenarios:
Formatting Normalization: Workday stores SSNs as 9-digit strings internally (no dashes), but accepts XXX-XX-XXXX format via API. Your deduplication must normalize both formats:
# Normalize SSN for comparison
ssn_normalized = ssn.replace('-', '').replace(' ', '')
if len(ssn_normalized) == 9 and ssn_normalized.isdigit():
# Valid SSN format
Placeholder Detection: Legacy systems often use fake SSNs for employees without valid numbers. Flag and exclude these patterns:
- 000-00-0000, 111-11-1111, 123-45-6789 (sequential test values)
- 999-99-9999 (common placeholder)
- SSNs where all digits are identical: 111111111, 222222222
Leading Zero Preservation: Excel and CSV exports strip leading zeros, converting 001-23-4567 to 1-23-4567. Your deduplication script must pad SSNs to 9 digits:
ssn_padded = ssn_normalized.zfill(9)
3. SSN Field Formatting Standards (addressing your API payload structure):
The API accepts SSNs in XXX-XX-XXXX format, but inconsistent formatting in your payloads can cause validation failures that appear as duplicate errors. Ensure your update requests format SSNs consistently:
{
"Worker_Reference": {"ID": "employee_id"},
"Personal_Information": {
"SSN": "123-45-6789"
}
}
Complete Resolution Strategy:
Phase 1 - Inventory Existing SSNs:
- Extract all SSNs from Workday using the four API calls mentioned above
- Normalize all extracted SSNs (remove dashes, pad zeros, convert to 9-digit strings)
- Create a master exclusion list of SSNs already in Workday
Phase 2 - Enhanced Deduplication:
- Normalize all SSNs in your migration dataset using the same logic
- Flag placeholder SSNs (000-00-0000 patterns) for manual review
- Cross-reference migration SSNs against the Workday exclusion list
- For matches found: determine if it’s a rehire (same person) or true duplicate (different people, same SSN - data quality issue)
Phase 3 - Handle Legitimate Duplicates:
For the ~200 cases where you’re migrating an active employee whose SSN exists on a terminated record:
- If same person (rehire): Use the existing Worker_ID instead of creating new record
- If different person (data error): Correct the SSN in source system before migration
- If SSN must be preserved on terminated record (legal/compliance): Work with Workday support to temporarily clear terminated record’s SSN, migrate active employee, then restore if needed
Phase 4 - Resolve Unexplained Failures:
For your remaining 300 failures without obvious Workday matches:
- Export these 300 SSNs and manually validate in source system
- Check for transposed digits (123-45-6789 vs 123-54-6789)
- Verify against contingent worker and applicant repositories
- Review Workday tenant’s custom validation rules (Security → Domain → National ID validations)
API Query to Find All SSN Instances:
GET /workers?Include_Terminated=true
GET /contingentWorkers
GET /jobApplications?status=all
Filter response for National_ID where Type='SSN'
The combination of incomplete worker type coverage, formatting inconsistencies, and placeholder SSNs in legacy data explains your 500 failures. Implementing the enhanced deduplication logic with comprehensive Workday inventory checking should reduce failures to less than 1% (representing genuine data quality issues requiring manual correction).