Here’s a comprehensive solution addressing all three aspects of your duplicate record problem:
ETL Merge Logic Correction:
Your MERGE statement needs defensive handling of duplicates in the source feed. First, add a unique constraint on the target table so the database itself rejects duplicate keys:
ALTER TABLE InventoryFact
ADD CONSTRAINT UK_Inventory UNIQUE (SKU, LocationID);
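Note that this ALTER will fail if InventoryFact already holds duplicate (SKU, LocationID) pairs, so run the cleanup step below first. A guarded sketch (same table and constraint names as above) could look like:

```sql
-- Sketch: only add the constraint once the table is clean; otherwise fail loudly.
IF NOT EXISTS (
    SELECT 1 FROM InventoryFact
    GROUP BY SKU, LocationID
    HAVING COUNT(*) > 1
)
    ALTER TABLE InventoryFact
        ADD CONSTRAINT UK_Inventory UNIQUE (SKU, LocationID);
ELSE
    RAISERROR('Duplicates exist - run the cleanup before adding UK_Inventory.', 16, 1);
```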
Then modify your merge to use a deduplicated source:
WITH CleanSource AS (
    SELECT SKU, LocationID, Quantity, UpdateTimestamp,
           ROW_NUMBER() OVER (PARTITION BY SKU, LocationID
                              ORDER BY UpdateTimestamp DESC) AS rn
    FROM StagingInventory
)
MERGE INTO InventoryFact AS target
USING (SELECT SKU, LocationID, Quantity
       FROM CleanSource
       WHERE rn = 1) AS source
ON target.SKU = source.SKU AND target.LocationID = source.LocationID
WHEN MATCHED THEN
    UPDATE SET Quantity = source.Quantity, LastUpdated = GETDATE()
WHEN NOT MATCHED THEN
    INSERT (SKU, LocationID, Quantity, LastUpdated)
    VALUES (source.SKU, source.LocationID, source.Quantity, GETDATE());
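One caveat worth flagging: if two ETL sessions can run concurrently, MERGE alone does not stop both from taking the NOT MATCHED branch for the same key. A common T-SQL mitigation is a HOLDLOCK (serializable) hint on the target; a sketch against the same tables:

```sql
-- Sketch only: the same deduplicated merge, serialized with HOLDLOCK so
-- concurrent ETL sessions cannot both insert the same (SKU, LocationID).
WITH CleanSource AS (
    SELECT SKU, LocationID, Quantity,
           ROW_NUMBER() OVER (PARTITION BY SKU, LocationID
                              ORDER BY UpdateTimestamp DESC) AS rn
    FROM StagingInventory
)
MERGE INTO InventoryFact WITH (HOLDLOCK) AS target
USING (SELECT SKU, LocationID, Quantity
       FROM CleanSource WHERE rn = 1) AS source
ON target.SKU = source.SKU AND target.LocationID = source.LocationID
WHEN MATCHED THEN
    UPDATE SET Quantity = source.Quantity, LastUpdated = GETDATE()
WHEN NOT MATCHED THEN
    INSERT (SKU, LocationID, Quantity, LastUpdated)
    VALUES (source.SKU, source.LocationID, source.Quantity, GETDATE());
```

With the unique constraint in place a racing second session would fail rather than duplicate; the hint avoids the error entirely.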
Deduplication with SQL:
For existing duplicates in InventoryFact, implement a cleanup procedure that runs after your ETL:
-- Identify and log duplicates for investigation
INSERT INTO ETL_DuplicateLog (SKU, LocationID, DuplicateCount, DetectedDate)
SELECT SKU, LocationID, COUNT(*) as DupCount, GETDATE()
FROM InventoryFact
GROUP BY SKU, LocationID
HAVING COUNT(*) > 1;
-- Remove duplicates, keeping the most recent row per key
WITH DuplicateRecords AS (
    SELECT InventoryID,
           ROW_NUMBER() OVER (PARTITION BY SKU, LocationID
                              ORDER BY LastUpdated DESC, InventoryID DESC) AS rn
    FROM InventoryFact
)
DELETE FROM DuplicateRecords
WHERE rn > 1;
Inventory Data Validation:
Implement a three-tier validation framework:
- Pre-Merge Validation - Check staging data quality before processing:
-- Detect source duplicates
SELECT SKU, LocationID, COUNT(*) as RecordCount
FROM StagingInventory
GROUP BY SKU, LocationID
HAVING COUNT(*) > 1;
-- Validate quantity ranges
SELECT * FROM StagingInventory
WHERE Quantity < 0 OR Quantity > 999999;
- Post-Merge Validation - Verify data integrity after ETL:
-- Check for duplicates in target
SELECT 'Duplicates Found' as ValidationStatus, COUNT(*) as IssueCount
FROM (SELECT SKU, LocationID FROM InventoryFact
GROUP BY SKU, LocationID HAVING COUNT(*) > 1) d
UNION ALL
-- Verify record counts match expected
SELECT 'Record Count Mismatch' as ValidationStatus,
       ABS((SELECT COUNT(*)
            FROM (SELECT DISTINCT SKU, LocationID FROM StagingInventory) s) -
           (SELECT COUNT(*) FROM InventoryFact)) as IssueCount;
- SSRS Report Validation - Add data quality indicators to your inventory reports:
Create a validation dataset in SSRS that checks for anomalies:
WITH Checks AS (
    SELECT
        (SELECT COUNT(*)
         FROM (SELECT SKU, LocationID FROM InventoryFact
               GROUP BY SKU, LocationID HAVING COUNT(*) > 1) d) AS DuplicateCount,
        (SELECT COUNT(*) FROM InventoryFact WHERE Quantity < 0) AS NegativeQty
)
SELECT CASE WHEN DuplicateCount > 0 THEN 'WARNING: Duplicates detected'
            WHEN NegativeQty > 0 THEN 'ERROR: Negative quantities found'
            ELSE 'OK' END AS DataQualityStatus,
       DuplicateCount,
       NegativeQty
FROM Checks;
Display this as a warning banner at the top of your inventory reports so warehouse managers know when data quality issues exist.
Root Cause Prevention:
Implement these safeguards in your ETL process:
- Add transaction logging to track which source system contributed each record
- Implement a reconciliation process that compares source system totals against warehouse totals
- Create alerts when duplicate counts exceed a threshold (e.g., more than 5 duplicates per ETL run)
- Schedule the deduplication cleanup procedure to run immediately after ETL completes but before reports are generated
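The reconciliation and alerting bullets above could be sketched as follows (the @DuplicateThreshold value and the RAISERROR-based alert are illustrative placeholders for whatever alerting mechanism your shop already uses):

```sql
-- Sketch: post-ETL reconciliation and duplicate-count alerting.
DECLARE @DuplicateThreshold INT = 5;   -- example threshold from above
DECLARE @DupCount INT;

-- Count keys that still appear more than once after the ETL run
SELECT @DupCount = COUNT(*)
FROM (SELECT SKU, LocationID FROM InventoryFact
      GROUP BY SKU, LocationID HAVING COUNT(*) > 1) d;

IF @DupCount > @DuplicateThreshold
    RAISERROR('%d duplicate keys found after ETL run.', 16, 1, @DupCount);

-- Reconciliation: compare staged source totals against the warehouse
SELECT s.TotalQty AS SourceTotal,
       f.TotalQty AS WarehouseTotal,
       s.TotalQty - f.TotalQty AS Variance
FROM (SELECT SUM(Quantity) AS TotalQty FROM StagingInventory) s
CROSS JOIN (SELECT SUM(Quantity) AS TotalQty FROM InventoryFact) f;
```

Note the staging total will double-count any source duplicates, so if the feed is dirty you may prefer to sum the deduplicated staging rows instead.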
This multi-layered approach prevents duplicates at the source, catches them during ETL, cleans up any that slip through, and alerts users when data quality issues exist.