Duplicate records after ETL merge in inventory warehouse causing inaccurate reports

Our inventory tracking system uses SSRS 2017 for warehouse reports, and we’re seeing duplicate records after running ETL merge operations. The issue started about two weeks ago after we modified our merge logic to handle updates from multiple source systems. Now our inventory count reports show the same SKU multiple times with different quantities, which is causing major issues with stock accuracy.

The ETL job runs every 4 hours and merges data from three sources: our WMS, the ERP system, and manual adjustments from warehouse staff. Here’s the simplified merge logic we’re using:

MERGE InventoryWarehouse AS target
USING InventoryStaging AS source
ON target.SKU = source.SKU
WHEN MATCHED THEN UPDATE SET Quantity = source.Quantity
WHEN NOT MATCHED THEN INSERT VALUES (source.SKU, source.Quantity);

The problem is that after the merge, we’re getting 2-3 records per SKU instead of one consolidated record. I suspect the issue is with how we’re identifying unique records - maybe SKU alone isn’t enough? We need proper deduplication with SQL and better inventory data validation before the merge happens. Has anyone dealt with similar ETL merge issues in inventory scenarios?

Nina’s right about the composite key issue. But there’s another problem - you’re running this merge every 4 hours from three sources. What happens when the WMS says quantity is 100, ERP says 95, and manual adjustments say 105? Your current logic will create three records. You need a master data management strategy here. Decide which source is authoritative for inventory quantities, or implement a reconciliation process that resolves conflicts before the merge. Also, add a LastModified timestamp to track which source has the most recent data.

I’ve worked through this exact scenario at a previous company. One thing to check immediately - look at your staging table before the merge runs. Use SELECT SKU, COUNT(*) FROM InventoryStaging GROUP BY SKU HAVING COUNT(*) > 1 to find duplicates. I bet you’ll find that the staging table itself has duplicates from your three sources. The merge is working correctly - it’s the input data that’s the problem. You need deduplication logic in the staging layer, not in the merge.

Here’s a comprehensive solution covering staging deduplication, revised merge logic, and data validation for your multi-source scenario.

Step 1: Implement Staging Validation and Deduplication

Before any merge, validate and deduplicate your staging data:

-- Create validated staging with source priority
WITH RankedInventory AS (
  SELECT
    SKU, Quantity, SourceSystem, LoadTimestamp,
    ROW_NUMBER() OVER (
      PARTITION BY SKU
      ORDER BY
        CASE SourceSystem
          WHEN 'MANUAL' THEN 1  -- Highest priority
          WHEN 'WMS' THEN 2
          WHEN 'ERP' THEN 3
          ELSE 99  -- Unknown sources rank last instead of sorting first as NULL
        END,
        LoadTimestamp DESC
    ) AS SourceRank
  FROM InventoryStaging
  WHERE Quantity >= 0  -- Validation rule
)
SELECT SKU, Quantity, SourceSystem, LoadTimestamp
INTO #ValidatedStaging
FROM RankedInventory
WHERE SourceRank = 1;  -- Keep only highest priority per SKU
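To see the dedup in action you can run the same ROW_NUMBER logic against SQLite (which supports window functions) from Python. This is a sketch with made-up sample rows; the table and column names mirror the answer above, and the source priorities are the assumed MANUAL > WMS > ERP ordering:

```python
import sqlite3

# Hypothetical staging rows: (SKU, Quantity, SourceSystem, LoadTimestamp)
rows = [
    ("SKU-1", 100, "WMS",    "2024-01-01T08:00"),
    ("SKU-1",  95, "ERP",    "2024-01-01T08:05"),
    ("SKU-1", 105, "MANUAL", "2024-01-01T07:50"),
    ("SKU-2",  40, "ERP",    "2024-01-01T08:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE InventoryStaging
                (SKU TEXT, Quantity INT, SourceSystem TEXT, LoadTimestamp TEXT)""")
conn.executemany("INSERT INTO InventoryStaging VALUES (?,?,?,?)", rows)

deduped = conn.execute("""
    WITH RankedInventory AS (
      SELECT SKU, Quantity, SourceSystem,
             ROW_NUMBER() OVER (
               PARTITION BY SKU
               ORDER BY CASE SourceSystem
                          WHEN 'MANUAL' THEN 1
                          WHEN 'WMS'    THEN 2
                          WHEN 'ERP'    THEN 3
                          ELSE 99
                        END,
                        LoadTimestamp DESC
             ) AS SourceRank
      FROM InventoryStaging
      WHERE Quantity >= 0
    )
    SELECT SKU, Quantity, SourceSystem FROM RankedInventory
    WHERE SourceRank = 1 ORDER BY SKU
""").fetchall()
print(deduped)  # MANUAL wins for SKU-1; ERP is the only source for SKU-2
```

Note that MANUAL wins for SKU-1 even though its LoadTimestamp is the oldest - source priority is evaluated before recency.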

Step 2: Revised ETL Merge Logic

Update your merge to use the validated staging and track source lineage:

MERGE InventoryWarehouse AS target
USING #ValidatedStaging AS source
ON target.SKU = source.SKU
WHEN MATCHED AND source.LoadTimestamp > target.LastUpdated THEN
  UPDATE SET
    Quantity = source.Quantity,
    SourceSystem = source.SourceSystem,
    LastUpdated = source.LoadTimestamp,
    UpdateCount = target.UpdateCount + 1
WHEN NOT MATCHED THEN
  INSERT (SKU, Quantity, SourceSystem, LastUpdated, UpdateCount)
  VALUES (source.SKU, source.Quantity, source.SourceSystem,
          source.LoadTimestamp, 1)
WHEN NOT MATCHED BY SOURCE AND target.LastUpdated < DATEADD(day, -7, GETDATE()) THEN
  DELETE;  -- Remove stale records
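The "only update when the source is newer" guard is the part that stops older loads from overwriting fresh data. SQLite has no MERGE, but the same guard can be sketched as an upsert with a WHERE clause; table and column names mirror the answer, the rows are invented, and a PRIMARY KEY on SKU makes duplicates structurally impossible:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE InventoryWarehouse (
    SKU TEXT PRIMARY KEY,  -- unique key: duplicates cannot exist
    Quantity INT, SourceSystem TEXT, LastUpdated TEXT, UpdateCount INT)""")

def upsert(sku, qty, source, ts):
    # Newer rows update in place; older rows are rejected by the WHERE guard.
    conn.execute("""
        INSERT INTO InventoryWarehouse VALUES (?, ?, ?, ?, 1)
        ON CONFLICT(SKU) DO UPDATE SET
          Quantity = excluded.Quantity,
          SourceSystem = excluded.SourceSystem,
          LastUpdated = excluded.LastUpdated,
          UpdateCount = UpdateCount + 1
        WHERE excluded.LastUpdated > InventoryWarehouse.LastUpdated
    """, (sku, qty, source, ts))

upsert("SKU-1", 100, "WMS",    "2024-01-01T08:00")
upsert("SKU-1", 105, "MANUAL", "2024-01-01T09:00")  # newer: applied
upsert("SKU-1",  95, "ERP",    "2024-01-01T07:00")  # older: ignored
row = conn.execute(
    "SELECT Quantity, SourceSystem, UpdateCount FROM InventoryWarehouse").fetchone()
print(row)  # (105, 'MANUAL', 2)
```

One caution on the T-SQL version: the WHEN NOT MATCHED BY SOURCE ... DELETE branch is only safe if every run stages a full snapshot; with incremental loads it will silently delete live SKUs that simply didn't change in the last 4 hours.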

Key improvements:

  • Only updates when source data is newer
  • Tracks which system provided the data
  • Counts updates for audit trail
  • Removes obsolete SKUs

Step 3: Inventory Data Validation Rules

Create a pre-merge validation procedure:

-- Pseudocode for validation checks:
1. Check for negative quantities (flag as error)
2. Identify SKUs with quantity changes > 50% (flag for review)
3. Find SKUs present in multiple sources with conflicting values
4. Validate SKU exists in master product catalog
5. Check for unrealistic quantities (e.g., > max_capacity)
6. Log all validation failures to ValidationLog table
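The checks above can be sketched as a small Python function. The thresholds, the catalog stand-in, and the failure codes are all hypothetical placeholders - swap in your real product catalog lookup and capacity limits:

```python
# Hypothetical thresholds; adjust to your warehouse.
MAX_CAPACITY = 10_000
KNOWN_SKUS = {"SKU-1", "SKU-2"}       # stand-in for the master product catalog
CHANGE_THRESHOLD = 0.5                # flag >50% swings for review

def validate(sku, qty, previous_qty=None):
    """Return a list of validation failure codes for one staging row."""
    errors = []
    if qty < 0:
        errors.append("NEGATIVE_QTY")
    if qty > MAX_CAPACITY:
        errors.append("EXCEEDS_CAPACITY")
    if sku not in KNOWN_SKUS:
        errors.append("UNKNOWN_SKU")
    if previous_qty and abs(qty - previous_qty) / previous_qty > CHANGE_THRESHOLD:
        errors.append("LARGE_CHANGE")  # flag for review, not necessarily an error
    return errors

print(validate("SKU-1", 120, previous_qty=50))  # ['LARGE_CHANGE']
print(validate("SKU-9", -5))                    # ['NEGATIVE_QTY', 'UNKNOWN_SKU']
```

In production these codes would be written to the ValidationLog table rather than returned; rows with hard errors get excluded from staging, while LARGE_CHANGE rows pass through but appear on a review report.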

Step 4: Conflict Resolution Reporting

Create a view that shows source conflicts before merge:

CREATE VIEW vw_InventoryConflicts AS
SELECT
  SKU,
  MAX(CASE WHEN SourceSystem='WMS' THEN Quantity END) AS WMS_Qty,
  MAX(CASE WHEN SourceSystem='ERP' THEN Quantity END) AS ERP_Qty,
  MAX(CASE WHEN SourceSystem='MANUAL' THEN Quantity END) AS Manual_Qty,
  COUNT(DISTINCT Quantity) AS ConflictCount
FROM InventoryStaging
GROUP BY SKU
HAVING COUNT(DISTINCT SourceSystem) > 1
  AND COUNT(DISTINCT Quantity) > 1;

This shows where sources disagree on quantities before the merge runs.
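Here is the conflict pivot run against SQLite with made-up data, to show what counts as a conflict: SKU-2 appears in two sources with the same quantity, so it is correctly excluded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE InventoryStaging (SKU TEXT, Quantity INT, SourceSystem TEXT)")
conn.executemany("INSERT INTO InventoryStaging VALUES (?,?,?)", [
    ("SKU-1", 100, "WMS"), ("SKU-1", 95, "ERP"), ("SKU-1", 105, "MANUAL"),
    ("SKU-2", 40, "WMS"), ("SKU-2", 40, "ERP"),  # two sources agree: no conflict
])
conflicts = conn.execute("""
    SELECT SKU,
      MAX(CASE WHEN SourceSystem='WMS' THEN Quantity END)    AS WMS_Qty,
      MAX(CASE WHEN SourceSystem='ERP' THEN Quantity END)    AS ERP_Qty,
      MAX(CASE WHEN SourceSystem='MANUAL' THEN Quantity END) AS Manual_Qty
    FROM InventoryStaging
    GROUP BY SKU
    HAVING COUNT(DISTINCT SourceSystem) > 1
       AND COUNT(DISTINCT Quantity) > 1
""").fetchall()
print(conflicts)  # only SKU-1 disagrees across sources
```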

Step 5: Post-Merge Duplicate Check

Add a final validation to catch any duplicates that slip through:

-- Find duplicates in target table
SELECT SKU, COUNT(*) AS DuplicateCount
FROM InventoryWarehouse
GROUP BY SKU
HAVING COUNT(*) > 1;

If this returns any rows, your merge has a bug. In your case, this will likely show the current duplicates that need cleanup.

Step 6: Cleanup Existing Duplicates

Before implementing the new logic, fix existing duplicates:

-- Rank rows per SKU, newest first, then delete everything but the top row.
-- (Joining on MAX(LastUpdated) misses duplicates whose timestamps tie;
-- ROW_NUMBER always keeps exactly one row per SKU.)
WITH RankedRows AS (
  SELECT SKU, LastUpdated,
    ROW_NUMBER() OVER (PARTITION BY SKU ORDER BY LastUpdated DESC) AS rn
  FROM InventoryWarehouse
)
DELETE FROM RankedRows
WHERE rn > 1;
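A runnable sketch of keep-the-newest-row-per-SKU cleanup, using SQLite with invented rows. SQLite cannot DELETE through a CTE the way SQL Server can, so this version keeps the top-ranked rowid per SKU and deletes the rest:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE InventoryWarehouse (SKU TEXT, Quantity INT, LastUpdated TEXT)")
conn.executemany("INSERT INTO InventoryWarehouse VALUES (?,?,?)", [
    ("SKU-1", 100, "2024-01-01"),
    ("SKU-1",  95, "2024-01-02"),
    ("SKU-1", 105, "2024-01-03"),  # newest: the survivor
    ("SKU-2",  40, "2024-01-01"),
])
# Keep the newest row per SKU; delete every other rowid.
conn.execute("""
    DELETE FROM InventoryWarehouse WHERE rowid NOT IN (
      SELECT rowid FROM (
        SELECT rowid,
               ROW_NUMBER() OVER (PARTITION BY SKU ORDER BY LastUpdated DESC) AS rn
        FROM InventoryWarehouse
      ) WHERE rn = 1
    )
""")
rows = conn.execute(
    "SELECT SKU, Quantity FROM InventoryWarehouse ORDER BY SKU").fetchall()
print(rows)  # one row per SKU, the newest
```

After this cleanup, adding a unique constraint on SKU (or SKU plus SourceSystem if you keep per-source rows) is worth considering so duplicates fail loudly instead of accumulating.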

This approach gives you:

  • Clear source priority (Manual > WMS > ERP)
  • A single row per SKU guaranteed by staging-layer deduplication
  • Validation before merge prevents bad data
  • Conflict visibility for warehouse staff
  • Audit trail of all changes
  • Automatic cleanup of stale records

Implement these changes in your next ETL deployment and your inventory reports should show accurate, deduplicated records.

Your merge logic is missing a key component - you need to include the source system in your matching criteria. If you’re merging from three different sources every 4 hours, and they all have the same SKU, you’re creating multiple records because the MERGE statement can’t determine which source should win. You need a composite key that includes both SKU and SourceSystem, or you need to pre-aggregate your staging data before the merge.