Here’s a comprehensive solution that addresses ETL merge logic, deduplication with SQL, and inventory data validation for your multi-source scenario.
Step 1: Implement Staging Validation and Deduplication
Before any merge, validate and deduplicate your staging data:
-- Create validated staging with source priority
WITH RankedInventory AS (
SELECT
SKU, Quantity, SourceSystem, LoadTimestamp,
ROW_NUMBER() OVER (
PARTITION BY SKU
ORDER BY
CASE SourceSystem
WHEN 'MANUAL' THEN 1 -- Highest priority
WHEN 'WMS' THEN 2
WHEN 'ERP' THEN 3
END,
LoadTimestamp DESC
) AS SourceRank
FROM InventoryStaging
WHERE Quantity >= 0 -- Validation rule
)
SELECT SKU, Quantity, SourceSystem, LoadTimestamp
INTO #ValidatedStaging
FROM RankedInventory
WHERE SourceRank = 1; -- Keep only highest priority per SKU
Step 2: Revised ETL Merge Logic
Update your merge to use the validated staging and track source lineage:
MERGE InventoryWarehouse AS target
USING #ValidatedStaging AS source
ON target.SKU = source.SKU
WHEN MATCHED AND source.LoadTimestamp > target.LastUpdated THEN
UPDATE SET
Quantity = source.Quantity,
SourceSystem = source.SourceSystem,
LastUpdated = source.LoadTimestamp,
UpdateCount = target.UpdateCount + 1
WHEN NOT MATCHED THEN
INSERT (SKU, Quantity, SourceSystem, LastUpdated, UpdateCount)
VALUES (source.SKU, source.Quantity, source.SourceSystem,
source.LoadTimestamp, 1)
WHEN NOT MATCHED BY SOURCE AND target.LastUpdated < DATEADD(day, -7, GETDATE()) THEN
DELETE; -- Remove stale records
Key improvements:
- Only updates when source data is newer
- Tracks which system provided the data
- Counts updates for audit trail
- Removes obsolete SKUs
Step 3: Inventory Data Validation Rules
Create a pre-merge validation procedure:
-- Pseudocode for validation checks:
1. Check for negative quantities (flag as error)
2. Identify SKUs with quantity changes > 50% (flag for review)
3. Find SKUs present in multiple sources with conflicting values
4. Validate SKU exists in master product catalog
5. Check for unrealistic quantities (e.g., > max_capacity)
6. Log all validation failures to ValidationLog table
Step 4: Conflict Resolution Reporting
Create a view that shows source conflicts before merge:
CREATE VIEW vw_InventoryConflicts AS
SELECT
SKU,
MAX(CASE WHEN SourceSystem='WMS' THEN Quantity END) AS WMS_Qty,
MAX(CASE WHEN SourceSystem='ERP' THEN Quantity END) AS ERP_Qty,
MAX(CASE WHEN SourceSystem='MANUAL' THEN Quantity END) AS Manual_Qty,
COUNT(DISTINCT Quantity) AS ConflictCount
FROM InventoryStaging
GROUP BY SKU
HAVING COUNT(DISTINCT SourceSystem) > 1
AND COUNT(DISTINCT Quantity) > 1;
This shows where sources disagree on quantities before the merge runs.
Step 5: Post-Merge Duplicate Check
Add a final validation to catch any duplicates that slip through:
-- Find duplicates in target table
SELECT SKU, COUNT(*) AS DuplicateCount
FROM InventoryWarehouse
GROUP BY SKU
HAVING COUNT(*) > 1;
If this returns any rows, your merge has a bug. In your case, this will likely show the current duplicates that need cleanup.
Step 6: Cleanup Existing Duplicates
Before implementing the new logic, fix existing duplicates:
WITH DuplicateSKUs AS (
SELECT SKU, MAX(LastUpdated) AS KeepTimestamp
FROM InventoryWarehouse
GROUP BY SKU
HAVING COUNT(*) > 1
)
DELETE iw
FROM InventoryWarehouse iw
INNER JOIN DuplicateSKUs d ON iw.SKU = d.SKU
WHERE iw.LastUpdated < d.KeepTimestamp;
This approach gives you:
- Clear source priority (Manual > WMS > ERP)
- Validation before merge prevents bad data
- Conflict visibility for warehouse staff
- Audit trail of all changes
- Automatic cleanup of stale records
Implement these changes in your next ETL deployment and your inventory reports should show accurate, deduplicated records.