Your 8-12% discrepancy points to systematic issues across all four critical areas. Here’s how to resolve this comprehensively.
ETL Transformation Validation: Implement explicit validation checkpoints in your ETL pipeline. After each transformation step, calculate record counts and sum totals, then compare against source data:
-- Validation checkpoint query
SELECT
COUNT(*) as record_count,
SUM(headcount) as total_headcount,
SUM(budget_amount) as total_budget
FROM staging_workforce_forecast
WHERE load_date = CURRENT_DATE
Store these validation metrics and alert when variance exceeds 0.5%. This immediately identifies which transformation step introduces discrepancies.
Timestamp Synchronization: The timezone issue is critical. Dayforce workforce planning uses UTC timestamps for all forecast effective dates, but also includes a separate timezone field for display purposes. Your ETL must handle this correctly:
SELECT
employee_id,
forecast_period,
CONVERT_TZ(effective_start_date, 'UTC', 'UTC') as start_date_utc,
CONVERT_TZ(effective_end_date, 'UTC', 'UTC') as end_date_utc
FROM dayforce_api_extract
Never convert to local time during extraction - keep UTC throughout and only convert in the presentation layer.
Data Reconciliation Queries: Implement automated reconciliation that runs after each ETL cycle. This query identifies missing and mismatched records:
-- Pseudocode - Reconciliation process:
1. Extract source row count and checksums from Dayforce API
2. Compare with staging table counts after extraction
3. Identify records in source but missing in staging (extraction gaps)
4. Compare transformed values against source (transformation errors)
5. Generate discrepancy report with specific record IDs and field-level differences
Log all discrepancies with full context for investigation.
Filter Condition Mapping: Dayforce’s workforce planning API uses complex filter logic that must be replicated exactly. The ‘active forecasts’ filter includes multiple status conditions:
WHERE (status = 'APPROVED' OR (status = 'DRAFT' AND approved_for_planning = true))
AND effective_end_date >= CURRENT_DATE
AND is_superseded = false
AND is_deleted = false
Your original query was missing the is_superseded and approved_for_planning conditions, which likely accounts for most of your 8-12% variance. These records exist in Dayforce’s planning calculations but were being excluded from your warehouse.
Additionally, implement a daily reconciliation dashboard that shows: source record count, warehouse record count, variance percentage, and drill-down capability to see which specific forecasts are missing or different. We run this every morning at 6 AM after the nightly ETL completes, and it catches data quality issues before business users see them.
For your specific case with Q3 showing 2,847 vs 2,612 (235 record difference), I’d bet the missing 235 records are either draft forecasts approved for planning or records marked as superseded that Dayforce still includes in aggregate calculations. Add those filter conditions and your numbers should align within 1-2%.