Bulk data import job intermittently drops records in workflow automation

We’re experiencing intermittent record loss during scheduled bulk imports via Experience Platform workflow automation. The workflow processes customer data files (typically 50-80K records) every 6 hours, but we’re seeing random gaps: sometimes 200-300 records simply vanish with no errors in the workflow logs.

The workflow shows successful completion status, but downstream validation reveals missing customer records. We’ve checked file size limits in the workflow configuration, but can’t find documentation on hard limits for bulk imports. The logging level is set to INFO, which might not capture silent data truncation issues.

This creates significant customer data gaps affecting our marketing campaigns. Has anyone dealt with similar silent failures in AEC 2021 bulk import workflows? Need guidance on proper logging configuration and validation checkpoints.

For proper visibility, you need to enable DEBUG level logging in your workflow configuration and implement pre-ingestion row count validation. The workflow should compare source file row counts against ingested record counts. Experience Platform’s Data Ingestion API also provides batch status endpoints that show detailed failure reasons; integrate those checks into your workflow error handling. Without explicit validation logic, silent failures will continue, because the workflow treats partial ingestion as success.
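A minimal sketch of that row-count reconciliation check (function names are illustrative; the ingested count itself would come from the batch status endpoint):

```python
import csv

def source_row_count(path: str) -> int:
    """Count data rows in the source CSV, excluding the header row."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f)) - 1

def ingestion_complete(source_count: int, ingested_count: int) -> bool:
    """Only a full match counts as success; partial ingestion is a failure."""
    return source_count == ingested_count
```

Run this comparison before the workflow reports success, not after the fact.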

Check your workflow’s batch processing configuration. In AEC 2021, the default batch size is 10,000 records, and if your workflow isn’t properly chunking larger files, you’ll get partial ingestion without clear error messages. Also verify that your dataset schema doesn’t have strict validation rules that could silently reject records without logging failures at INFO level.
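If you end up chunking files yourself rather than relying on the default, the logic is straightforward (a generic sketch, not AEC-specific code; 10,000 matches the default batch size mentioned above):

```python
def chunk_records(records: list, batch_size: int = 10_000):
    """Yield successive slices of records, each at most batch_size long."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]
```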

Thanks for the insights. I checked the batch processing settings - we’re using default chunking. The schema does have some required fields, but I’d expect validation errors to show up somewhere. How do I enable more detailed logging to catch these silent truncations? And what’s the recommended approach for downstream validation checkpoints?

We had nearly identical issues in our AEC 2021 implementation. The problem stems from multiple factors working together, and you need to address all four focus areas systematically.

Bulk Import File Size Limits: Experience Platform has a 100MB per file recommendation, but the real constraint is memory allocation per workflow execution. Files with complex nested schemas can hit memory limits well before 100MB. Split your 50-80K record files into smaller batches of 20-25K records maximum. This prevents memory-related silent truncation.
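One way to plan that split (a sketch; the 25K ceiling is the figure suggested above, not a documented platform limit):

```python
import math

def plan_batches(total_records: int, max_batch: int = 25_000) -> list[int]:
    """Split total_records into the fewest near-equal batches, each <= max_batch."""
    n = math.ceil(total_records / max_batch)
    base, extra = divmod(total_records, n)
    # distribute the remainder so batch sizes differ by at most one record
    return [base + (1 if i < extra else 0) for i in range(n)]
```

Near-equal batches keep memory use predictable, instead of one undersized trailing batch and several maxed-out ones.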

Workflow Logging Configuration: Change your logging level from INFO to DEBUG in the workflow settings. Add explicit logging statements at key checkpoints: pre-ingestion row count, post-validation count, and failed record count. Enable Data Ingestion API audit logs through Platform’s monitoring interface; this captures rejection reasons that workflow logs miss.
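The checkpoint logging can be as simple as this (a sketch; the logger name and checkpoint labels are ours):

```python
import logging

logging.basicConfig(level=logging.DEBUG)  # DEBUG instead of the default INFO
log = logging.getLogger("bulk_import")

def log_checkpoints(source_count: int, ingested_count: int, rejected_count: int) -> int:
    """Log the three checkpoint counts; return how many records are unaccounted for."""
    log.debug("pre-ingestion row count: %d", source_count)
    log.debug("post-validation ingested count: %d", ingested_count)
    log.debug("failed record count: %d", rejected_count)
    missing = source_count - ingested_count - rejected_count
    if missing:
        log.error("%d records unaccounted for", missing)
    return missing
```

A non-zero return value is exactly the silent truncation you’re hunting: records that are neither ingested nor explicitly rejected.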

Silent Data Truncation: This happens when schema validation fails without throwing exceptions. Add a pre-ingestion validation step using Platform’s Schema Registry API to validate each record against your dataset schema before batch submission. Records failing validation should be logged to a separate error dataset with rejection reasons. Implement a row count reconciliation check: source_count == ingested_count + rejected_count.
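Sketching that pre-ingestion step (REQUIRED_FIELDS here is a hypothetical stand-in; a real implementation would pull the required fields from the Schema Registry):

```python
REQUIRED_FIELDS = ("customerId", "email")  # hypothetical; read from your schema in practice

def validate_record(record: dict) -> list[str]:
    """Return rejection reasons for a record; an empty list means it is valid."""
    return [f"missing required field: {field}"
            for field in REQUIRED_FIELDS
            if not record.get(field)]

def partition_records(records: list[dict]):
    """Split records into valid/rejected so source == ingested + rejected holds."""
    valid, rejected = [], []
    for record in records:
        reasons = validate_record(record)
        if reasons:
            rejected.append({"record": record, "reasons": reasons})
        else:
            valid.append(record)
    # reconciliation invariant: nothing is silently dropped
    assert len(records) == len(valid) + len(rejected)
    return valid, rejected
```

The rejected list, with its reasons, is what you write to the separate error dataset.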

Downstream Data Validation: Implement a post-ingestion validation workflow that runs 15 minutes after each bulk import. Query the target dataset for the batch ID and compare record counts. Set up alerts when discrepancies exceed 1%. Create a reconciliation report showing: source file name, expected count, ingested count, missing count, and sample missing record IDs. This provides an audit trail for data governance.
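The report and the 1% alert threshold might look like this (field names and the sample file name are illustrative):

```python
def reconciliation_report(source_file: str, expected: int, ingested: int,
                          missing_ids: list[str], threshold: float = 0.01) -> dict:
    """Build the per-batch reconciliation report; flag discrepancies over threshold."""
    missing = expected - ingested
    return {
        "source_file": source_file,
        "expected_count": expected,
        "ingested_count": ingested,
        "missing_count": missing,
        "sample_missing_ids": missing_ids[:10],  # cap the sample for readability
        "alert": expected > 0 and missing / expected > threshold,
    }
```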

Also configure your workflow’s error handling to treat partial success as failure. Use the batch ingestion status API endpoint to verify complete ingestion before marking the workflow as successful. We reduced our data loss from 2-3% to under 0.01% after implementing these changes.
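A sketch of that decision logic (the batch payload shape here is assumed, not taken from the API reference; check the Data Ingestion API docs for the real status and metrics field names):

```python
def workflow_outcome(batch: dict, source_count: int) -> str:
    """Map a batch-status payload to a workflow outcome; partial ingestion fails."""
    ingested = batch.get("metrics", {}).get("recordsWritten", 0)  # field name assumed
    if batch.get("status") == "success" and ingested == source_count:
        return "success"
    return "failure"
```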

I’ve seen this behavior before. Experience Platform has default file size thresholds that aren’t always obvious. For batch ingestion, there’s a 100MB per file soft limit, and records can get silently dropped if individual batches exceed memory allocation during processing. Your 50-80K records might be hitting edge cases depending on record complexity and schema size.

One thing that caught us was network timeout settings during large batch uploads. If your workflow uses REST API calls for ingestion and the connection times out mid-upload, some platforms will accept partial data without flagging it as an error. Check your API timeout configurations and consider implementing resume-on-failure logic with batch identifiers to track partial uploads.
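A minimal sketch of the resume-on-failure bookkeeping with batch identifiers (all names here are illustrative):

```python
def pending_batches(all_batch_ids: list[str], completed_batch_ids: list[str]) -> list[str]:
    """Return batch IDs still needing upload (or re-upload), preserving order."""
    done = set(completed_batch_ids)
    return [batch_id for batch_id in all_batch_ids if batch_id not in done]
```

Persist the completed IDs after each confirmed batch, so a timed-out run restarts from the last confirmed batch instead of re-sending everything or, worse, assuming the partial upload went through.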