We’ve been trying to get process mining off the ground for our order-to-cash workflows, but we keep hitting walls with the event logs themselves. Our data team pulled activity tables from the ERP, but when the process mining tool loaded them, case durations showed up as decades instead of days. It turns out we’ve got placeholder timestamps like 1970-01-01 or 2100-12-31 scattered throughout, probably from old migrations or incomplete records.
Beyond the timestamp mess, we’re also finding that some orders have multiple IDs depending on which system touched them—CRM uses one identifier, procurement uses another, and finance has its own. So what should be a single process instance ends up looking like three separate cases. We’ve also got duplicate records where the same activity appears twice with slightly different timestamps, which is throwing off our bottleneck analysis.
I know we need to clean this up before we can get reliable insights, but I’m not sure what to prioritize. Should we start with the timestamp problems, the case ID mapping, or the deduplication? And how do other teams handle this kind of data prep—are you doing it manually, or is there tooling that helps automate the cleanup? Would love to hear what’s worked (or hasn’t) for folks who’ve been through this.
Make sure you document everything you’re doing. We built a data dictionary that defines every field in our event logs—what it means, where it comes from, what transformations we apply, and what quality rules we enforce. When someone questions the analysis six months later, you need to be able to explain exactly how the data was prepared. Also, if you’re in a regulated industry, you’ll need that documentation for compliance. Data governance isn’t glamorous, but it’s what keeps these initiatives from falling apart when the original team moves on.
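To make that concrete, here’s roughly what one entry in a dictionary like ours looks like when you keep it as machine-readable data instead of a wiki page, so the quality rules can later drive automated checks. Every field name and rule string below is made up for illustration, not from any standard:

```python
# Hypothetical data dictionary entry for one event log field.
# Field names, sources, and rule strings are illustrative only.
DATA_DICTIONARY = {
    "order_approved_at": {
        "meaning": "Timestamp when the order passed financial approval",
        "source": "ERP table FIN_APPROVALS, column APPROVED_TS",
        "transformations": ["converted to UTC", "truncated to the second"],
        "quality_rules": [
            "not null",
            "within valid date range",
            ">= order_created_at",
        ],
    },
}

# Anyone questioning the analysis later can look up a field directly.
entry = DATA_DICTIONARY["order_approved_at"]
```

Keeping it in version control alongside the pipeline code also gives you the audit trail regulators ask for.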
We had to deal with epoch-zero timestamps (those 1970-01-01 placeholders) too. What helped us was profiling the event log first—running statistical checks to see how many cases were affected, which activities had the problem, and whether there was a pattern (e.g., always the same activity or always from a specific system). Once we knew the scope, we could decide on remediation. In our case, about 5% of cases had zero timestamps, so we just excluded those cases entirely rather than trying to impute values or keep partial data. If your percentage is higher, you might need a more sophisticated approach.
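A profiling pass like that fits in a few lines of plain Python. The tuple layout and the placeholder set here are assumptions for the sketch, not your actual schema:

```python
from collections import Counter
from datetime import datetime

# Toy event log: (case_id, activity, timestamp, source_system) tuples.
events = [
    ("C1", "Create Order", datetime(2024, 3, 1, 9, 0), "ERP"),
    ("C1", "Approve",      datetime(1970, 1, 1),       "CRM"),  # placeholder
    ("C2", "Create Order", datetime(2024, 3, 2, 10, 0), "ERP"),
    ("C3", "Ship",         datetime(1970, 1, 1),       "CRM"),  # placeholder
]

PLACEHOLDERS = {datetime(1970, 1, 1), datetime(2100, 12, 31)}

bad = [e for e in events if e[2] in PLACEHOLDERS]
affected = {e[0] for e in bad}            # which cases are hit
by_activity = Counter(e[1] for e in bad)  # always the same activity?
by_system = Counter(e[3] for e in bad)    # always the same system?

pct = 100 * len(affected) / len({e[0] for e in events})
print(f"{pct:.1f}% of cases affected; {by_activity}; {by_system}")
```

A scope check like this is what tells you whether a blunt "exclude the whole case" fix is safe or whether you need something more careful.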
Timestamps first, in my experience. If your temporal ordering is wrong, everything downstream gets unreliable—activities appear in the wrong sequence, and your process model ends up incorrect. We had a similar issue where migration artifacts left us with 1900-01-01 dates. We wrote a validation script to flag any timestamp outside a reasonable range (say, past five years to next year) and then decided case-by-case whether to remove those events or the entire case. If only a few cases are affected, just drop them. If it’s widespread, you might need to remove only the bad events and keep the rest of the case intact.
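A minimal version of that validation script, assuming events arrive as (case_id, activity, timestamp) tuples and that the window bounds are configurable:

```python
from datetime import datetime, timedelta

def split_by_range(events, now, past_years=5, future_years=1):
    """Partition events by whether the timestamp falls in a sane window."""
    lo = now - timedelta(days=365 * past_years)
    hi = now + timedelta(days=365 * future_years)
    ok, flagged = [], []
    for e in events:
        (ok if lo <= e[2] <= hi else flagged).append(e)
    return ok, flagged

now = datetime(2024, 6, 1)
events = [
    ("C1", "Create Order", datetime(2024, 3, 1)),
    ("C1", "Approve",      datetime(1900, 1, 1)),  # migration artifact
]
ok, flagged = split_by_range(events, now)

# Then decide case-by-case: if few cases are affected, drop the whole
# case; if it's widespread, drop only the flagged events instead.
bad_cases = {e[0] for e in flagged}
```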
One thing that’s helped us scale is treating event log preparation as a proper data pipeline rather than a one-time cleanup. We extract raw data into a staging area, run validation and transformation logic, and only promote clean data to the process mining layer. That way, when source systems change or new data quality issues appear, we can catch them early and fix them in the pipeline rather than polluting the analysis. We also version the pipeline so we can reproduce historical analyses if needed. It’s more upfront investment, but it pays off when you’re running this continuously.
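As a rough sketch of that stage-validate-promote shape (the function names and version constant are ours for illustration, not from any process mining tool):

```python
from datetime import datetime

PIPELINE_VERSION = "1.4.0"  # bumped on every logic change so old analyses can be re-run

def stage(raw_rows):
    """Staging: parse raw export rows into typed records; keep everything."""
    return [(case, act, datetime.fromisoformat(ts)) for case, act, ts in raw_rows]

def validate(staged, lo, hi):
    """Apply quality rules; return (clean, rejected) so rejects stay inspectable."""
    clean = [e for e in staged if lo <= e[2] <= hi]
    rejected = [e for e in staged if not (lo <= e[2] <= hi)]
    return clean, rejected

def promote(clean):
    """Only validated events reach the mining layer, stamped with the version."""
    return {"pipeline_version": PIPELINE_VERSION, "events": clean}

raw = [("C1", "Create Order", "2024-03-01T09:00:00"),
       ("C1", "Approve",      "1970-01-01T00:00:00")]  # caught in validation
clean, rejected = validate(stage(raw), datetime(2019, 1, 1), datetime(2025, 12, 31))
out = promote(clean)
```

The version stamp is the part that pays for itself: it ties every published analysis to the exact cleanup logic that produced it.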
We automate a lot of this with an ETL pipeline that runs nightly. For duplicates, we hash the combination of case ID, activity name, and timestamp (rounded to the nearest minute) and drop any exact matches. For timestamps, we convert everything to UTC during extraction and flag any values outside a configurable valid range. For case IDs, we maintain a reference table in our data warehouse that maps cross-system identifiers. It’s not perfect, but it catches most problems before they hit the process mining tool. Initial setup took a few weeks, but now it’s mostly hands-off.
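Concretely, the dedup key we compute looks roughly like this. The rounding helper and the in-memory ID map are simplified stand-ins for the warehouse lookup, and the UTC conversion happens upstream at extraction, so it's omitted here:

```python
import hashlib
from datetime import datetime, timedelta

def round_to_minute(ts):
    """Round a timestamp to the nearest minute (30s or more rounds up)."""
    if ts.second >= 30:
        ts += timedelta(minutes=1)
    return ts.replace(second=0, microsecond=0)

# Stand-in for the warehouse reference table mapping cross-system IDs.
ID_MAP = {"CRM-7": "ORD-7", "PROC-7": "ORD-7"}

def dedup(events, id_map):
    """Drop events whose (canonical case, activity, rounded minute) hash repeats."""
    seen, kept = set(), []
    for case_id, activity, ts in events:
        canon = id_map.get(case_id, case_id)  # fall back to the raw ID
        key = hashlib.sha256(
            f"{canon}|{activity}|{round_to_minute(ts).isoformat()}".encode()
        ).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append((canon, activity, ts))
    return kept

events = [
    ("CRM-7",  "Approve", datetime(2024, 3, 1, 12, 0, 10)),
    ("PROC-7", "Approve", datetime(2024, 3, 1, 12, 0, 20)),  # same minute: duplicate
    ("ORD-7",  "Ship",    datetime(2024, 3, 2, 8, 30)),
]
kept = dedup(events, ID_MAP)  # the two Approve events collapse into one
```

Mapping IDs before hashing is what makes this catch the cross-system duplicates, not just byte-identical rows.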