Process mining import fails for large CSV files with data type issues

I’m trying to import large CSV event logs (500K+ rows) into Power Platform process mining, but the import wizard keeps failing with data type mismatch errors. The CSV files contain process event data extracted from our legacy ERP system with timestamp, case ID, activity, and resource columns.

The import wizard validation fails on certain rows where timestamp formats are inconsistent or numeric case IDs are mixed with alphanumeric ones:


Row 45823: Invalid timestamp format
Expected: yyyy-MM-dd HH:mm:ss
Received: dd/MM/yyyy HH:mm:ss

This blocks our process discovery initiative. We need proper CSV data type standardization and potentially pre-processing scripts to clean the data before import. The import wizard validation doesn’t provide enough flexibility for our real-world data inconsistencies.

For a quick fix on existing files, use Excel Power Query. Load the CSV, add a custom column with this formula for timestamps: DateTime.From(Text.From([Timestamp]), "en-GB") (pick a culture matching your source format; since the problem rows are dd/MM/yyyy, a day-first culture like "en-GB" avoids misparsing). For case IDs, use Text.From([CaseID]) to force everything to string. Then export back to CSV. Takes about 10 minutes for 500K rows.

The dataflow approach sounds good for ongoing imports. But for this initial load, I need to get the data in quickly. What’s the fastest way to fix the timestamp and case ID issues in the existing CSV?

Here’s the comprehensive solution covering all three critical areas:

CSV Data Type Standardization: The process mining import wizard requires strict adherence to specific data types for each column. Your CSV must meet these exact specifications:

  • Timestamp: ISO 8601 format (yyyy-MM-dd HH:mm:ss) or Excel serial date format
  • Case ID: Consistent data type throughout (either all numeric or all alphanumeric string)
  • Activity: String, maximum 255 characters
  • Resource: String, can be null
  • Additional attributes: Must maintain consistent types per column

For immediate fixing of your existing 500K-row CSV, use this Python script:

import pandas as pd

df = pd.read_csv('events.csv')
# format='mixed' (pandas >= 2.0) parses each value individually, so
# yyyy-MM-dd and dd/MM/yyyy rows can coexist; dayfirst=True covers the
# ambiguous day-first rows, and errors='coerce' turns failures into NaT
df['Timestamp'] = pd.to_datetime(df['Timestamp'], format='mixed',
                                 dayfirst=True, errors='coerce')
df['CaseID'] = df['CaseID'].astype(str)
df.to_csv('events_cleaned.csv', index=False)

This handles mixed timestamp formats by attempting per-value conversion and standardizes all case IDs to strings. The errors='coerce' parameter converts invalid timestamps to NaT (Not a Time), which you can then filter out or fix manually.
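To see what "filter out" looks like in practice, here is a minimal sketch (the two-row in-memory sample is made up, standing in for your real file) that splits the coerced NaT rows into a separate frame for review:

```python
import io
import pandas as pd

# Tiny in-memory stand-in for the real 500K-row CSV
raw = io.StringIO(
    "CaseID,Activity,Timestamp\n"
    "1001,Create Order,2023-05-01 09:15:00\n"
    "1002,Approve Order,not-a-date\n"
)
df = pd.read_csv(raw)
df["Timestamp"] = pd.to_datetime(df["Timestamp"], errors="coerce")

# Rows that failed conversion are NaT: set them aside for manual review
bad_rows = df[df["Timestamp"].isna()]
df = df[df["Timestamp"].notna()]
```

Saving bad_rows to a separate CSV gives you an audit trail of exactly which source rows were dropped.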

For more complex scenarios with multiple date formats in the same column, use dateutil's parser. Set dayfirst=True, because your non-ISO rows are dd/MM/yyyy (ISO strings parse correctly either way):

from dateutil.parser import parse
df['Timestamp'] = df['Timestamp'].apply(lambda x: parse(str(x), dayfirst=True))
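One caution worth demonstrating (with made-up dates): dates where the day could also be a month are ambiguous, and the dayfirst setting decides how dateutil reads them. Since the failing rows in your export are dd/MM/yyyy, dayfirst=True is the safer choice:

```python
from dateutil.parser import parse

# ISO 8601 is unambiguous regardless of the dayfirst setting
iso = parse("2023-05-01 09:15:00")

# "05/06/2023" is ambiguous: dayfirst=True reads it as 5 June,
# matching the dd/MM/yyyy convention in the ERP export
dmy = parse("05/06/2023 09:15:00", dayfirst=True)
mdy = parse("05/06/2023 09:15:00", dayfirst=False)
```

A wrong dayfirst setting will not raise an error on ambiguous dates; it silently swaps day and month, which is much harder to catch after import.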

Import Wizard Validation: The process mining import wizard performs several validation checks that you need to prepare for:

  1. Column mapping validation: Ensure your CSV headers exactly match expected names (Case ID, Activity, Timestamp, Resource)
  2. Mandatory fields check: Case ID, Activity, and Timestamp are required; no nulls allowed
  3. Data type consistency: Every row must conform to the declared type
  4. Duplicate detection: The wizard flags but doesn’t reject duplicates

To pass validation on the first attempt:

  • Remove or fill any null values in mandatory columns
  • Ensure timestamps fall within a plausible range (placeholder dates like 1900-01-01 or far-future years such as 2099 usually signal conversion errors)
  • Verify case IDs don’t contain special characters that might be interpreted as delimiters
  • Check for hidden characters or encoding issues (use UTF-8 encoding)
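These checks can be automated before you ever open the wizard. A minimal pre-flight sketch (the column names and rules below are assumptions based on the list above; adjust them to your actual schema):

```python
import io
import pandas as pd

REQUIRED = ["Case ID", "Activity", "Timestamp"]  # assumed wizard column names

def preflight(df):
    """Collect problems that would trip wizard validation."""
    problems = []
    for col in REQUIRED:
        if col not in df.columns:
            problems.append(f"missing column: {col}")
        elif df[col].isna().any():
            problems.append(f"null values in mandatory column: {col}")
    return problems

# Made-up sample: the second row has an empty Activity
sample = pd.read_csv(io.StringIO(
    "Case ID,Activity,Timestamp\n"
    "A-1,Create,2023-05-01 09:15:00\n"
    "A-2,,2023-05-01 09:20:00\n"
))
issues = preflight(sample)
```

Running this over the full file before upload turns a slow trial-and-error loop with the wizard into a single local check.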

Before importing, test with a 1000-row sample of your data. The wizard will show validation errors immediately, allowing you to fix issues before processing the full 500K rows.
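Extracting that sample is a one-liner once the file is loaded; a sketch with synthetic data standing in for the cleaned file:

```python
import pandas as pd

# Synthetic frame standing in for events_cleaned.csv
df = pd.DataFrame({
    "CaseID": [str(i) for i in range(5000)],
    "Activity": ["Review Order"] * 5000,
    "Timestamp": pd.date_range("2023-01-01", periods=5000, freq="min"),
})

# The first 1,000 rows are usually enough to surface mapping and type errors
sample = df.head(1000)
sample.to_csv("sample_1000.csv", index=False, date_format="%Y-%m-%d %H:%M:%S")
```

If your data quality varies over time, a random sample (df.sample(1000)) may surface more format variants than the first 1,000 rows.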

Pre-Processing Scripts: For ongoing process mining operations, implement an automated data preparation pipeline:

Option 1 - Power Platform Dataflow (recommended for recurring imports):

  1. Create a new dataflow in Power Platform
  2. Add your CSV source (can connect to SharePoint, OneDrive, or Azure Blob)
  3. Apply these transformation steps:
    • Remove rows with null Case IDs or Timestamps
    • Use ‘Change Type’ to enforce consistent data types
    • Add custom column for timestamp standardization: DateTime.From(Text.From([Timestamp]), "en-GB") (choose a culture that matches the source format; day-first data such as dd/MM/yyyy needs a day-first culture like "en-GB")
    • Add custom column for case ID formatting: Text.PadStart(Text.From([CaseID]), 10, "0") (if you want fixed-length IDs)
  4. Enable dataflow refresh on a schedule
  5. Configure process mining to read from the dataflow output instead of raw CSV

Option 2 - Python Pre-Processing Script (for one-time or external processing):

Create a comprehensive cleaning script that handles all edge cases:

import pandas as pd
import numpy as np
from dateutil import parser

# Load with explicit encoding
df = pd.read_csv('raw_events.csv', encoding='utf-8-sig')

# Standardize timestamps
def clean_timestamp(ts):
    try:
        # dayfirst=True matches the dd/MM/yyyy rows in the source data
        return parser.parse(str(ts), dayfirst=True)
    except (ValueError, OverflowError, TypeError):
        return np.nan

df['Timestamp'] = df['Timestamp'].apply(clean_timestamp)

# Remove invalid timestamps
df = df.dropna(subset=['Timestamp'])

# Standardize case IDs
df['CaseID'] = df['CaseID'].astype(str).str.strip()

# Clean activity names
df['Activity'] = df['Activity'].str.strip().str[:255]

# Export in process mining format
df.to_csv('events_processed.csv', index=False, date_format='%Y-%m-%d %H:%M:%S')

This script handles:

  • Multiple timestamp formats through intelligent parsing
  • Null value removal
  • String trimming and length limits
  • Consistent case ID formatting
  • Proper CSV encoding for import

For your 500K-row file, this Python approach processes in 30-60 seconds. Once you have the cleaned CSV, the process mining import wizard should complete without validation errors.
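If memory becomes a constraint as the log grows, the same cleaning can run in streaming chunks. A sketch under those assumptions (the file names are illustrative, and the tiny chunk size is only for the demo; in practice use something like chunksize=100_000):

```python
import pandas as pd

# Stand-in input so the sketch is self-contained; in practice this is the
# raw ERP export with hundreds of thousands of rows
pd.DataFrame({
    "CaseID": [101, 102, 103],
    "Activity": ["Create", "Approve", "Ship"],
    "Timestamp": ["2023-05-01 09:15:00", "not-a-date", "2023-05-01 10:00:00"],
}).to_csv("raw_events.csv", index=False)

def clean_chunk(chunk):
    chunk["Timestamp"] = pd.to_datetime(chunk["Timestamp"], errors="coerce")
    chunk["CaseID"] = chunk["CaseID"].astype(str).str.strip()
    return chunk.dropna(subset=["Timestamp"])

# Stream the file and append each cleaned chunk to the output; only one
# chunk is ever held in memory at a time
first = True
for chunk in pd.read_csv("raw_events.csv", chunksize=2):  # toy size for the demo
    clean_chunk(chunk).to_csv(
        "events_processed.csv",
        mode="w" if first else "a",
        header=first,
        index=False,
        date_format="%Y-%m-%d %H:%M:%S",
    )
    first = False
```

The header=first flag ensures the column names are written exactly once, so the output stays a valid single-header CSV for the import wizard.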

For future imports, I recommend the dataflow approach because it creates a reusable, scheduled pipeline that automatically cleans incoming data. You can also add data quality monitoring to track how many rows fail validation over time, helping you identify upstream data quality issues in your ERP system.
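The monitoring idea can start as simply as appending one summary row per import run; a sketch (the log file name and its fields are made up for illustration):

```python
import os
import pandas as pd

def log_quality(df, run_id, log_path="dq_log.csv"):
    """Append one data-quality summary row per import run."""
    parsed = pd.to_datetime(df["Timestamp"], errors="coerce")
    entry = pd.DataFrame([{
        "run_id": run_id,
        "total_rows": len(df),
        "invalid_timestamps": int(parsed.isna().sum()),
    }])
    entry.to_csv(log_path, mode="a",
                 header=not os.path.exists(log_path), index=False)
    return entry

# Made-up mini-batch with two unparseable timestamps
batch = pd.DataFrame(
    {"Timestamp": ["2023-05-01 09:15:00", "not-a-date", "also-bad"]}
)
entry = log_quality(batch, "run-001", log_path="dq_log_demo.csv")
```

Plotting invalid_timestamps over run_id gives you an early-warning signal when an upstream ERP change starts degrading the export.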

While external processing works, you can also use Power Platform dataflows to create a cleaning pipeline. Build a dataflow that ingests the raw CSV, applies transformations for timestamp standardization and case ID formatting, then outputs a cleaned version that feeds directly into process mining. This keeps everything within the Power Platform ecosystem and makes it repeatable. The dataflow can handle the 500K rows efficiently if you enable incremental refresh. Add data validation steps to flag problematic rows without failing the entire import - route them to an error table for manual review.

That makes sense. Do you have a recommended approach for the pre-processing? Should I use Power Query within Power Platform or handle it externally before upload?

Large CSV files with inconsistent data are a common problem. You’ll need to pre-process the CSV before importing. Use Python or Power Query to standardize timestamp formats and ensure case IDs are consistently formatted. The process mining import wizard expects strict data types and won’t handle mixed formats.