Let me provide a comprehensive solution addressing ETL memory allocation, event log preprocessing, and server resource scaling for your scenario.
ETL Memory Allocation:
First, optimize your Java heap configuration for the ETL engine:
-Xms8192m -Xmx12288m       # initial 8 GB, max 12 GB heap
-XX:+UseG1GC               # G1 garbage collector handles large heaps with shorter pauses
-XX:MaxGCPauseMillis=200   # target max 200 ms GC pauses
Configure the ETL batch processing parameters:
ETLConfig.BatchSize = 75000 // Process 75K events per batch
ETLConfig.CommitInterval = 50000 // Commit every 50K events
ETLConfig.EnableStreaming = true // Enable streaming mode
Event Log Preprocessing Strategy:
Before importing into Creatio, implement a preprocessing pipeline:
Data Reduction (typically reduces size by 35-45%):
- Remove debug/system events not relevant to process analysis
- Eliminate redundant attributes (keep only: CaseID, Activity, Timestamp, Resource, essential business attributes)
- Consolidate events: If you have multiple events for the same activity within seconds, merge them
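The consolidation step can be sketched as a small dedup pass. This is a minimal example, not Creatio's own logic; it assumes each event is a dict with `CaseID`, `Activity`, and an already-parsed `Timestamp` (the field names are illustrative):

```python
from datetime import datetime, timedelta

def consolidate_events(events, window_seconds=5):
    """Drop repeated (CaseID, Activity) events that fall within a short
    window of the last kept occurrence, keeping the earliest timestamp."""
    merged = []
    for ev in sorted(events, key=lambda e: (e["CaseID"], e["Timestamp"])):
        if merged:
            last = merged[-1]
            same = (last["CaseID"] == ev["CaseID"]
                    and last["Activity"] == ev["Activity"])
            close = ev["Timestamp"] - last["Timestamp"] <= timedelta(seconds=window_seconds)
            if same and close:
                continue  # near-duplicate within the window: drop it
        merged.append(ev)
    return merged
```

Tune `window_seconds` to whatever gap your source system produces for duplicate writes; too large a window risks merging genuinely distinct repetitions of an activity.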
Temporal Partitioning:
- Split your 2.5M event log into quarterly chunks (approximately 625K events each)
- Naming convention: EventLog_2024_Q1.csv, EventLog_2024_Q2.csv, etc.
- Import each quarter separately, then use Creatio’s process mining merge feature
Data Quality Checks (prevent import failures):
- Validate timestamp formats are consistent
- Ensure CaseIDs don’t have special characters that cause parsing issues
- Check for null values in critical fields (CaseID, Activity, Timestamp)
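These checks can be implemented as a per-row validator run before import. A sketch, assuming the three required columns above and one agreed timestamp format (the "safe" CaseID character set is an assumption — widen it to match your IDs):

```python
import re
from datetime import datetime

REQUIRED = ("CaseID", "Activity", "Timestamp")
CASE_ID_OK = re.compile(r"^[A-Za-z0-9_-]+$")  # assumed safe charset for CaseIDs

def validate_row(row, ts_format="%Y-%m-%d %H:%M:%S"):
    """Return a list of problems for one event row; an empty list means valid."""
    problems = [f"missing {f}" for f in REQUIRED if not row.get(f)]
    if row.get("CaseID") and not CASE_ID_OK.match(row["CaseID"]):
        problems.append("CaseID has special characters")
    if row.get("Timestamp"):
        try:
            datetime.strptime(row["Timestamp"], ts_format)
        except ValueError:
            problems.append("bad timestamp format")
    return problems
```

Logging the problem list alongside the row lets you fix or quarantine bad records instead of failing the whole import.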
Preprocessing Implementation:
// Pseudocode - Event log preprocessing:
1. Load raw event log in streaming mode (don't load entire file)
2. Apply filters: Remove system events, validate required fields
3. Transform: Standardize timestamps, normalize activity names
4. Partition: Write to separate files based on time period (quarterly)
5. Compress: Use gzip compression for storage (reduces size 60-70%)
6. Generate metadata: Event counts, date ranges, case counts per partition
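The steps above can be sketched as one streaming pass in Python. This is a minimal illustration, not Creatio tooling: it assumes a CSV source with CaseID/Activity/Timestamp columns, illustrative system-event names to filter, and combines partitioning (step 4), gzip compression (step 5), and per-partition counts (step 6):

```python
import csv
import gzip
import os
from collections import defaultdict
from datetime import datetime

def preprocess(src_path, out_dir, ts_format="%Y-%m-%d %H:%M:%S",
               drop_activities=("SystemHeartbeat", "DebugTrace")):  # assumed names
    """Stream the raw CSV row by row (never loading the whole file),
    filter out invalid/system events, and write gzip-compressed
    quarterly partitions plus per-partition event counts."""
    os.makedirs(out_dir, exist_ok=True)
    writers, files, counts = {}, {}, defaultdict(int)
    with open(src_path, newline="") as src:
        for row in csv.DictReader(src):
            if not (row.get("CaseID") and row.get("Activity") and row.get("Timestamp")):
                continue                      # drop rows missing required fields
            if row["Activity"] in drop_activities:
                continue                      # drop system/debug events
            ts = datetime.strptime(row["Timestamp"], ts_format)
            key = f"{ts.year}_Q{(ts.month - 1) // 3 + 1}"
            if key not in writers:            # open one gzip file per quarter
                f = gzip.open(os.path.join(out_dir, f"EventLog_{key}.csv.gz"),
                              "wt", newline="")
                files[key] = f
                writers[key] = csv.DictWriter(f, fieldnames=row.keys())
                writers[key].writeheader()
            writers[key].writerow(row)
            counts[key] += 1
    for f in files.values():
        f.close()
    return dict(counts)                       # metadata: events per partition
```

Because only one row is in memory at a time, this scales to a 2.5M-event file without touching the heap limits discussed above.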
Server Resource Scaling:
For sustainable process mining with 2-5M event workloads:
Minimum Hardware Recommendations:
- RAM: 32GB (allocate 20GB to application server, 12GB to OS/other)
- CPU: 12+ cores (ETL engine can parallelize event processing)
- Storage: NVMe SSD for temp files and database (ETL writes 2-3x event log size in temp data)
- Network: If database is remote, ensure 1Gbps+ connection
Configuration Optimization:
- Database connection pool: Set to 20-30 connections for parallel ETL processing
- Temp directory: Point to fast SSD with at least 50GB free space
- Enable parallel processing: Configure ETL to use 6-8 worker threads
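The worker-thread idea can be sketched with a bounded thread pool that imports partitions concurrently. `import_fn` here is a hypothetical placeholder for whatever loads one chunk; the 6-worker default matches the 6-8 thread suggestion above:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def import_partitions(paths, import_fn, max_workers=6):
    """Run a per-partition import function across a bounded pool of worker
    threads, collecting results (or errors) per file without stopping."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(import_fn, p): p for p in paths}
        for fut in as_completed(futures):
            path = futures[fut]
            try:
                results[path] = fut.result()
            except Exception as exc:          # one failed chunk doesn't abort the rest
                results[path] = f"failed: {exc}"
    return results
```

Capping `max_workers` also keeps the number of simultaneous database connections within the 20-30 connection pool suggested above.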
Scaling Strategy by Event Volume:
- <1M events: 16GB RAM, 8 cores (your current setup)
- 1-3M events: 32GB RAM, 12 cores (recommended upgrade)
- 3-5M events: 48GB RAM, 16 cores
- >5M events: Consider distributed processing or database-level process mining
Implementation Roadmap:
Phase 1 - Immediate (No Hardware Changes):
- Increase Java heap to 12GB with G1GC
- Enable streaming mode and reduce batch size to 75K
- Preprocess event log: Remove unnecessary attributes (target 40% size reduction)
- Split into 500K event chunks and import sequentially
Expected result: Successfully import 2.5M events in 4-5 sequential batches
Phase 2 - Short-term (Optimize Current Hardware):
- Implement automated preprocessing pipeline
- Configure parallel ETL processing (4-6 threads)
- Optimize database queries with proper indexing on CaseID and Timestamp
- Set up monitoring for memory usage during ETL
Expected result: Reduce import time by 30-40%, more reliable processing
Phase 3 - Long-term (Scale Infrastructure):
- Upgrade to 32GB RAM server
- Migrate to NVMe SSD storage
- Implement quarterly automated imports with merge
- Set up retention policy (archive events older than 2 years)
Expected result: Handle 5M+ events, support continuous process mining
Monitoring and Validation:
After implementation, track these metrics:
- Memory peak usage during ETL (should stay under 85% of allocated heap)
- Events processed per minute (target: 8K-12K events/min)
- Import failure rate (target: <2%)
- End-to-end import time for 500K events (target: <15 minutes)
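As one way to operationalize these checks, a small helper can compute the metrics from raw import statistics and flag missed targets (thresholds taken from the list above; the function and parameter names are illustrative):

```python
def etl_metrics(events_processed, failures, elapsed_seconds,
                heap_peak_mb, heap_alloc_mb):
    """Compute throughput, failure rate, and heap usage, and flag
    whether all of the stated targets were met."""
    events_per_min = events_processed / (elapsed_seconds / 60)
    failure_rate = failures / events_processed if events_processed else 0.0
    heap_pct = heap_peak_mb / heap_alloc_mb * 100
    return {
        "events_per_min": events_per_min,
        "failure_rate_pct": failure_rate * 100,
        "heap_usage_pct": heap_pct,
        # targets: >= 8K events/min, < 2% failures, <= 85% of allocated heap
        "within_targets": (events_per_min >= 8000
                           and failure_rate < 0.02
                           and heap_pct <= 85),
    }
```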
This comprehensive approach should resolve your immediate memory issues while providing a scalable foundation for growing event volumes. The preprocessing step is critical - I’ve seen it reduce import failures by 90% in large-scale implementations.