Process mining import fails on large XES files with out-of-memory error

I’m hitting a critical roadblock with process mining in Mendix 10.6. When trying to import large XES event log files (around 500MB+), the import consistently fails with out-of-memory errors. Smaller files under 100MB import fine, but our production event logs are significantly larger.

The error occurs during the XES file parsing phase, before the data even gets written to the database. Looking at the cloud metrics, memory usage spikes to the container limit and then the import microflow crashes.

Here’s the error from the logs:


java.lang.OutOfMemoryError: Java heap space
at XESParser.parseEventLog(XESParser.java:234)
at ProcessMining.ImportXES(ImportXES.java:89)

We’re running on Mendix Cloud with the standard memory allocation. The XES files contain detailed event data from our ERP system - about 2 million events per file. I understand XES parsing loads the entire document into memory for validation, but there must be a way to handle larger files. Has anyone successfully imported large XES files, and what approach did you use?

Franz makes a good point about the connector module. Also, even with streaming, you’ll need adequate memory for the database operations that follow parsing. With 2 million events, the commit operations alone will consume significant memory. Consider implementing batch commits - process and commit events in batches of 10,000 rather than committing everything at once after parsing completes.
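To make the batching advice concrete, here is a minimal self-contained sketch of the accumulate-and-flush loop. The `commitBatch` callback is a placeholder for whatever persistence call your platform provides (in Mendix that would be a commit inside the Java action or microflow); the class and method names are illustrative, not a real API:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.IntStream;

public class BatchCommitter {
    static final int BATCH_SIZE = 10_000;

    // Drain the event iterator in fixed-size batches; commitBatch stands in
    // for the real persistence call. Only one batch is resident at a time.
    static int importInBatches(Iterator<String> events, Consumer<List<String>> commitBatch) {
        List<String> batch = new ArrayList<>(BATCH_SIZE);
        int commits = 0;
        while (events.hasNext()) {
            batch.add(events.next());
            if (batch.size() == BATCH_SIZE) {
                commitBatch.accept(batch);
                batch.clear();      // release references so the GC can reclaim them
                commits++;
            }
        }
        if (!batch.isEmpty()) {     // flush the final partial batch
            commitBatch.accept(batch);
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        Iterator<String> events =
            IntStream.range(0, 25_000).mapToObj(i -> "event-" + i).iterator();
        int commits = importInBatches(events, b -> { /* persist batch here */ });
        System.out.println(commits); // 25,000 events at 10,000 per batch -> 3 commits
    }
}
```

The point of the structure is that memory usage is bounded by the batch size, not by the total event count.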

First question - what’s your current cloud environment memory allocation? Standard plans might not provide enough heap space for 500MB XES files. You might need to upgrade to a plan with higher memory limits or request a custom memory configuration.

From what I’ve seen, the standard XES import in Mendix does load the full file for validation before processing. For large files, you have a few options: split the XES file into smaller chunks, implement a custom streaming parser using Java actions, or pre-process the XES file to extract only essential attributes before import. We went with the chunking approach - split our large event logs into monthly segments before importing.
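For the chunking route, the bucketing step can be sketched like this. This is a simplified illustration, assuming ISO-8601 timestamps (as the XES date attribute uses); a real splitter would group whole traces, not bare timestamps, and write each bucket to its own XES file:

```java
import java.time.YearMonth;
import java.time.ZonedDateTime;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MonthlyChunker {
    // Bucket event timestamps by calendar month so each chunk can be
    // exported as a separate, smaller XES file.
    static Map<YearMonth, List<String>> chunkByMonth(List<String> timestamps) {
        Map<YearMonth, List<String>> chunks = new LinkedHashMap<>();
        for (String ts : timestamps) {
            YearMonth month = YearMonth.from(ZonedDateTime.parse(ts));
            chunks.computeIfAbsent(month, m -> new ArrayList<>()).add(ts);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<String> ts = List.of(
            "2024-01-05T09:00:00Z", "2024-01-20T10:30:00Z", "2024-02-01T08:15:00Z");
        System.out.println(chunkByMonth(ts).size()); // two monthly chunks: 2024-01, 2024-02
    }
}
```

Note the trade-off mentioned elsewhere in this thread: cases whose events straddle a month boundary end up split across files.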

We’re on a medium cloud plan with 2GB memory allocation. I can request an upgrade, but I’m wondering if there’s a more efficient approach than just throwing more memory at the problem. The XES standard supports streaming parsers - does Mendix’s process mining module use streaming, or does it load everything into memory?

I’ve dealt with this exact scenario for a manufacturing client with massive event logs. Here’s a comprehensive solution covering all three critical aspects:

XES File Parsing Optimization: The default Mendix process mining import uses DOM-based XML parsing, which loads the entire XES structure into memory. For files over 200MB this becomes a problem. Implement a streaming SAX parser in a custom Java action instead:

SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware(true); // XES files declare the XES namespace
SAXParser parser = factory.newSAXParser();
// XESStreamHandler is your custom DefaultHandler; context is the Mendix IContext
XESStreamHandler handler = new XESStreamHandler(context);
parser.parse(xesInputStream, handler); // xesInputStream: InputStream over the XES file

This processes the XES file sequentially, maintaining only the current event in memory rather than the entire log structure.
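As a self-contained illustration of that handler, the sketch below counts events in a stream without ever building a DOM. The `XESStreamHandler` name matches the snippet above, but this version only counts; a real handler would collect the current event's attributes and hand completed batches off for commit (an assumption on my part, not a documented Mendix API):

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class XesStreamDemo {
    // Minimal SAX handler: keeps only a counter (and, in a real import,
    // the current event's attributes) in memory, never the whole log.
    static class XESStreamHandler extends DefaultHandler {
        int eventCount = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("event".equals(qName)) {
                eventCount++;   // a real handler would build the event entity here
            }
        }
    }

    static int countEvents(InputStream xes) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        XESStreamHandler handler = new XESStreamHandler();
        parser.parse(xes, handler);
        return handler.eventCount;
    }

    public static void main(String[] args) throws Exception {
        String xes = "<log><trace><event/><event/></trace><trace><event/></trace></log>";
        int n = countEvents(new ByteArrayInputStream(xes.getBytes(StandardCharsets.UTF_8)));
        System.out.println(n); // 3 events parsed, no DOM ever built
    }
}
```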

Cloud Memory Allocation Strategy: Even with streaming, you need adequate heap space. For 500MB XES files with 2M events, I recommend:

  • Minimum 4GB memory allocation for the container
  • Configure JVM heap with -Xmx3072m to leave room for non-heap operations
  • Request custom memory settings through Mendix support if standard plans don’t offer sufficient allocation

Monitor memory usage during import using the Mendix Cloud metrics. If you see sustained usage above 80%, increase allocation before attempting production imports.

Streaming Import Logic Implementation: The key is batch processing with progressive commits. Create a custom import microflow that:

  1. Open the XES file as a stream (don’t load the entire file)
  2. Parse events in batches of 5,000-10,000
  3. For each batch:
    • Create ProcessEvent entities
    • Commit the batch to the database
    • Clear the entities from memory
    • Update a progress indicator

Pseudocode structure:


// Streaming import with batching:
1. Initialize XES stream parser with file handle
2. Set batch size = 10,000 events
3. While events remain in stream:
   a. Read next batch of events into memory
   b. Transform to ProcessEvent entities
   c. Commit batch to database
   d. Clear batch list so the entities become eligible for collection
   e. Optionally call System.gc() as a hint (the JVM is free to ignore it)
4. Close stream and finalize import

This approach keeps memory usage stable throughout the import regardless of total file size.

Additional Optimizations:

  • Disable validation during bulk import (validate file structure before starting import)
  • Use asynchronous import with progress tracking so users don’t block on the operation
  • Implement resume capability - store import checkpoint so failed imports can restart from last committed batch
  • Consider pre-filtering events at the source system to reduce XES file size (remove debug/verbose events)
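The resume capability from the list above can be sketched with a persisted batch index. The names here are illustrative (a real implementation would store the checkpoint in a database table in the same transaction as the committed batch, so crash recovery is consistent):

```java
import java.util.List;
import java.util.function.Consumer;

public class ResumableImport {
    // Resume sketch: skip batches that were committed before the crash,
    // using a persisted checkpoint (here just an int for illustration).
    static int resumeFrom(int checkpoint, List<List<String>> batches,
                          Consumer<List<String>> commitBatch) {
        int lastCommitted = checkpoint;
        for (int i = checkpoint; i < batches.size(); i++) {
            commitBatch.accept(batches.get(i));
            lastCommitted = i + 1;  // persist this value after each successful commit
        }
        return lastCommitted;
    }

    public static void main(String[] args) {
        List<List<String>> batches = List.of(
            List.of("a", "b"), List.of("c", "d"), List.of("e"));
        // First run crashed after committing batch 0; restart from checkpoint 1.
        int checkpoint = resumeFrom(1, batches, b -> { /* persist batch */ });
        System.out.println(checkpoint); // all 3 batches committed -> checkpoint 3
    }
}
```

Committing the checkpoint atomically with the batch is what makes restarts safe: a batch is either fully committed and recorded, or redone on the next run.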

After implementing streaming with batched commits, we successfully imported XES files up to 2GB (8M+ events) on a 4GB memory allocation without failures. Import time increased (streaming is slower than bulk loading) but reliability improved dramatically. The key insight is that process mining doesn’t require the entire event log in memory simultaneously - only the current processing batch needs to be resident.

Yuki’s chunking suggestion works but isn’t ideal for process mining: cases whose events span a chunk boundary get split, so those process flows are lost from the analysis. I’d recommend looking into the streaming import approach. You can implement a custom Java action that uses SAX parsing instead of DOM parsing for XES files. This processes the XML sequentially without loading the entire document. It’s more work upfront but scales much better for production event logs. Have you considered using the Mendix Process Mining Connector module? It might have better memory handling.