Had the same issue last year with our ANSYS integration. The problem is multi-faceted and requires addressing all three areas you mentioned.
JVM Heap Tuning:
Your current approach is counterproductive. Use these optimized settings:
-Xms10G -Xmx10G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
-XX:G1HeapRegionSize=32M
-XX:InitiatingHeapOccupancyPercent=45
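Fixed heap size (Xms=Xmx) prevents resize overhead. With 32M regions, G1 only treats objects larger than about 16MB (half a region) as humongous allocations, so most large simulation arrays stay on the normal allocation path. Note that 45% is already G1's default for InitiatingHeapOccupancyPercent, so setting it just makes the value explicit. If you still see full GCs, lower it so concurrent marking starts earlier.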
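To confirm the flags actually reached the server JVM, a quick check from inside the process can help. This is only a sketch using the standard java.lang.management API. The class name HeapSettingsCheck is made up for illustration and is not part of Teamcenter.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports what the running JVM was actually given.
final class HeapSettingsCheck {

    /** Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Names of the active collectors as reported by the JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Collectors: " + collectorNames());
    }
}
```

With G1 active, the collector names should start with "G1". If they do not, the options are probably being set in a script the service does not read.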
Batch Size Configuration:
Reduce to 10-15 files per batch maximum. Simulation data is fundamentally different from CAD data - each file contains dense numerical arrays. Implement this pattern:
// Pseudocode - optimized batch processing:
1. Initialize batch with maxFiles=10, fileCount=0, batchBytes=0
2. For each simulation file in queue:
3.    Load file metadata only (not full content)
4.    If fileCount == maxFiles or batchBytes + estimatedSize > threshold: commit batch, start new
5.    Process file with streaming API (chunk size 5MB)
6.    Explicitly clear object references after import; fileCount += 1, batchBytes += estimatedSize
7.    If fileCount % 5 == 0: optionally suggest a GC (System.gc() is only a hint and may run a full collection)
8. Commit final batch and clean up temp resources
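The batching rule in the steps above can be sketched in Java roughly like this. It is a minimal illustration, not our production code. BatchPlanner and its parameters are hypothetical names, and the file sizes are assumed to come from the metadata-only read in step 3.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner: decides batch boundaries from estimated file sizes only.
final class BatchPlanner {

    /**
     * Groups file sizes (in bytes) into batches. A batch is closed when adding
     * the next file would exceed maxFiles or maxBytes. A single file larger
     * than maxBytes still gets a batch of its own.
     */
    static List<List<Long>> plan(List<Long> fileSizes, int maxFiles, long maxBytes) {
        List<List<Long>> batches = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            boolean wouldOverflow =
                    current.size() >= maxFiles || currentBytes + size > maxBytes;
            if (wouldOverflow && !current.isEmpty()) {
                batches.add(current);          // commit the batch
                current = new ArrayList<>();   // start a new one
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);              // commit the final batch
        }
        return batches;
    }
}
```

Planning from sizes alone means the boundaries are known before any file content is loaded, so an oversized CFD file ends up isolated in its own batch.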
GC Log Analysis:
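Enable comprehensive logging. The flags below are for Java 8. On Java 9 and later most of them were deprecated or removed in favor of unified logging, e.g. -Xlog:gc*:file=/opt/teamcenter/logs/gc.log:time,uptime:filecount=5,filesize=20M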
-Xloggc:/opt/teamcenter/logs/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=5
-XX:GCLogFileSize=20M
Analyze for these patterns:
- Old Gen growth rate: Should be < 100MB/minute during imports
- Full GC frequency: Should be < 1 per hour
- Pause times: Target < 500ms for young GC, < 2s for mixed GC
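If Old Gen keeps growing linearly and does not drop even after mixed or full collections, you most likely have a true memory leak. Add -XX:+HeapDumpOnOutOfMemoryError (optionally with -XX:HeapDumpPath=<dir>) to capture a heap dump for analysis.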
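If you would rather track the Old Gen growth rate from inside the JVM than read it off the log, something along these lines works as a rough sketch. OldGenMonitor is a hypothetical name. The pool-name matching assumes HotSpot naming such as "G1 Old Gen", and the rate is computed in MiB per minute.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: samples old-generation usage and computes a growth rate.
final class OldGenMonitor {

    /** Growth between two samples, in MiB per minute. */
    static double growthMiBPerMinute(long startBytes, long endBytes, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            throw new IllegalArgumentException("elapsedMillis must be positive");
        }
        double deltaMiB = (endBytes - startBytes) / (1024.0 * 1024.0);
        return deltaMiB * 60_000.0 / elapsedMillis;
    }

    /** Current old-generation usage in bytes, or -1 if no matching pool is found. */
    static long oldGenUsedBytes() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumption: HotSpot pool names, e.g. "G1 Old Gen" or "Tenured Gen"
            if (name.contains("Old Gen") || name.contains("Tenured")) {
                MemoryUsage usage = pool.getUsage();
                if (usage != null) {
                    return usage.getUsed();
                }
            }
        }
        return -1;
    }
}
```

Sample oldGenUsedBytes() at the start and end of an import run and feed both values into growthMiBPerMinute. A rate that stays well above the 100MB/minute guideline across several batches is the signal to take a heap dump.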
Additional Critical Fix:
Modify your SimDataImporter to use WeakReferences for cached data and implement proper resource cleanup:
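try {
    processSimulationData(file);
} finally {
    // Runs even if the import throws, so nothing stays pinned in memory
    clearCaches();
    closeStreams();
    // isLastFileInBatch: hypothetical flag set by your batch loop
    if (isLastFileInBatch) {
        System.gc(); // Only a hint; request it once per batch, not per file
    }
}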
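For the WeakReference cache and the stream cleanup, here is a small self-contained sketch of the idea. WeakCache and ImportSketch are illustrative names, not Teamcenter or ANSYS classes, and the cache is not thread-safe as written.

```java
import java.io.IOException;
import java.io.InputStream;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Hypothetical cache: values can be collected once nothing else references them.
final class WeakCache<K, V> {
    private final Map<K, WeakReference<V>> entries = new HashMap<>();

    void put(K key, V value) {
        entries.put(key, new WeakReference<>(value));
    }

    /** Returns the cached value, or null if absent or already collected. */
    V get(K key) {
        WeakReference<V> ref = entries.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            entries.remove(key); // drop the stale entry
        }
        return value;
    }

    void clear() {
        entries.clear();
    }
}

// Hypothetical importer step: reads in fixed-size chunks and always closes the stream.
final class ImportSketch {
    static long importStream(InputStream source, int chunkSize) throws IOException {
        long total = 0;
        try (InputStream in = source) {          // closed even if read() throws
            byte[] chunk = new byte[chunkSize];  // one reusable buffer, not one per read
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;                      // real code would parse the chunk here
            }
        }
        return total;
    }
}
```

Weakly referenced values can disappear at the next collection, so every caller has to handle a null from get() by reloading. If entries should survive until memory is actually tight, SoftReference is the usual alternative.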
After implementing these changes, we went from crashes every 6 hours to stable 24/7 operation processing 800+ files daily. Our heap usage stabilized at 55-65% with GC pauses under 300ms. Monitor for 48 hours and adjust batch size if needed - some ANSYS CFD files are exceptionally large and may need a batch size of 5.