I’m hitting a wall trying to train a machine learning model directly in Db2 using the AI functions. The training process consistently fails with an ‘insufficient memory’ error after processing about 60% of my dataset (approximately 2.5 million rows).
The error occurs during the model fitting phase:
SQL20478N Insufficient memory to complete operation
SQLSTATE=57011
I’ve checked the Db2 AI memory requirements in the documentation, but my bufferpool configuration seems adequate at 50GB. The dataset is partitioned across 8 tables by date range, and I’m using a UNION ALL view for training. Is there a specific bufferpool or memory setting I’m missing for AI workloads? The model type is a random forest classifier with 100 trees.
The 50GB bufferpool might not be the issue here. Db2 AI functions require additional memory allocation beyond standard bufferpools. Check your database configuration for the sheapthres_shr parameter - this controls shared sort heap memory which AI training heavily uses. You’ll likely need to increase it significantly for large datasets. Also verify your sortheap setting is appropriate for the model complexity.
Are you monitoring memory usage during the training process? I’d recommend checking the actual memory consumption patterns. With 2.5M rows and random forest with 100 trees, you’re dealing with substantial memory overhead. The UNION ALL view might also be causing issues - Db2 needs to materialize portions of that view during training. Have you considered training on a single partition first to validate memory requirements?
Your analysis is correct. For Db2 AI model training at scale, you need to address three key areas. First, dataset partitioning strategy - instead of UNION ALL, consider training separate models per partition and combining them with ensemble techniques. Second, increase your shared sort heap substantially. For your dataset size, I’d recommend setting sheapthres_shr to at least 40GB (note that the parameter is specified in 4KB pages, so that’s roughly 10.5 million pages). Third, review your bufferpool allocation - AI functions benefit from a dedicated bufferpool for temporary storage during training iterations.
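To make the ensemble suggestion concrete, here’s a minimal plain-Python sketch of combining per-partition classifiers by majority vote. The “models” are toy stand-ins for whatever the real per-partition training produces; nothing below is a Db2 API.

```python
# Illustrative only: combine per-partition classifiers by majority vote
# instead of training one model over the full UNION ALL view.
from collections import Counter

def majority_vote(models, row):
    """Each 'model' is any callable returning a class label for a row."""
    votes = [m(row) for m in models]
    return Counter(votes).most_common(1)[0][0]

# Toy stand-in models, one per date-range partition.
models = [lambda r: r["amount"] > 100,
          lambda r: r["amount"] > 150,
          lambda r: r["amount"] > 120]

print(majority_vote(models, {"amount": 130}))  # True (two of three vote True)
```

Each partition’s model trains within that partition’s memory footprint, so peak memory tracks the largest partition rather than the whole 2.5M rows.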
Good point about the UNION ALL view. I checked sheapthres_shr and it’s set to 10GB. That does seem low for AI workloads. I tried training on a single partition (about 300K rows) and it completed successfully, which confirms the memory scaling issue. The single partition training used approximately 8GB based on monitoring. So extrapolating to the full dataset, I’d need significantly more than 10GB shared sort heap.
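A quick sanity check on that extrapolation, assuming memory scales roughly linearly with row count (which ignores tree-depth effects):

```python
# Linear extrapolation from the observed single-partition run.
observed_gb = 8            # measured on the ~300K-row partition
observed_rows = 300_000
target_rows = 2_500_000

estimated_gb = observed_gb * target_rows / observed_rows
print(f"~{estimated_gb:.0f} GB")  # ~67 GB, well above a 10GB sheapthres_shr
```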
Also worth checking if you’re using automatic memory management or manual. With automatic, Db2 might not be allocating enough memory dynamically for AI workloads. You might want to switch to manual memory management for better control over AI training resources. Set self_tuning_mem to OFF and manually configure memory parameters based on your workload characteristics.
Let me provide a comprehensive solution addressing all three critical areas for your Db2 AI memory issue.
Db2 AI Memory Requirements:
For your workload (2.5M rows, random forest with 100 trees), you need to allocate memory across multiple configuration parameters. The base calculation: 2.5M rows × estimated 50 features × 8 bytes × 100 trees × 2 = approximately 200GB theoretical maximum. However, Db2 AI uses optimized algorithms, so actual requirements are lower.
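The same back-of-the-envelope figure as code; the 50-feature count is the estimate used above, not a measured value:

```python
rows, features, bytes_per_value, trees = 2_500_000, 50, 8, 100
overhead_factor = 2  # rough allowance for internal training structures

theoretical_max_bytes = rows * features * bytes_per_value * trees * overhead_factor
print(theoretical_max_bytes / 1e9)  # 200.0 (GB)
```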
Bufferpool Configuration:
Create a dedicated bufferpool for AI operations:
CREATE BUFFERPOOL AI_BP SIZE 40000 PAGESIZE 32K
-- 40000 pages × 32KB ≈ 1.25GB; scale SIZE up if the AI temp workload needs more
ALTER TABLESPACE TEMPSPACE1 BUFFERPOOL AI_BP
-- the tablespace's page size must match the bufferpool's (32K here)
Your existing 50GB bufferpool should remain for regular operations. The dedicated AI bufferpool prevents contention.
Critical Memory Parameters:
UPDATE DB CFG FOR yourdb USING SHEAPTHRES_SHR 10485760
-- sheapthres_shr is a database-level (DB CFG) parameter, not DBM CFG;
-- counted in 4KB pages, so 10485760 pages ≈ 40GB
UPDATE DB CFG FOR yourdb USING SORTHEAP 8192
-- per-sort limit, also in 4KB pages; 8192 pages = 32MB per sort operation
UPDATE DB CFG FOR yourdb USING UTIL_HEAP_SZ 10000
-- utility heap, in 4KB pages; 10000 pages ≈ 39MB
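These parameters are all counted in 4KB pages, which makes it easy to be off by orders of magnitude. A quick conversion check (plain Python, just the arithmetic):

```python
PAGE_BYTES = 4 * 1024  # sheapthres_shr, sortheap, util_heap_sz are in 4KB pages

def pages_to_gb(pages):
    return pages * PAGE_BYTES / 1024**3

print(pages_to_gb(10_485_760))  # 40.0  -> a 40GB shared sort ceiling
print(pages_to_gb(8_192))       # 0.03125 -> 32MB per-sort sortheap
```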
Dataset Partitioning Strategy:
Instead of UNION ALL across 8 partitions, implement incremental training:
- Train base model on first partition (most recent data)
- Use Db2’s incremental learning capability to update model with subsequent partitions
- This approach reduces peak memory by processing partitions sequentially
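A minimal sketch of why the sequential approach caps peak memory: only one partition is resident at a time. The update_model callable is a hypothetical stand-in for whatever incremental-training hook is available; this shows the shape of the loop, not a Db2 API.

```python
# Process partitions one at a time so peak memory tracks the largest
# partition rather than the full dataset.
def train_incrementally(partitions, update_model, model=None):
    for part in partitions:
        model = update_model(model, part)  # only 'part' is resident here
    return model

# Toy stand-in: the "model" is just a running row count.
partitions = [[1] * 300, [1] * 400, [1] * 250]
count_rows = lambda model, part: (model or 0) + len(part)
print(train_incrementally(partitions, count_rows))  # 950
```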
Alternatively, reduce model complexity: decrease trees from 100 to 50, or apply feature selection to reduce dimensionality. Monitor with db2pd -edus during training to track actual memory consumption. After changing these parameters, deactivate and reactivate the database (and restart the instance if you changed any instance-level settings) so the new values are in effect. With memory allocation, bufferpool segregation, and partitioning strategy all addressed, your training should complete successfully.
I ran into this exact scenario last quarter. The rule of thumb that helped: estimated training memory for a random forest is roughly number_of_rows × number_of_features × 8 bytes × number_of_trees × 2, where the factor of 2 accounts for internal data structures built during training.