Bulk material import jobs stuck in processing state after cloud migration

After migrating our material-mgmt module to cloud infrastructure, bulk material import jobs are getting stuck in ‘Processing’ state and never complete. We’re running dam-2022 with Cloud SQL as our database backend.

The jobs process about 15-20% of records then hang indefinitely. Database logs show:


Deadlock found when trying to get lock; try restarting transaction
Table: MATERIAL_MASTER, Operation: INSERT
Wait time exceeded: 300 seconds

Our imports typically handle 5000-8000 material records per batch. This worked fine in our on-premise setup with the same batch sizes. The material import is completely blocked now - we can’t update BOMs, inventory levels, or material specifications. Is this a known issue with Cloud SQL and batch operations?

I’ve dealt with similar batch import deadlocks in cloud deployments. The issue is usually a combination of factors: inappropriate batch sizes for cloud I/O patterns, missing database indexing optimized for cloud storage, and improper transaction isolation levels. You need to tune all three aspects. Also, Cloud SQL has connection pooling limits that can cause queuing during large batch operations. Check your connection pool configuration and max_connections setting.

Here’s a comprehensive solution for bulk material import deadlocks in cloud deployments:

1. Cloud SQL Deadlock Analysis: The deadlocks occur because concurrent batch transactions acquire row and index locks on MATERIAL_MASTER in different orders; when two transactions end up waiting on each other in a cycle, the engine rolls one back, which surfaces as the deadlock error in your logs. Cloud storage I/O latency widens the window in which these lock conflicts can form compared to your on-premise setup.

Root causes identified:

  • SERIALIZABLE isolation level is too strict for cloud batch operations
  • Missing indexes on foreign key columns force full table scans during constraint validation
  • Batch size of 1000 records exceeds optimal cloud I/O chunk size
  • Insufficient connection pool tuning for concurrent batch operations

2. Batch Import Queue Tuning:

Reduce batch commit size for cloud:


# Optimal for Cloud SQL
batch.commit.size=250
batch.queue.max=20
batch.retry.attempts=3

Smaller batches (250 records) reduce lock duration and allow better interleaving of concurrent operations. This prevents lock escalation that causes deadlocks.
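To make the arithmetic concrete: a 6,000-record import at batch.commit.size=250 becomes 24 short transactions instead of 6 long ones. A minimal chunking helper (Python sketch; the 250 default mirrors the config above, and the function name is illustrative):

```python
def chunk_records(records, batch_size=250):
    """Split a record list into commit-sized batches.

    Smaller batches mean each transaction holds row locks for a
    shorter window, reducing the chance of deadlock with concurrent
    import streams.
    """
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]

batches = chunk_records(list(range(6000)))
# 24 batches of 250 records each
```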

Implement batch processing strategy:

  • Process materials in parallel streams by material type or category
  • Use asynchronous batch processing with queue-based coordination
  • Implement exponential backoff for retry logic (1s, 2s, 4s intervals)
  • Add circuit breaker pattern to pause imports during high contention periods
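The retry-with-backoff point above can be sketched as follows. This is a generic Python sketch, not the product's actual import service: `DeadlockError` stands in for whatever exception your driver raises (e.g. MySQL error 1213), and `commit_batch` is a hypothetical callable that commits one batch:

```python
import time

class DeadlockError(Exception):
    """Stand-in for the driver's deadlock exception (e.g. MySQL error 1213)."""

def commit_with_retry(commit_batch, batch, attempts=3, base_delay=1.0):
    """Retry a batch commit with exponential backoff (1s, 2s, 4s).

    Deadlocks are transient: the database rolls back one victim
    transaction, so re-running the batch after a short pause
    usually succeeds.
    """
    for attempt in range(attempts):
        try:
            return commit_batch(batch)
        except DeadlockError:
            if attempt == attempts - 1:
                raise  # exhausted retries; surface the error to the job queue
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s
```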

3. DB Indexing for Cloud:

Create missing indexes critical for cloud performance:

CREATE INDEX idx_material_type ON MATERIAL_MASTER(MATERIAL_TYPE_ID);
CREATE INDEX idx_storage_loc ON MATERIAL_MASTER(STORAGE_LOCATION_ID);
CREATE INDEX idx_material_status ON MATERIAL_MASTER(STATUS, LAST_MODIFIED);
CREATE INDEX idx_batch_import ON MATERIAL_MASTER(IMPORT_BATCH_ID, IMPORT_STATUS);

These indexes are essential for:

  • Fast foreign key constraint validation during bulk inserts
  • Efficient lookups during duplicate checking
  • Quick status updates for batch tracking
  • Reduced table scan overhead in cloud storage

Update table statistics for cloud query optimizer:

ANALYZE TABLE MATERIAL_MASTER;
ANALYZE TABLE MATERIAL_BOM;
ANALYZE TABLE MATERIAL_INVENTORY;

Cloud SQL query optimizers rely heavily on statistics for execution plans. Outdated statistics cause poor plan choices that increase lock contention.

4. Transaction Isolation Adjustment: Change from SERIALIZABLE to READ COMMITTED:

  • Reduces lock scope and duration (no gap/next-key locks held for range checks)
  • Allows better concurrency for batch operations
  • Still maintains data consistency for material imports
  • Permits non-repeatable and phantom reads, which are acceptable trade-offs for append-style bulk imports

Update your import service configuration to use READ COMMITTED with row versioning enabled.
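Assuming a MySQL-flavored Cloud SQL instance accessed through a DB-API driver, the per-session setup might look like this (the function name is illustrative; under READ COMMITTED, InnoDB already uses MVCC consistent reads, i.e. row versioning):

```python
def configure_import_session(cursor):
    """Apply per-session settings for an import worker.

    Assumes a MySQL-flavored Cloud SQL instance and a DB-API cursor.
    Under READ COMMITTED, InnoDB uses consistent nonlocking reads
    (MVCC row versioning) and drops the gap locks SERIALIZABLE needs.
    """
    cursor.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")
    # Fail fast instead of queueing 300s behind a blocked batch
    cursor.execute("SET SESSION innodb_lock_wait_timeout = 50")
```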

5. Connection Pool Optimization: Your current 50-connection pool is undersized for cloud batch operations:


db.pool.max.connections=150
db.pool.min.connections=25
db.pool.connection.timeout=45000
db.pool.idle.timeout=300000

Cloud SQL's max_connections limit scales with the instance tier, so a larger pool is usually available; verify the instance's max_connections setting before raising the pool to support concurrent batch streams.

6. Batch Import Architecture Changes:

Implement staged import process:

  • Stage 1: Validate and load to temporary staging table (no constraints)
  • Stage 2: Process in smaller batches from staging to master table
  • Stage 3: Update related tables (BOM, inventory) after main import completes

This approach reduces lock contention by separating validation from insertion.
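The three stages above can be sketched end to end. This uses an in-memory SQLite database purely to keep the example self-contained and runnable; against Cloud SQL you would point the same logic at MySQL/Postgres tables, and all table/column names here are illustrative:

```python
import sqlite3

def staged_import(records, batch_size=250):
    """Staged import sketch: unconstrained staging table, then
    small committed batches into the constrained master table.

    SQLite stands in for Cloud SQL so the sketch runs anywhere;
    the pattern (Stage 1 bulk load, Stage 2 short transactions)
    is what matters.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE staging (material_number TEXT, qty INTEGER)")
    db.execute("""CREATE TABLE material_master
                  (material_number TEXT PRIMARY KEY, qty INTEGER)""")
    # Stage 1: bulk load into staging, no constraints to validate
    db.executemany("INSERT INTO staging VALUES (?, ?)", records)
    db.commit()
    # Stage 2: move to master in small batches, committing after each
    rows = db.execute("SELECT material_number, qty FROM staging").fetchall()
    for i in range(0, len(rows), batch_size):
        db.executemany(
            "INSERT OR IGNORE INTO material_master VALUES (?, ?)",
            rows[i:i + batch_size])
        db.commit()  # short transaction: lock window on master stays small
    return db.execute("SELECT COUNT(*) FROM material_master").fetchone()[0]
```

Stage 3 (BOM and inventory updates) would follow the same batched-commit pattern once the master rows are in place.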

Add batch monitoring:

  • Track batch progress with granular status updates
  • Implement timeout detection (kill jobs stuck >10 minutes)
  • Add metrics for deadlock detection and automatic retry
  • Log slow queries (>5 seconds) for further optimization
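The timeout-detection bullet can be sketched as a pure function a scheduler would poll; the job-record shape here is hypothetical:

```python
import time

STUCK_THRESHOLD = 600  # seconds; matches the >10-minute rule above

def find_stuck_jobs(jobs, now=None):
    """Return ids of import jobs running past the stuck threshold.

    `jobs` maps job_id -> {"started": epoch_seconds, "status": str}.
    A scheduler would call this periodically and kill or requeue
    the offenders instead of letting them hang indefinitely.
    """
    now = now if now is not None else time.time()
    return [jid for jid, j in jobs.items()
            if j["status"] == "PROCESSING" and now - j["started"] > STUCK_THRESHOLD]
```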

7. Cloud-Specific Optimizations:

Enable Cloud SQL performance insights:

  • Monitor slow queries and lock wait times
  • Identify peak contention periods
  • Adjust batch scheduling to avoid concurrent large imports

Configure Cloud SQL flags:

  • innodb_lock_wait_timeout: Reduce from 300s to 50s
  • innodb_deadlock_detect: Keep enabled (the default) so InnoDB rolls back a deadlock victim immediately instead of waiting out the lock timeout
  • max_allowed_packet: Increase to 64MB for larger batch inserts

8. Testing and Validation: After implementing these changes:

  • Test with progressively larger batches (250, 500, 1000 records)
  • Run concurrent import jobs to verify deadlock resolution
  • Monitor database performance metrics for 48 hours
  • Validate data consistency with sample audits

Expected Results:

  • Import completion: jobs finish reliably instead of stalling at 15-20%, with run times comparable to your former on-premise imports
  • Deadlock occurrence: Reduced to <1% of batch operations
  • Concurrent batch capacity: Support 4-6 simultaneous imports
  • Database lock wait time: Average <2 seconds per batch

The combination of smaller batch sizes, proper indexing, and relaxed isolation levels should completely resolve your material import blocking issues in the cloud environment.

Tom, can you share your current indexing strategy for the MATERIAL_MASTER table? Also, what isolation level are you using for the import transactions? In cloud environments, READ COMMITTED with row versioning often works better than SERIALIZABLE for bulk operations.

Our batch commit size is set to 1000 records, same as it was on-premise. Should we reduce this for cloud? Also, the processing seems to slow down significantly after the first 1000 records are imported.

We’re using SERIALIZABLE isolation level (carried over from on-premise config). Indexes exist on primary key and MATERIAL_NUMBER, but I don’t think we have indexes on the foreign key columns. The connection pool is set to 50 connections max. Would increasing that help?

Cloud SQL has different locking behavior than on-premise databases. The deadlock suggests your batch import is trying to lock too many rows simultaneously. What’s your batch commit size? In cloud environments, smaller batch sizes with more frequent commits usually work better.

The slowdown after the first batch is a classic sign of missing indexes or query plan changes in cloud. Cloud SQL query optimizer behaves differently than on-premise SQL Server or Oracle. Run EXPLAIN on your material import queries to see if indexes are being used. Also check if you have proper indexes on foreign key columns like MATERIAL_TYPE_ID and STORAGE_LOCATION_ID.
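To automate that EXPLAIN check, a small helper can flag tables read via full scan (in MySQL's tabular EXPLAIN output, access type 'ALL' means no index was used for that table). This assumes a DB-API cursor against a MySQL-flavored Cloud SQL instance; the function name is illustrative:

```python
def full_scan_tables(cursor, query):
    """Run MySQL EXPLAIN on `query` and return tables read via full scan.

    In MySQL's tabular EXPLAIN output, a row with access type 'ALL'
    means that table is scanned without an index - a candidate for
    the foreign-key indexes discussed above.
    """
    cursor.execute("EXPLAIN " + query)
    cols = [d[0] for d in cursor.description]  # DB-API column names
    t, ty = cols.index("table"), cols.index("type")
    return [row[t] for row in cursor.fetchall() if row[ty] == "ALL"]
```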