Intercompany billing reconciliation batch job fails intermittently with lock timeout errors in cloud environment

Our intercompany billing reconciliation batch job (FICO_INTERCO_RECON) fails intermittently during month-end close with lock timeout errors. The job processes cross-company transactions between our 12 subsidiary entities, and it’s been running for 6+ hours before failing around 80% completion.

The error log shows:


Error: Lock timeout exceeded (300s)
Table: BSEG (Accounting Document Segment)
Operation: UPDATE WHERE BUKRS IN ('1000','2000'...)

I suspect the batch job design isn’t optimized for our data volume - we process about 50K intercompany transactions per month. Table partitioning on BSEG might help, but I’m not sure if that’s supported in cloud deployments. We also need better lock timeout tuning and possibly auto-restart configuration to handle transient failures without manual intervention during close windows.

For the auto-restart configuration, implement a custom job monitoring solution using SAP Job Scheduling Service. Create a monitoring job that checks FICO_INTERCO_RECON status every 15 minutes. If it detects a lock timeout failure, automatically restart from the last successful checkpoint. You’ll need to modify the batch job to write checkpoint records to a custom Z-table so restarts can skip already-processed transactions. We’ve had this running successfully for 8 months now.

Let me provide a comprehensive solution covering all the optimization dimensions you need.

Batch Job Optimization: First, modify your job variant to enable packet processing with optimal batch sizes:


// Pseudocode - Key implementation steps:
1. Set packet size parameter: PACKET_SIZE = 5000 transactions
2. Enable commit work after each packet
3. Configure parallel processing: MAX_PARALLEL_JOBS = 4
4. Set company code ranges: Job1=[1000-1003], Job2=[2000-2003], etc.
5. Implement checkpoint logging to Z_RECON_CHECKPOINT table
// See documentation: SAP Batch Job Optimization Guide

This breaks your 50K transactions into 10 packets of 5K each, with 4 parallel jobs processing different company code ranges simultaneously. Each packet commits independently, releasing locks every 5-10 minutes instead of holding them for hours.
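The packet split itself is simple; here is a minimal sketch (Python for illustration only, since the variant parameters above are SAP job settings) of how the work list maps onto commit-sized packets:

```python
def split_into_packets(transactions, packet_size=5000):
    """Split the work list into fixed-size packets; each packet is processed
    and committed on its own, so locks are released between packets instead
    of being held for the whole run."""
    return [transactions[i:i + packet_size]
            for i in range(0, len(transactions), packet_size)]
```

With 50,000 transactions and a packet size of 5,000 this yields the 10 packets described above; each parallel job then pulls packets only from its own company-code range.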

Table Partitioning Strategy: Implement a hybrid partitioning scheme on BSEG combining fiscal year and company code:


ALTER TABLE BSEG PARTITION BY RANGE (GJAHR)
  SUBPARTITION BY LIST (BUKRS)
  (PARTITION p2023 VALUES LESS THAN ('2024')
    (SUBPARTITION p2023_1000 VALUES ('1000'),
     SUBPARTITION p2023_2000 VALUES ('2000')),
   PARTITION p2024 VALUES LESS THAN ('2025'))

This creates isolated data segments that your parallel jobs can access without cross-partition locking. Your UPDATE operations will only lock specific subpartitions (e.g., 2024/company 1000) rather than the entire BSEG table.
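To make the pruning concrete, here is a small sketch (Python, purely illustrative) of which segment the DDL above routes a row to, and therefore which segment an UPDATE locks:

```python
def locked_segment(gjahr: str, bukrs: str) -> str:
    """Mirror the partition scheme above: rows before fiscal year 2024 land in
    a year+company subpartition (only the company codes listed in the DDL have
    one), while 2024 rows share a single partition with no subpartitions yet."""
    if gjahr < "2024":
        return f"p2023_{bukrs}"
    return "p2024"
```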

Lock Timeout Tuning: Adjust database lock parameters in HANA configuration:

  1. Increase the statement timeout: statement_timeout = 600000 (10 minutes, in milliseconds)
  2. Reduce the lock wait timeout: lock_wait_timeout = 180000 (3 minutes) so blocked statements fail fast instead of queuing behind a long holder
  3. Tighten deadlock detection: deadlock_detection_interval = 1000 (1 second)

These settings give individual statements more breathing room while detecting actual deadlocks quickly. However, with proper packet sizing, you shouldn’t hit these limits.

Auto-Restart Configuration: Implement a resilient job framework using SAP Job Scheduling Service:

Create a monitoring job (Z_MONITOR_RECON) that runs every 10 minutes:


SELECT t.jobname, t.status, c.last_checkpoint
  FROM TBTCO AS t
  JOIN Z_RECON_CHECKPOINT AS c
    ON c.jobname = t.jobname   -- assumes the checkpoint table keys on job name
 WHERE t.jobname = 'FICO_INTERCO_RECON'

If status = 'FAILED' and error = 'LOCK_TIMEOUT':

  1. Read last successful checkpoint from Z_RECON_CHECKPOINT
  2. Submit new job variant with parameter START_FROM_PACKET = last_checkpoint + 1
  3. Log restart event to monitoring table
  4. Send notification to finance team
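The four steps above can be sketched as follows (Python for illustration; submit_job and notify are placeholders for whatever your Job Scheduling Service integration and notification channel provide):

```python
def handle_failure(status, error, last_checkpoint, submit_job, notify):
    """Restart FICO_INTERCO_RECON from the packet after the last committed
    checkpoint; do nothing for states other than a lock-timeout failure."""
    if status != "FAILED" or error != "LOCK_TIMEOUT":
        return None
    next_packet = last_checkpoint + 1              # committed packets are skipped
    submit_job(start_from_packet=next_packet)      # step 2: resubmit the job
    notify(f"Restarted FICO_INTERCO_RECON from packet {next_packet}")  # steps 3-4
    return next_packet
```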

Modify FICO_INTERCO_RECON to write checkpoints:


DATA ls_checkpoint TYPE z_recon_checkpoint.

LOOP AT lt_packets INTO ls_packet.
  PERFORM process_packet USING ls_packet.

  " Write the checkpoint in the same LUW as the packet, so the commit
  " covers both and a restart never replays already-committed work.
  ls_checkpoint-datum  = sy-datum.
  ls_checkpoint-uzeit  = sy-uzeit.
  ls_checkpoint-packet = ls_packet-number.
  INSERT z_recon_checkpoint FROM ls_checkpoint.

  COMMIT WORK.
ENDLOOP.

Additional Optimizations:

  1. Index Strategy: Ensure composite indexes exist on BSEG for (BUKRS, GJAHR, BELNR) and (BUKRS, AUGBL) to support reconciliation queries
  2. Parallel Processing Safety: Configure company code ranges so each parallel job works on non-overlapping master data - eliminates SKA1/T001 contention
  3. Memory Management: Set job memory limit to 4GB per parallel process to prevent swapping
  4. Monitoring Dashboard: Create a Fiori app displaying real-time progress from Z_RECON_CHECKPOINT table

With these changes, your 50K transactions should process in under 90 minutes with automatic recovery from transient failures. The combination of partitioning, parallel processing, and checkpoint-based restarts provides both performance and resilience for month-end close operations.

Six hours for 50K transactions is way too slow - you’re looking at about 2.3 transactions per second. Something’s fundamentally wrong with the job logic. Profile the job using transaction ST12 (ABAP Trace) to identify the bottleneck. I’d bet you’re doing row-by-row processing with expensive SELECT SINGLE statements inside loops. The job should use bulk operations with internal tables and MODIFY statements. Also check if parallel processing is enabled - you should be able to run multiple work processes handling different company code ranges simultaneously.
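The fix for that anti-pattern is the same in any language: one set-based read into a keyed lookup, then matching in memory. A sketch (Python stands in for the ABAP FOR ALL ENTRIES / internal-table pattern; fetch_bulk is a placeholder for the single database read):

```python
def reconcile(transactions, fetch_bulk):
    """Match transactions against document amounts with ONE database round
    trip, instead of a SELECT SINGLE inside the loop for every row."""
    docs = fetch_bulk({t["belnr"] for t in transactions})  # single bulk read
    return [(t["belnr"], docs[t["belnr"]] == t["amount"])
            for t in transactions]
```

At 50K rows the difference is one query versus 50,000, which alone can turn hours into minutes.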

BSEG table partitioning is absolutely supported in S/4HANA Cloud - it’s actually recommended for high-volume financial deployments. Partition by fiscal year and company code using range partitioning. This isolates your UPDATE operations to specific partitions rather than locking the entire table. Run transaction DB02 to analyze current table sizes and fragmentation. If BSEG exceeds 10M rows, partitioning becomes critical. Also check if you have proper indexes on BUKRS and GJAHR - missing indexes force full table scans that hold locks longer.

Good point about parallel processing. The job variant is currently set to sequential mode. If I enable parallel processing across company codes, won’t that increase lock contention on shared master data tables like SKA1 and T001? Also, what’s the recommended partition strategy for BSEG - should I partition by fiscal year only or combine year + company code?