Subscription renewal job fails due to database locks in subscription-mgmt module during batch processing

Our scheduled subscription renewal jobs are intermittently failing with database deadlock errors during peak processing periods. The job processes about 5000 subscription renewals nightly, but we’re seeing failures on roughly 20-30% of runs:


ORA-00060: deadlock detected while waiting for resource
Session 1: UPDATE OSS_SUBSCRIPTIONS SET STATUS='RENEWED'
Session 2: UPDATE OSS_BILLING_SCHEDULES SET NEXT_BILL_DATE

We’ve tried adjusting the batch job scheduling to run during off-peak hours, but the deadlocks persist. Database deadlock analysis shows conflicts between the renewal process and billing schedule updates. Not sure if this is a transaction isolation tuning issue or if we need to restructure how the renewal logic processes records.

Your deadlock situation calls for a multi-faceted fix across three areas: batch job scheduling, deadlock analysis, and transaction isolation tuning.

Batch Job Scheduling: First, verify what other processes are running concurrently with your renewal job. Use this query to identify overlapping scheduled processes:

SELECT request_id, program_name,
       actual_start_date, actual_completion_date
FROM fnd_concurrent_requests
WHERE actual_start_date BETWEEN :renewal_start AND :renewal_end
  AND status_code = 'R';

You mentioned running during off-peak hours, but "off-peak" might still have billing, invoicing, or revenue recognition jobs running. Create an exclusive time window where only the renewal job executes. Use ESS job dependencies to enforce this: configure your renewal job as a prerequisite for other subscription-related processes.

Implement job chaining properly: Renewal → Billing Schedule Update → Invoice Generation. This serializes the operations and eliminates concurrent access to the same data. Configure the schedule with sufficient buffer time between job submissions (15-30 minutes depending on your data volume).
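If you prefer to serialize the three jobs at the database level rather than in ESS, the same chain can be sketched with DBMS_SCHEDULER chains. The program names below (RENEWAL_PROG, BILLING_SCHED_PROG, INVOICE_GEN_PROG) are placeholders for whatever scheduler programs wrap your jobs:

```sql
BEGIN
  DBMS_SCHEDULER.CREATE_CHAIN(chain_name => 'RENEWAL_CHAIN');

  -- Each step references an existing scheduler program; names here are hypothetical
  DBMS_SCHEDULER.DEFINE_CHAIN_STEP('RENEWAL_CHAIN', 'STEP_RENEWAL', 'RENEWAL_PROG');
  DBMS_SCHEDULER.DEFINE_CHAIN_STEP('RENEWAL_CHAIN', 'STEP_BILLING', 'BILLING_SCHED_PROG');
  DBMS_SCHEDULER.DEFINE_CHAIN_STEP('RENEWAL_CHAIN', 'STEP_INVOICE', 'INVOICE_GEN_PROG');

  -- Rules enforce strict serial order: each step starts only after the previous succeeds
  DBMS_SCHEDULER.DEFINE_CHAIN_RULE('RENEWAL_CHAIN', 'TRUE', 'START STEP_RENEWAL');
  DBMS_SCHEDULER.DEFINE_CHAIN_RULE('RENEWAL_CHAIN', 'STEP_RENEWAL SUCCEEDED', 'START STEP_BILLING');
  DBMS_SCHEDULER.DEFINE_CHAIN_RULE('RENEWAL_CHAIN', 'STEP_BILLING SUCCEEDED', 'START STEP_INVOICE');
  DBMS_SCHEDULER.DEFINE_CHAIN_RULE('RENEWAL_CHAIN', 'STEP_INVOICE COMPLETED', 'END');

  DBMS_SCHEDULER.ENABLE('RENEWAL_CHAIN');
END;
/
```

A chain like this guarantees the billing schedule update never overlaps the renewal pass, regardless of how long either step runs.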

Database Deadlock Analysis: Your error shows the classic two-resource deadlock pattern. The renewal process updates OSS_SUBSCRIPTIONS first, then OSS_BILLING_SCHEDULES. Meanwhile, another process (likely the billing schedule generator) does the reverse. To diagnose comprehensively:

SELECT blocking_session, sql_id, event
FROM v$session
WHERE blocking_session IS NOT NULL
  AND program LIKE '%Subscription%';

Oracle writes a trace file automatically whenever ORA-00060 is raised; for more detail, set EVENT="60 trace name DEADLOCK level 10" in your instance parameters. The resulting trace includes a deadlock graph showing exactly which resources are involved and the lock wait chain.

The solution is enforcing consistent lock ordering. Modify your renewal process to always acquire locks in this order:

  1. OSS_SUBSCRIPTIONS (parent)
  2. OSS_BILLING_SCHEDULES (child)
  3. OSS_INVOICE_LINES (if applicable)
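As a sketch of that ordering for a single subscription (table and column names taken from the error message above; the ADD_MONTHS term and the bind variable are assumptions), every process touching these tables would follow the same parent-to-child sequence:

```sql
-- Acquire row locks in the agreed parent-to-child order for one subscription.
-- Any other process updating these tables must use the same order.
DECLARE
  v_sub OSS_SUBSCRIPTIONS.SUBSCRIPTION_ID%TYPE := :subscription_id;
BEGIN
  -- 1. Parent first
  UPDATE OSS_SUBSCRIPTIONS
     SET status = 'RENEWED'
   WHERE subscription_id = v_sub;

  -- 2. Then the child billing schedule
  UPDATE OSS_BILLING_SCHEDULES
     SET next_bill_date = ADD_MONTHS(next_bill_date, 1)  -- example term; adjust to your renewal period
   WHERE subscription_id = v_sub;

  COMMIT;
END;
```

If the billing schedule generator is rewritten to update OSS_SUBSCRIPTIONS before OSS_BILLING_SCHEDULES as well, the circular wait becomes impossible.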

Use ORDER BY SUBSCRIPTION_ID in your processing cursor to ensure deterministic ordering:

FOR rec IN (
  SELECT * FROM OSS_SUBSCRIPTIONS
  WHERE renewal_date = TRUNC(SYSDATE)
  ORDER BY subscription_id FOR UPDATE SKIP LOCKED
) LOOP
  -- process each renewal here
  NULL;
END LOOP;

The SKIP LOCKED clause (it's a clause, not a hint) makes the session skip rows locked by other sessions instead of waiting for them - the job processes whatever is available and picks up the previously locked rows on the next run.

Transaction Isolation Tuning: Your current isolation level is likely READ COMMITTED (Oracle default), which is appropriate for most OLTP operations. However, your long-running batch transaction is holding locks too long. Implement these changes:

  1. Reduce transaction scope: Instead of processing all 5000 renewals in one transaction, commit every 500-1000 records. This releases locks incrementally:
DECLARE
  v_counter NUMBER := 0;
BEGIN
  -- Note: renewal_cursor must not be a FOR UPDATE cursor, or the mid-loop
  -- COMMIT will raise ORA-01002 (fetch out of sequence).
  FOR rec IN renewal_cursor LOOP
    -- Process renewal
    v_counter := v_counter + 1;
    IF MOD(v_counter, 500) = 0 THEN
      COMMIT;  -- release locks incrementally
    END IF;
  END LOOP;
  COMMIT;
END;
  2. Implement retry logic: When deadlocks occur, catch ORA-00060 and retry after a random delay. Note that Oracle has no predefined DEADLOCK_DETECTED exception - declare one with PRAGMA EXCEPTION_INIT:
DECLARE
  deadlock_detected EXCEPTION;
  PRAGMA EXCEPTION_INIT(deadlock_detected, -60);
BEGIN
  -- process one renewal record here
  NULL;
EXCEPTION
  WHEN deadlock_detected THEN
    DBMS_LOCK.SLEEP(DBMS_RANDOM.VALUE(1, 5));  -- random backoff; DBMS_SESSION.SLEEP on 18c+
    RETRY_RECORD(rec.subscription_id);         -- your retry handler
END;
  3. Use autonomous transactions for logging: If you’re logging progress within the renewal transaction, move logging to autonomous transactions to avoid holding locks during I/O operations.

  4. Use row-level locking: Instead of locking entire result sets, use SELECT FOR UPDATE on individual records as you process them. This minimizes lock duration.

  5. Analyze wait events: Check v$session_wait during job execution to see if you’re hitting buffer busy waits or latch contention in addition to deadlocks.
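A minimal sketch of item 3, assuming a RENEWAL_LOG table exists in your schema (name and columns are illustrative). The autonomous transaction commits the log row independently, so the caller's transaction and its locks are untouched:

```sql
CREATE OR REPLACE PROCEDURE log_renewal_progress (
  p_subscription_id IN NUMBER,
  p_message         IN VARCHAR2
) AS
  PRAGMA AUTONOMOUS_TRANSACTION;  -- runs in its own transaction, separate from the caller's
BEGIN
  INSERT INTO renewal_log (subscription_id, message, logged_at)
  VALUES (p_subscription_id, p_message, SYSTIMESTAMP);
  COMMIT;  -- commits only the log insert, never the caller's renewal work
END;
```

The renewal loop can call log_renewal_progress freely without worrying that a logging commit will break its batch-commit strategy.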

Additional recommendations:

  • Partition your OSS_SUBSCRIPTIONS table by renewal_date if you haven’t already. This reduces contention by isolating different date ranges.
  • Implement a queue-based architecture where renewal candidates are inserted into a work queue table, and multiple worker processes consume from it using SKIP LOCKED. This naturally handles concurrency.
  • Review your foreign key constraints - if they’re not indexed, updates to parent tables can cause table-level locks on child tables.
  • Monitor AWR reports for your renewal job window to identify specific wait events and contention points.
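The queue-based pattern from the bullets above can be sketched like this, assuming a hypothetical RENEWAL_QUEUE staging table with a STATUS column. Each worker claims only unlocked rows, so any number of workers can run concurrently without blocking each other:

```sql
-- Hypothetical work-queue consumer: candidates are staged in RENEWAL_QUEUE,
-- and each worker claims a batch with SKIP LOCKED so workers never collide.
DECLARE
  CURSOR c_work IS
    SELECT queue_id, subscription_id
      FROM renewal_queue
     WHERE status = 'PENDING'
     ORDER BY queue_id
       FOR UPDATE SKIP LOCKED;
BEGIN
  FOR rec IN c_work LOOP
    -- process the renewal for rec.subscription_id here, then mark it done
    UPDATE renewal_queue
       SET status = 'DONE'
     WHERE queue_id = rec.queue_id;
  END LOOP;
  COMMIT;  -- single commit after the loop avoids fetch-across-commit issues
END;
```

Rows a worker skips are simply picked up by another worker, or by the same worker on its next run.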

Test these changes in your development environment with realistic concurrency - simulate multiple sessions executing renewal-related operations simultaneously to verify deadlock elimination.

Check your transaction isolation level settings. If you’re running at SERIALIZABLE isolation, you’re more prone to deadlocks. READ COMMITTED might be sufficient for your use case and would reduce lock contention significantly.

That makes sense. How do we enforce consistent lock ordering? Is this something we configure in the scheduled job parameters or does it require custom code modifications?

Classic deadlock scenario - two processes trying to update related records in different orders. Your renewal job is probably processing subscriptions in one order while the billing schedule job processes them differently. This creates circular wait conditions. You need to ensure both processes lock records in the same sequence.

I’d also look at whether you have triggers or foreign key constraints that might be acquiring additional locks you’re not aware of. Sometimes cascading updates can create unexpected lock chains that lead to deadlocks.