Firmware update job for billing devices stuck in pending state despite device connectivity

I’ve scheduled a firmware update job for our billing-engine device group (47 smart meters) through the IoT Cloud Console, but the job has been stuck in “Pending” state for over 18 hours. All devices show as connected and healthy in the device registry, with active telemetry streams coming in normally.

The job state monitoring shows no progress - 0 devices updated, 0 failed, 47 pending. I’ve verified the update policy validation passed during job creation, and the firmware package was successfully uploaded to the platform. Device group status indicates all devices are online and capable of receiving updates.

I’m concerned about delaying these security patches further. The console doesn’t provide detailed error logs for why the job won’t start. Has anyone encountered firmware update jobs that just won’t transition from pending to active? What could block job execution when devices are clearly reachable?

Good point about agent versions. I checked and we have a mix - most are on 2.1.6 but about 8 devices are still on 2.1.3. Could that mixed version scenario cause the entire job to hang? Also, our update policy doesn’t have specific time windows configured, just a general “allow updates” flag set to true.

I’ve seen this when the update policy has scheduling constraints that aren’t met. For example, if you set a maintenance window that doesn’t align with device availability, or if there are conflicting policies. Go to Device Groups → billing-engine → Policies and check if there are any active constraints preventing immediate updates.

Check the device agent versions on your meters. In oiot-22, there was a compatibility issue where devices running agent versions below 2.1.5 couldn’t process update jobs created with the newer job scheduler. The job would stay pending because devices weren’t acknowledging the update request.

Also verify the job priority and queue status. If there are other jobs running on the same device group or overlapping devices, the new job will wait. Use the job management API to query active jobs: GET /iot/api/v2/jobs?deviceGroup=billing-engine&status=active. Sometimes jobs get queued behind failed jobs that need manual cleanup.

I’ve troubleshot this exact scenario multiple times in oiot-22. Here’s the systematic approach:

Job State Monitoring Deep Dive: First, access the job details API to get more information than the console shows:

curl -X GET https://iot-instance.oraclecloud.com/iot/api/v2/jobs/{job-id}/details \
  -H "Authorization: Bearer {token}"

Look for the “executionBlockers” field in the response - this contains the actual reasons why job execution is blocked.
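Once you have the response, you mainly want a quick summary of those blockers. A minimal sketch of reading them out, assuming the response shape described above (the `executionBlockers` entries in the sample payload are invented examples, not real platform codes):

```python
import json

# Hypothetical job-details payload shaped like the response described above;
# the blocker codes and counts here are invented for illustration.
sample_response = json.loads("""
{
  "id": "job-1234",
  "status": "PENDING",
  "executionBlockers": [
    {"code": "DEVICE_NOT_READY", "deviceCount": 8},
    {"code": "MODEL_MISMATCH", "deviceCount": 47}
  ]
}
""")

def summarize_blockers(job):
    """Return one readable line per execution blocker."""
    return [f"{b['code']}: {b['deviceCount']} device(s)"
            for b in job.get("executionBlockers", [])]

for line in summarize_blockers(sample_response):
    print(line)
```

If the list comes back empty while the job still sits in Pending, that points at a queue conflict rather than a per-device blocker.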

Common Causes for Pending State:

  1. Device Group Status Issues: Even though devices show connected, check their firmware update capability status:
  • Navigate to Device Registry → billing-engine group
  • Filter by “Update Capability” - devices must show “Ready” not just “Connected”
  • Some devices may be in “Busy” state if they’re processing other operations
  • Check for devices with “Update Disabled” flag set in their metadata
  2. Update Policy Validation - Hidden Constraints: Your policy might have constraints you’re not seeing in the UI:
  • Minimum battery level requirements (critical for battery-powered meters)
  • Network bandwidth thresholds
  • Device uptime requirements (some policies require device up for X hours)
  • Concurrent update limits (max devices updating simultaneously)

Verify with: IoT Console → Policies → billing-engine-update-policy → Advanced Settings

  3. Firmware Package Compatibility: Validate the device model mapping:
# Check firmware package metadata
iotctl firmware describe --package billing-meter-fw-v3.2.1
# Compare supportedModels array with actual device models in group
iotctl devices list --group billing-engine --fields model
  4. Job Queue Conflicts: Query for blocking jobs:
iotctl jobs list --device-group billing-engine --status active,paused,failed

If any jobs are in “failed” state, they may be blocking the queue. Cancel or resolve them first.
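To make the cleanup step concrete, here is a small sketch of filtering a jobs listing down to the entries that can hold a new job in the queue. The job records are invented examples; the statuses match the `active,paused,failed` filter used in the query above:

```python
# Hypothetical output of the jobs-list query, reduced to the fields that
# matter for queue conflicts; the records are invented for illustration.
jobs = [
    {"id": "job-0007", "status": "failed", "deviceGroup": "billing-engine"},
    {"id": "job-0009", "status": "active", "deviceGroup": "billing-engine"},
    {"id": "job-0012", "status": "pending", "deviceGroup": "billing-engine"},
]

BLOCKING_STATUSES = {"active", "paused", "failed"}

def blocking_jobs(job_list):
    """IDs of jobs that can hold a newly created job in the queue."""
    return [j["id"] for j in job_list if j["status"] in BLOCKING_STATUSES]

print(blocking_jobs(jobs))  # cancel or resolve these before recreating the job
```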

Resolution Steps:

  1. Cancel the stuck job: `iotctl jobs cancel --job-id {your-job-id}`

  2. Update devices with older agents first (those 8 on v2.1.3):

    • Create separate job for agent upgrade
    • Wait for completion before firmware update
  3. Verify device update capability:

    • Run device health check: `iotctl devices health-check --group billing-engine`
    • Address any devices showing “Update Disabled” or “Busy” status
  4. Recreate the firmware update job with explicit policy:

    • Set concurrent update limit to 10 devices
    • Configure retry policy: 3 attempts with 30-minute intervals
    • Enable detailed logging: `--log-level DEBUG`
  5. Monitor job progression:

    • Use webhook notifications for state changes
    • Check device-level logs: `/var/log/iot-agent/firmware-update.log`

Preventive Measures:
  • Standardize agent versions across device fleet before major updates
  • Implement pre-update validation scripts
  • Set up job state monitoring alerts
  • Document device group prerequisites for firmware updates
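The "pre-update validation scripts" bullet can be as simple as a script that combines the three checks discussed in this thread: agent version, update capability, and model match. A sketch under those assumptions (the device records, field names, and `supportedModels` set are invented; only the 2.1.5 minimum comes from the compatibility note above):

```python
MIN_AGENT_VERSION = (2, 1, 5)  # per the job-scheduler compatibility note above

def parse_version(v):
    """Turn '2.1.3' into (2, 1, 3) for tuple comparison."""
    return tuple(int(p) for p in v.split("."))

def validate_device(device, supported_models):
    """Collect reasons a device is not ready for a firmware update job."""
    problems = []
    if parse_version(device["agentVersion"]) < MIN_AGENT_VERSION:
        problems.append("agent below 2.1.5")
    if device["updateCapability"] != "Ready":
        problems.append(f"capability is {device['updateCapability']}")
    if device["model"] not in supported_models:
        problems.append("model not in firmware supportedModels")
    return problems

# Invented fleet snapshot for illustration.
fleet = [
    {"id": "meter-01", "agentVersion": "2.1.6", "updateCapability": "Ready", "model": "BM-200"},
    {"id": "meter-02", "agentVersion": "2.1.3", "updateCapability": "Busy", "model": "BM-200"},
]
report = {d["id"]: validate_device(d, {"BM-200"}) for d in fleet}
print(report)
```

Run something like this before creating the job and you avoid scheduling 47 devices when only 39 can actually take the update.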

In 90% of cases I’ve seen, the issue is either mismatched device model IDs in firmware metadata or devices not actually being in “Ready” state despite showing connected. The detailed job API response will tell you exactly what’s blocking execution.

Mixed agent versions shouldn’t block the entire job, but those older agents won’t process the update. More likely issue: check if your firmware package metadata matches the device model IDs exactly. I’ve debugged cases where a typo in the device model field caused jobs to stay pending because the platform couldn’t match the firmware to any devices in the group. The validation passes at upload time but fails during job execution matching.
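To illustrate why one typo can stall the whole job: if the platform matches firmware to devices by exact string comparison on the model ID (an assumption based on the behavior described above), a single wrong character leaves every device unmatched. Model names here are invented:

```python
def matchable_devices(supported_models, device_models):
    """Device models that exactly match an entry in the firmware metadata."""
    supported = set(supported_models)
    return [m for m in device_models if m in supported]

device_models = ["BM-200"] * 47  # hypothetical uniform fleet

print(len(matchable_devices(["BM-200"], device_models)))  # 47 - all match
print(len(matchable_devices(["BM200"], device_models)))   # 0 - typo, job stays pending
```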