Integration module firmware update fails due to MQTT broker disconnects during device push (cciot-25)

nehaops · January 5, 2025, 1:30pm

We’re encountering consistent firmware update failures across our industrial gateway fleet when pushing updates through the integration module. The updates trigger successfully but fail mid-transfer with MQTT broker disconnects.

The pattern we’re seeing:


MQTT session lost during firmware transfer
Connection timeout after 45 seconds
Device reports: CONNACK not received

This happens on about 60% of our devices, especially those in remote locations. The MQTT session persistence seems unreliable during large file transfers, and we’re hitting what looks like broker resource limits. Devices attempt reconnection but the update job times out before they can resume.

Has anyone dealt with MQTT stability issues during firmware updates? We need the edge device reconnection logic to be more resilient. Our current setup uses default MQTT keep-alive settings and QoS 1 for firmware delivery.

patel_ace · January 30, 2025, 7:25am

For firmware updates via MQTT, you need to treat session persistence differently than regular telemetry. The broker needs to maintain state during long transfers. We implemented chunked transfers with explicit acknowledgment per chunk, and increased our keep-alive interval to 300 seconds during firmware operations. Also critical: set clean_session=false on your device clients so they can resume after reconnection. The devices should track which chunks they’ve received and request only the missing pieces on reconnect.
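A minimal sketch of the chunk bookkeeping described above, in Python. It assumes chunks are numbered 0 to total-1 and that the device learns the total chunk count from the update job before the first chunk arrives; `ChunkTracker` and its methods are hypothetical names, not part of any MQTT library.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkTracker:
    """Tracks which firmware chunks a device has (hypothetical helper)."""
    total: int                                        # chunks in this firmware image
    received: set[int] = field(default_factory=set)   # chunk IDs stored so far

    def mark_received(self, chunk_id: int) -> bool:
        """Record a chunk; return False for an out-of-range or duplicate ID."""
        if not 0 <= chunk_id < self.total or chunk_id in self.received:
            return False
        self.received.add(chunk_id)
        return True

    def missing(self) -> list[int]:
        """Chunk IDs still needed, ascending: what to re-request after a reconnect."""
        return [i for i in range(self.total) if i not in self.received]

    def complete(self) -> bool:
        """True once every chunk has arrived (checksum validation comes next)."""
        return len(self.received) == self.total
```

After a reconnect, `missing()` is the list a device would ask the server to resend, and the boolean from `mark_received` can decide whether a per-chunk acknowledgment is sent.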

ninjapro · January 10, 2025, 9:48pm

We had this exact issue last year. The problem was twofold: broker resource limits were too conservative for concurrent firmware pushes, and our edge devices weren’t configured to resume interrupted transfers. Check your broker’s max_connections and max_inflight_messages parameters. We had to increase both significantly for bulk firmware operations.

ninjapro · February 2, 2025, 11:23am

To answer your question, Mike: we went with max_inflight_messages=100 and max_connections=500 for our production broker cluster. But the real fix was implementing persistent sessions with QoS 2 for firmware chunks. This guarantees exactly-once delivery even across disconnects.
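For reference, this is roughly what the receiving side of a QoS 2 exchange does to avoid duplicates. It is a simplified model of the MQTT 3.1.1 flow in which sending a packet is reduced to returning a string; `Qos2Receiver` is a hypothetical name. The exactly-once guarantee across disconnects only holds as long as this in-flight state survives the reconnect on both ends, which is what a persistent session (clean_session=false) provides on the broker side.

```python
from typing import Callable


class Qos2Receiver:
    """Receiver half of the QoS 2 flow: PUBLISH -> PUBREC, PUBREL -> PUBCOMP."""

    def __init__(self, deliver: Callable[[bytes], None]) -> None:
        self.deliver = deliver                 # hands a payload to the application
        self.awaiting_rel: set[int] = set()    # packet IDs delivered, PUBREL not yet seen

    def on_publish(self, packet_id: int, payload: bytes) -> str:
        if packet_id not in self.awaiting_rel:
            # First copy of this message: deliver it once and remember the ID.
            self.awaiting_rel.add(packet_id)
            self.deliver(payload)
        # A resent copy (e.g. after a reconnect) is acknowledged again, not redelivered.
        return f"PUBREC {packet_id}"

    def on_pubrel(self, packet_id: int) -> str:
        # The sender has released the ID, so it may now be reused for a new message.
        self.awaiting_rel.discard(packet_id)
        return f"PUBCOMP {packet_id}"
```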

hans_planner · January 6, 2025, 10:44pm

I’ve seen similar behavior. The first thing to check is your MQTT broker’s connection timeout settings versus the actual firmware transfer time. If your updates take longer than the broker’s idle timeout, you’ll get disconnected mid-transfer. Also verify your QoS settings: QoS 1 should work, but you might want to look at message size limits.
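A back-of-envelope version of that check, as a Python sketch. Throughput, chunk size and job timeout are inputs to replace with your own numbers, and protocol overhead is ignored. The 1.5 times keep-alive grace period is the MQTT 3.1.1 rule after which a broker disconnects a client it has received no packet from; flagging a single chunk whose wire time exceeds it is a conservative heuristic of this sketch, aimed at clients that cannot send PINGREQ while a large message is still arriving. `check_timing` is a hypothetical helper.

```python
def check_timing(firmware_bytes: int, chunk_bytes: int, throughput_bps: float,
                 keep_alive_s: int, job_timeout_s: float) -> list[str]:
    """Return warnings; an empty list means no obvious timing problem."""
    total_s = firmware_bytes * 8 / throughput_bps      # whole image on the wire
    per_chunk_s = chunk_bytes * 8 / throughput_bps     # one chunk (one PUBLISH)
    grace_s = 1.5 * keep_alive_s                       # MQTT 3.1.1 keep-alive rule
    warnings = []
    if total_s > job_timeout_s:
        warnings.append(
            f"transfer ~{total_s:.0f}s exceeds job timeout {job_timeout_s:.0f}s")
    if per_chunk_s > grace_s:
        warnings.append(
            f"one chunk takes ~{per_chunk_s:.0f}s, over the {grace_s:.0f}s keep-alive grace")
    return warnings
```

For example, an 8 MB image (8,000,000 bytes) over a 256 kbit/s link with 64 kB chunks takes about 250 s in total and 2 s per chunk, so `check_timing(8_000_000, 64_000, 256_000, 60, 1800)` returns no warnings; the same image with a 120 s job timeout does not.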

kumar_erp · January 21, 2025, 11:55pm

Thanks for the suggestions. I checked our broker config and we’re running with default max_inflight_messages=20 which seems low. What values did you end up using for large firmware transfers? Also, how did you handle the reconnection logic on the device side?

giorgio_985 · February 3, 2025, 12:06pm

Let me provide a comprehensive solution that addresses all three critical areas:

MQTT Session Persistence: Configure persistent sessions on both broker and clients. On your MQTT broker (assuming Mosquitto or similar):


persistence true
persistence_location /var/lib/mqtt/
autosave_interval 300

On device clients, set clean_session=false and use a unique client_id per device. This ensures the broker maintains message queues during disconnects.
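To make the client-side settings concrete, here is a minimal encoder for the MQTT 3.1.1 CONNECT packet that carries them: the Clean Session bit, the keep-alive and the client ID. It is a stdlib-only illustration (no will message, username or password); in practice your MQTT client library builds this packet for you. MQTT 3.1.1 also requires a non-empty client ID when Clean Session is 0, which is one more reason to give every device its own stable ID.

```python
import struct


def encode_remaining_length(n: int) -> bytes:
    """MQTT variable-length integer: 7 bits per byte, high bit = more bytes follow."""
    if not 0 <= n <= 268_435_455:
        raise ValueError("remaining length outside the MQTT range")
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n > 0:
            byte |= 0x80              # continuation bit
        out.append(byte)
        if n == 0:
            return bytes(out)


def encode_connect(client_id: str, keep_alive: int, clean_session: bool) -> bytes:
    """Build a minimal MQTT 3.1.1 CONNECT packet."""
    cid = client_id.encode("utf-8")
    flags = 0x02 if clean_session else 0x00      # bit 1 of the flags is Clean Session
    variable_header = (
        struct.pack("!H", 4) + b"MQTT"           # protocol name, length-prefixed
        + bytes([4])                             # protocol level 4 = MQTT 3.1.1
        + bytes([flags])
        + struct.pack("!H", keep_alive)          # keep-alive seconds, big-endian
    )
    payload = struct.pack("!H", len(cid)) + cid  # client identifier, length-prefixed
    body = variable_header + payload
    return bytes([0x10]) + encode_remaining_length(len(body)) + body
```

With `encode_connect("gw-001", 300, False)` the flags byte is 0x00 (persistent session) and the keep-alive field is 0x012C (300 seconds).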

Broker Resource Limits: Your current limits are definitely too restrictive for firmware operations. Update your broker configuration:


max_inflight_messages 100
max_queued_messages 1000
max_connections 500
message_size_limit 268435455

The message_size_limit is critical: it must be at least as large as your largest firmware chunk, and 268435455 bytes (just under 256MB) is the MQTT maximum and therefore the ceiling for this setting. Also implement connection pooling if you’re updating more than 50 devices simultaneously.
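A small pre-flight check, assuming the firmware is split into fixed-size chunks: it computes the chunk count and refuses a chunk size the broker would reject. `plan_chunks` is a hypothetical helper; treating a limit of 0 as "no broker limit" mirrors Mosquitto's message_size_limit convention, and 268,435,455 bytes is the largest Remaining Length MQTT can encode.

```python
MQTT_MAX_REMAINING_LENGTH = 268_435_455   # largest size the MQTT length field can encode


def plan_chunks(firmware_bytes: int, chunk_bytes: int, message_size_limit: int) -> int:
    """Return how many chunks the image needs, or raise if a chunk would be rejected.

    message_size_limit == 0 means the broker imposes no limit of its own.
    Topic and header bytes are ignored here, so leave some headroom in practice.
    """
    if chunk_bytes <= 0:
        raise ValueError("chunk size must be positive")
    if chunk_bytes > MQTT_MAX_REMAINING_LENGTH:
        raise ValueError("chunk is larger than any MQTT packet can be")
    if message_size_limit and chunk_bytes > message_size_limit:
        raise ValueError("chunk exceeds the broker's message_size_limit")
    return (firmware_bytes + chunk_bytes - 1) // chunk_bytes   # integer ceiling
```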

Edge Device Reconnection Logic: This is where most implementations fail. Your devices need intelligent retry logic:

Implement exponential backoff for reconnection attempts (start at 5s, max at 120s)
Track firmware transfer state locally - store received chunk IDs in persistent storage
On reconnection, request the missing chunks from the update server (with a resume message routed through the broker) rather than restarting the entire transfer
Use MQTT topic structure like: fw/update/{device_id}/chunk/{chunk_id}
Implement a resume capability:


// Pseudocode - Resume logic:
1. On reconnect, read local state: lastChunkReceived
2. Subscribe to topic filter: fw/update/{device_id}/chunk/+
   // '+' matches exactly one topic level; MQTT has no '*' wildcard
3. Publish resume request: fw/status/{device_id}/resume
4. Include payload: {"lastChunk": lastChunkReceived}
5. Server sends only missing chunks
6. Device validates checksum after complete transfer
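The reconnection and resume steps can be sketched in Python as follows. Assumptions not stated in the thread: the delay doubles on each attempt, state is kept as a JSON file of chunk IDs, and `lastChunk` means the highest chunk up to which every chunk has arrived, so out-of-order arrivals do not leave gaps. All function names are hypothetical; the actual subscribe and publish calls would come from your MQTT client library.

```python
import json
import os
import tempfile
from typing import Iterator


def backoff_delays(start: float = 5.0, cap: float = 120.0,
                   factor: float = 2.0) -> Iterator[float]:
    """Reconnect delays in seconds: 5, 10, 20, 40, 80, 120, 120, ..."""
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)   # stop growing once the cap is reached


def save_state(path: str, received: set[int]) -> None:
    """Persist received chunk IDs atomically (write a temp file, then rename)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(sorted(received), f)
        f.flush()
        os.fsync(f.fileno())           # make sure the data reaches storage
    os.replace(tmp_path, path)         # atomically replace the old state file


def load_state(path: str) -> set[int]:
    """Read persisted chunk IDs; a missing file means nothing was received yet."""
    try:
        with open(path) as f:
            return set(json.load(f))
    except FileNotFoundError:
        return set()


def last_contiguous_chunk(received: set[int]) -> int:
    """Highest n such that chunks 0..n have all arrived; -1 if chunk 0 is missing."""
    n = -1
    while n + 1 in received:
        n += 1
    return n


def resume_messages(device_id: str, received: set[int]) -> tuple[str, str, str]:
    """Subscription filter, resume topic and JSON payload for steps 2-4."""
    chunk_filter = f"fw/update/{device_id}/chunk/+"   # '+' matches one topic level
    resume_topic = f"fw/status/{device_id}/resume"
    payload = json.dumps({"lastChunk": last_contiguous_chunk(received)})
    return chunk_filter, resume_topic, payload
```

Many fleets also add random jitter to each delay so that devices hit by the same outage do not all reconnect at the same moment.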

Additional Recommendations:

Increase MQTT keep-alive to 300 seconds during firmware operations
Use QoS 2 for firmware chunks to guarantee exactly-once delivery
Implement health checks: devices should publish heartbeat every 60s during updates
Set firmware job timeout to at least 30 minutes for remote devices
Monitor broker metrics: connection count, message queue depth, memory usage
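On the server side, the heartbeat and job-timeout recommendations reduce to two comparisons. A sketch, assuming heartbeats are recorded as Unix timestamps per device; treating a device as stale after two missed heartbeats is an arbitrary choice of this sketch, not a value from the thread, and both function names are hypothetical.

```python
def stale_devices(last_heartbeat: dict[str, float], now: float,
                  interval_s: float = 60.0, missed_allowed: int = 2) -> list[str]:
    """IDs of devices whose last heartbeat is older than `missed_allowed` intervals."""
    cutoff = now - interval_s * missed_allowed
    return sorted(dev for dev, ts in last_heartbeat.items() if ts < cutoff)


def job_timed_out(started_at: float, now: float, timeout_s: float = 30 * 60) -> bool:
    """True once an update job has run longer than the timeout (30 minutes by default)."""
    return now - started_at > timeout_s
```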

After implementing these changes, our firmware update success rate went from 40% to 98%, with the remaining 2% being actual network outages. The key is treating firmware updates as a special operation with different QoS and persistence requirements than regular telemetry.
