We’re seeing severe delays in firmware management event processing in iod-23, and they’re causing cascading firmware update failures across our device fleet. Firmware update events (download confirmations, installation status, rollback triggers) are sitting in the processing queue for 15-30 minutes, so the firmware management orchestrator times out and marks the updates as failed.
We have approximately 5,000 devices in staged firmware rollouts, and the event processing delays are preventing proper coordination. The update success rate has dropped from 98% to 45% because of these timing issues. The firmware-event queue backlog grows steadily, which indicates that processing throughput isn’t keeping up with the event generation rate.
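A quick back-of-the-envelope way to see why the backlog keeps growing: when events arrive faster than the processors complete them, the backlog grows by the difference every second, and every new event waits behind it. This is a minimal sketch assuming an idealized fluid model (perfect parallelism, constant per-event cost); the rates below are made-up illustrative numbers, not measurements from iod-23, and the function names are hypothetical.

```python
def throughput_eps(processors: int, secs_per_event: float) -> float:
    """Ideal throughput in events/second, assuming processors never sit idle."""
    return processors / secs_per_event


def backlog_after(arrival_eps: float, processors: int, secs_per_event: float,
                  duration_s: float, initial_backlog: float = 0.0) -> float:
    """Backlog after duration_s seconds.

    The backlog changes by (arrival rate - throughput) each second and is
    floored at zero once the queue has fully drained.
    """
    net = arrival_eps - throughput_eps(processors, secs_per_event)
    return max(0.0, initial_backlog + net * duration_s)


def wait_behind_backlog_s(backlog: float, processors: int, secs_per_event: float) -> float:
    """Approximate FIFO wait for a new event that lands behind `backlog` events."""
    return backlog / throughput_eps(processors, secs_per_event)


# Illustrative numbers only: 60 events/s arriving, 20 processors at 0.5 s/event.
backlog = backlog_after(60, 20, 0.5, duration_s=600)
print(backlog, wait_behind_backlog_s(backlog, 20, 0.5))  # prints: 12000.0 300.0
```

With those made-up numbers the backlog grows by 20 events/s, so after 10 minutes a newly enqueued event waits about 5 minutes. Adding processors only starts draining the queue once throughput exceeds the peak arrival rate.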
Queue throughput seems to be the core issue: the event processing system can’t handle the volume of firmware update events during large-scale rollouts. Has anyone dealt with firmware event queue tuning in iod-23?
Thanks for the suggestions. I increased the number of concurrent processors to 50 and enabled event priority tagging for firmware events. The queue backlog is starting to clear, but we’re still seeing 5-10 minute delays during peak update windows.
We had this exact problem during a fleet-wide firmware update last quarter. The issue isn’t just the number of concurrent processors; it’s also the event priority system. Firmware update events should be marked high priority so they aren’t queued behind lower-priority telemetry events. Configure your firmware management module to tag all firmware-related events with priority level 1. That gets them processed ahead of routine telemetry and significantly reduces their queue delay.
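To illustrate why the priority tag matters, here is a minimal in-memory sketch of a priority queue in which lower numbers are served first. It assumes priority 1 is the highest level, as the advice above implies; iod-23's actual priority semantics and configuration may differ, and the level constants and event payloads are hypothetical. A sequence counter keeps FIFO order among events with the same priority.

```python
import heapq
import itertools
from dataclasses import dataclass, field

# Hypothetical priority levels: lower number = served first.
FIRMWARE_PRIORITY = 1
TELEMETRY_PRIORITY = 5


@dataclass(order=True)
class QueuedEvent:
    priority: int
    seq: int                               # tie-breaker: FIFO within one priority level
    payload: dict = field(compare=False)   # never compared, only carried along


class PriorityEventQueue:
    def __init__(self) -> None:
        self._heap: list[QueuedEvent] = []
        self._counter = itertools.count()

    def push(self, payload: dict, priority: int) -> None:
        heapq.heappush(self._heap, QueuedEvent(priority, next(self._counter), payload))

    def pop(self) -> dict:
        # Returns the lowest (priority, seq) pair, i.e. highest priority, oldest first.
        return heapq.heappop(self._heap).payload

    def __len__(self) -> int:
        return len(self._heap)


q = PriorityEventQueue()
q.push({"type": "telemetry", "device": "dev-001"}, TELEMETRY_PRIORITY)
q.push({"type": "telemetry", "device": "dev-002"}, TELEMETRY_PRIORITY)
q.push({"type": "firmware.install_status", "device": "dev-003"}, FIRMWARE_PRIORITY)
print([q.pop()["type"] for _ in range(len(q))])
# prints: ['firmware.install_status', 'telemetry', 'telemetry']
```

One trade-off to keep in mind with strict priority ordering: under sustained firmware load, lower-priority telemetry can be starved until the firmware events drain.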
Also check your queue partition strategy. If all firmware events are going into a single queue partition, you’re creating a bottleneck. Partition the queue by device group or region so events can be processed in parallel across multiple queue instances.
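A minimal sketch of routing events by device group, assuming each event carries a hypothetical `device_group` field and the queue supports a fixed number of partitions (`NUM_PARTITIONS` is also an assumption). It uses a stable hash so the same group always lands in the same partition, which preserves per-group ordering while different groups are processed in parallel.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 8  # hypothetical; size it to the number of consumer instances


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key (e.g. device group or region) to a partition index.

    Uses SHA-256 rather than Python's built-in hash(), which is randomized per
    process for strings and would route the same group differently after a restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_events(events: list[dict],
                     num_partitions: int = NUM_PARTITIONS) -> dict[int, list[dict]]:
    """Group events by partition, preserving arrival order within each partition."""
    partitions: dict[int, list[dict]] = defaultdict(list)
    for event in events:
        partitions[partition_for(event["device_group"], num_partitions)].append(event)
    return dict(partitions)


events = [
    {"device_group": "eu-west-gateways", "device": "dev-001", "type": "firmware.download_confirmed"},
    {"device_group": "us-east-sensors", "device": "dev-002", "type": "firmware.install_status"},
    {"device_group": "eu-west-gateways", "device": "dev-003", "type": "firmware.rollback_trigger"},
]
for idx, batch in sorted(partition_events(events).items()):
    print(idx, [e["device"] for e in batch])
```

Partitioning only spreads the load if the key is reasonably balanced: a single very large device group still maps to one partition and can remain a bottleneck on its own.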
One more thing: verify that your firmware orchestrator timeout accounts for the expected queue processing time. If the orchestrator times out before its events are processed, raising queue throughput alone won’t prevent those updates from being marked as failed. Extend the timeout to at least 10 minutes to accommodate queue delays during high-volume periods.
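One way to sanity-check the timeout is to estimate the worst-case queue wait at peak (backlog divided by throughput), add the per-event processing time, and apply some headroom, never going below the 10-minute floor suggested above. This is a sketch under stated assumptions: the `PeakEstimate` fields, the 1.5 safety factor, and the example numbers are all hypothetical, not values taken from iod-23.

```python
from dataclasses import dataclass

TIMEOUT_FLOOR_S = 600.0  # the "at least 10 minutes" suggested above


@dataclass
class PeakEstimate:
    backlog_events: int     # events queued ahead of a new firmware event at peak
    throughput_eps: float   # events completed per second across all processors
    processing_s: float     # time to handle the event once it is dequeued


def orchestrator_timeout_s(est: PeakEstimate, safety_factor: float = 1.5) -> float:
    """Timeout covering the expected queue wait plus processing, with headroom.

    safety_factor is a hypothetical margin for bursts; the result never
    drops below TIMEOUT_FLOOR_S.
    """
    expected_s = est.backlog_events / est.throughput_eps + est.processing_s
    return max(TIMEOUT_FLOOR_S, expected_s * safety_factor)


print(orchestrator_timeout_s(PeakEstimate(30_000, 100.0, 20.0)))  # 600.0 (floor wins)
print(orchestrator_timeout_s(PeakEstimate(60_000, 100.0, 20.0)))  # 930.0
```

The second case shows why a fixed 10-minute value can still be too short: if the peak backlog is large relative to throughput, the expected wait alone exceeds the floor.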