We’re encountering persistent timeout errors when attempting batch firmware updates across our device fleet. The updates fail consistently when targeting more than 150 devices simultaneously, even though our batch size is configured at 200.
The error details show a timeout after 300 seconds, with only 40-60% of devices receiving the update package:
Error: FirmwareUpdateTimeout
batch.size: 200
timeout.ms: 300000
devices.updated: 89/200
retry.attempts: 3
The incomplete deployments are creating version fragmentation across our fleet, making it difficult to maintain consistent firmware versions. We’ve tried increasing timeout values to 600 seconds, but that just delays the failure. The retry strategy seems to compound the problem by attempting to update already-updated devices.
Has anyone successfully implemented large-scale firmware batch updates? What’s the optimal batch size and timeout configuration for reliable over-the-air updates?
200 devices per batch is too aggressive for OTA updates. Network variability and device availability make large batches unreliable. We use batches of 50 with a progressive rollout: if a batch reaches a 90% success rate, we proceed to the next one.
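Here is a rough sketch of that progressive rollout gate in Python, in case it helps. The `update_device` callback is a hypothetical stand-in for whatever your OTA platform exposes; the batch size of 50 and the 90% threshold are the values mentioned above. This version halts when a batch falls below the threshold, which is one reasonable reading of "proceed only on success".

```python
from typing import Callable, Dict, Iterable, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterable[List[str]]:
    """Yield consecutive batches of at most `size` device IDs."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def progressive_rollout(
    device_ids: Sequence[str],
    update_device: Callable[[str], bool],  # hypothetical: returns True on success
    batch_size: int = 50,
    min_success_rate: float = 0.90,
) -> Dict[str, object]:
    """Update batch by batch; stop as soon as a batch misses the threshold."""
    updated: List[str] = []
    failed: List[str] = []
    halted = False
    for batch in chunked(device_ids, batch_size):
        # Record the outcome per device so failures can be handled later.
        results = [(device, update_device(device)) for device in batch]
        updated += [device for device, ok in results if ok]
        failed += [device for device, ok in results if not ok]
        success_rate = sum(ok for _, ok in results) / len(batch)
        if success_rate < min_success_rate:
            halted = True  # remaining batches are left untouched
            break
    return {"updated": updated, "failed": failed, "halted": halted}
```

When the rollout halts, the devices in later batches are never touched, so a bad package or a network problem only affects one batch of 50 rather than the whole fleet.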
Check your firmware package size and transfer protocol. Large packages sent over unreliable networks will time out regardless of batch configuration. We implemented delta updates (shipping only the changed components), which reduced package size by 70% and dramatically improved success rates.
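To illustrate the component-level delta idea, here is a minimal sketch. It assumes each device reports a manifest mapping component names to SHA-256 digests; that manifest format, the component names, and `build_delta` are all hypothetical, not something a particular OTA platform provides. It also ignores component removal and binary diffing, which a real implementation would need to consider.

```python
import hashlib
from typing import Dict


def component_digest(data: bytes) -> str:
    """SHA-256 hex digest used to identify a component's contents."""
    return hashlib.sha256(data).hexdigest()


def build_delta(installed: Dict[str, str], target: Dict[str, bytes]) -> Dict[str, bytes]:
    """Return only the target components that are new or whose digest changed.

    `installed` maps component name -> digest reported by the device
    (hypothetical manifest format); `target` maps name -> new contents.
    """
    return {
        name: blob
        for name, blob in target.items()
        if installed.get(name) != component_digest(blob)
    }


# Example: only "app" changed and "radio" is new, so "bootloader" is skipped.
installed_manifest = {
    "bootloader": component_digest(b"boot-v1"),
    "app": component_digest(b"app-v1"),
}
target_release = {
    "bootloader": b"boot-v1",
    "app": b"app-v2",
    "radio": b"radio-v1",
}
delta = build_delta(installed_manifest, target_release)
```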
Your retry strategy is definitely problematic. Retrying the entire batch when only part of it failed wastes bandwidth and processing time, and it re-sends the package to devices that already have it. Implement selective retry that tracks individual device status and retries only the failed devices. We also added exponential backoff between retries: the first retry after 60 seconds, the second after 180 seconds, the third after 540 seconds (each delay triples). This prevents retry storms from overwhelming devices or network infrastructure. Additionally, consider device readiness checks before initiating updates: verify battery level, network quality, and available storage before including a device in the update batch.
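A rough Python sketch of selective retry with backoff and a readiness filter, to make the idea concrete. The 60/180/540 second delays are the ones quoted above; everything else is an assumption for illustration: `update_device` is a hypothetical callback, and the `DeviceStatus` fields and the battery and signal thresholds are made-up values you would tune for your own fleet.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass
class DeviceStatus:
    battery_pct: int        # 0 to 100
    signal_quality: float   # 0.0 to 1.0 (hypothetical scale)
    free_storage_kb: int


def is_ready(status: DeviceStatus, package_kb: int,
             min_battery: int = 50, min_signal: float = 0.5) -> bool:
    """Readiness check; the thresholds are illustrative assumptions."""
    return (status.battery_pct >= min_battery
            and status.signal_quality >= min_signal
            and status.free_storage_kb >= package_kb)


def backoff_delays(base_s: float = 60.0, factor: float = 3.0,
                   retries: int = 3) -> List[float]:
    """Delays before each retry: 60, 180, 540 seconds by default."""
    return [base_s * factor ** i for i in range(retries)]


def update_with_selective_retry(
    device_ids: Iterable[str],
    update_device: Callable[[str], bool],  # hypothetical: True on success
    delays: List[float],
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Set[str]]:
    """One initial attempt, then one retry per delay, on failed devices only."""
    pending: Set[str] = set(device_ids)
    updated: Set[str] = set()
    for attempt in range(len(delays) + 1):
        if not pending:
            break  # nothing left to retry
        if attempt > 0:
            sleep(delays[attempt - 1])  # back off before each retry
        succeeded = {d for d in pending if update_device(d)}
        updated |= succeeded
        pending -= succeeded  # already-updated devices are never retried
    return {"updated": updated, "failed": pending}
```

In practice you would filter the fleet with `is_ready` first, then pass only the ready devices to `update_with_selective_retry(ready_ids, update_device, backoff_delays())`. Passing `sleep` as a parameter is just a convenience so the loop can be exercised without actually waiting.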