We’re debating two approaches for firmware update orchestration: API-based application enablement where updates are triggered through our management API, versus direct device control where devices check for updates and pull them autonomously.
The app enablement approach gives us centralized control and auditability: we can track who initiated updates, enforce approval workflows, and roll back if needed. However, it requires devices to maintain persistent connections to receive update commands, which hurts battery life on edge devices.
Direct device control is more resilient (devices poll for updates on their own schedule) and reduces server-side complexity. But we lose visibility into update status and can’t enforce mandatory update windows. Security-wise, direct control requires each device to authenticate to the update server and be authorized for the updates it requests, which adds complexity to device firmware.
What have others implemented for large-scale firmware management? Particularly interested in security implications and operational overhead of each approach.
The hybrid model is interesting. How do you handle auditability with the polling approach? When a device autonomously downloads and installs an update, how do you track that in your change management system? Also, what happens if a device polls during a blackout window when updates shouldn’t be applied?
Auditability is critical for compliance. We require all firmware updates to log to a central audit service regardless of initiation method. Each device reports: current firmware version, available update version, download timestamp, installation timestamp, and result status. This creates an immutable audit trail. For blackout windows, devices query a ‘maintenance window’ API before applying updates. If the current time is outside the allowed window, they defer installation until the next window opens.
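To make that concrete, here is a rough sketch of the audit record and the window check as they might look on the device side. The field names mirror the list above, but the class names, the `device_id` field, the status strings, and the idea that the window is expressed as a UTC start/end time are all assumptions for illustration, not our actual API.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, time, timezone
from typing import Optional


@dataclass(frozen=True)
class FirmwareAuditRecord:
    """One audit entry per update attempt (hypothetical schema)."""
    device_id: str                         # assumed identifier, not listed in the post
    current_version: str
    available_version: str
    download_timestamp: Optional[str]      # ISO 8601, None if not downloaded yet
    installation_timestamp: Optional[str]  # ISO 8601, None if not installed yet
    result_status: str                     # assumed values: "installed", "deferred", "failed"

    def to_json(self) -> str:
        # Stable key order makes records easier to diff in the audit store.
        return json.dumps(asdict(self), sort_keys=True)


@dataclass(frozen=True)
class MaintenanceWindow:
    """Daily window in UTC; assumed shape of the maintenance window API response."""
    start: time
    end: time

    def contains(self, moment: datetime) -> bool:
        t = moment.astimezone(timezone.utc).time()
        if self.start <= self.end:
            return self.start <= t < self.end
        # Window wraps past midnight, e.g. 22:00 to 04:00.
        return t >= self.start or t < self.end


def decide_install(window: MaintenanceWindow, now: datetime) -> str:
    """Return "install" inside the window, otherwise "defer"."""
    return "install" if window.contains(now) else "defer"
```

A device that polls at noon against a 22:00 to 04:00 window would get "defer" and would report a record with `result_status="deferred"`, so the blackout decision itself shows up in the audit trail.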
We use a hybrid approach: devices poll for updates every 24 hours (direct control model), but critical security patches can be pushed immediately via IoT Hub direct methods (app enablement). This balances operational flexibility with emergency response capability. The polling interval is configurable per device type, so battery-powered sensors check less frequently than AC-powered gateways.
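For illustration, the scheduling decision could be sketched roughly like this. The 24-hour default comes from the description above; the device type names, the 72-hour and 6-hour overrides, and the `critical_push_pending` flag (standing in for a command received through the push channel) are made-up placeholders.

```python
from datetime import datetime, timedelta, timezone

# Per-device-type overrides; the type names and values are illustrative assumptions.
POLL_INTERVALS = {
    "battery_sensor": timedelta(hours=72),  # polls rarely to save battery
    "ac_gateway": timedelta(hours=6),       # mains powered, can poll often
}
DEFAULT_POLL_INTERVAL = timedelta(hours=24)  # the 24-hour default mentioned above


def poll_interval_for(device_type: str) -> timedelta:
    """Look up the polling interval, falling back to the default."""
    return POLL_INTERVALS.get(device_type, DEFAULT_POLL_INTERVAL)


def next_action(
    device_type: str,
    last_poll: datetime,
    now: datetime,
    critical_push_pending: bool,
) -> str:
    """Decide what the update agent should do on this wake-up."""
    if critical_push_pending:
        # A pushed critical patch bypasses the polling schedule.
        return "apply_pushed_patch"
    if now - last_poll >= poll_interval_for(device_type):
        return "poll"
    return "sleep"
```

The push path is checked first so that an emergency patch never waits on the polling schedule; everything else falls back to the per-type interval.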