After rebooting IoT devices in the field (running our custom device agent on cciot-24), the device shadow state in the Device Shadow service shows stale pre-reboot values. The devices are successfully reconnecting to the platform and sending telemetry data, but their reported shadow state doesn’t reflect the actual device configuration. This is causing configuration drift issues where the platform thinks devices are in one state while they’re actually in another. We’re particularly concerned about device agent configuration settings and whether shadow sync on reboot is properly configured. We’ve noticed the device agents don’t seem to be publishing their current state to the shadow topics after reconnection. Could this be related to MQTT retained messages or the shadow sync mechanism? We need devices to immediately report their current state after reboot to maintain accurate shadow representations.
Consider implementing a shadow reconciliation routine that runs periodically (every 5-10 minutes) to compare local device state with the shadow state and publish updates if they differ. This provides ongoing drift detection beyond just the reboot scenario and helps catch any missed updates during network interruptions.
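A minimal Python sketch of the comparison step follows. It assumes the agent can produce its local state as a nested dict and that the shadow document has already been fetched. `diff_reported`, `reconcile` and `VOLATILE` are hypothetical names. The sketch sends only the fields that differ and relies on the shadow service merging reported updates into the existing document. A key that exists in the shadow but not locally is mapped to `null`, which deletes it from the shadow.

```python
from __future__ import annotations

from typing import Any, Optional

# Field names that change on every report and should not count as drift
# (checked at every nesting level). Hypothetical default; extend as needed.
VOLATILE = frozenset({"timestamp"})


def diff_reported(local: dict[str, Any], reported: dict[str, Any],
                  ignore: frozenset = VOLATILE) -> dict[str, Any]:
    """Return the fields of `local` that differ from the shadow's reported section.

    Nested dicts are compared recursively. Keys present in the shadow but absent
    locally map to None, which asks the shadow service to delete them.
    """
    changes: dict[str, Any] = {}
    for key, value in local.items():
        if key in ignore:
            continue
        old = reported.get(key)
        if isinstance(value, dict) and isinstance(old, dict):
            nested = diff_reported(value, old, ignore)
            if nested:
                changes[key] = nested
        elif value != old:
            changes[key] = value
    for key in reported:
        if key not in local and key not in ignore:
            changes[key] = None
    return changes


def reconcile(local: dict[str, Any], shadow_doc: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Build a shadow/update payload if the shadow has drifted, else return None."""
    reported = shadow_doc.get("state", {}).get("reported", {})
    changes = diff_reported(local, reported)
    return {"state": {"reported": changes}} if changes else None
```

Run this from a timer. If the result is not `None`, publish it to the thing's `shadow/update` topic.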
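Here’s a solution covering the three areas you raised: device agent configuration, shadow sync on reboot, and MQTT retained messages. The core issue is that a shadow only changes when a client publishes to it. If the agent doesn’t publish its reported state after reconnecting, the shadow keeps the last values reported before the reboot, however much telemetry the device sends on other topics.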
Device Agent Configuration: Your device agent needs proper initialization logic to handle shadow state synchronization after reboot. Update your agent’s startup sequence to include these steps:
- Establish MQTT connection with clean session = false to preserve subscriptions
- Subscribe to shadow delta and get response topics
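- Request the current shadow state and apply any pending delta (desired changes made while the device was offline)
- Publish the full current device state as reported state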
- Begin normal operation
Key configuration parameters for your device agent (these names are illustrative; map them onto your agent’s own configuration format):
mqtt.clean.session=false
mqtt.qos.level=1
shadow.sync.on.connect=true
shadow.sync.interval=300
shadow.publish.on.change=true
shadow.sync.interval (in seconds) drives periodic reconciliation every 5 minutes, catching drift that the on-connect sync misses.
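The parameter names above belong to your custom agent, so this loader is only a sketch. It assumes a simple key=value properties format with the interval in seconds, and `AgentConfig` and `load_agent_config` are hypothetical names. The loader fails fast on settings that would bring back the stale-shadow problem. It also rejects QoS 2, which AWS IoT Core does not support.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class AgentConfig:
    clean_session: bool = False
    qos: int = 1
    sync_on_connect: bool = True
    sync_interval_s: int = 300        # seconds between reconciliation passes
    publish_on_change: bool = True


def _parse_bool(raw: str) -> bool:
    value = raw.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean: {raw!r}")


def load_agent_config(text: str) -> AgentConfig:
    """Parse key=value lines (hypothetical format) and reject unsafe settings."""
    props: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {line!r}")
        props[key.strip()] = value.strip()

    def get(key: str, convert: Callable[[str], T], default: T) -> T:
        return convert(props[key]) if key in props else default

    d = AgentConfig()
    cfg = AgentConfig(
        clean_session=get("mqtt.clean.session", _parse_bool, d.clean_session),
        qos=get("mqtt.qos.level", int, d.qos),
        sync_on_connect=get("shadow.sync.on.connect", _parse_bool, d.sync_on_connect),
        sync_interval_s=get("shadow.sync.interval", int, d.sync_interval_s),
        publish_on_change=get("shadow.publish.on.change", _parse_bool, d.publish_on_change),
    )
    # Fail fast on settings that would reintroduce stale shadows after reboot.
    if cfg.clean_session:
        raise ValueError("mqtt.clean.session must be false to keep subscriptions across reconnects")
    if not cfg.sync_on_connect:
        raise ValueError("shadow.sync.on.connect must be true so state is reported after reboot")
    if cfg.qos not in (0, 1):
        raise ValueError("AWS IoT Core supports MQTT QoS 0 and 1 only")
    if cfg.sync_interval_s <= 0:
        raise ValueError("shadow.sync.interval must be a positive number of seconds")
    return cfg
```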
Shadow Sync on Reboot: Implement this initialization sequence in your device agent code:
Pseudocode - Shadow sync initialization:
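1. Connect to MQTT broker with clientId={deviceId} (this assumes deviceId is also the AWS IoT thing name used in the topics below)
2. Subscribe to: $aws/things/{deviceId}/shadow/get/accepted and $aws/things/{deviceId}/shadow/get/rejected
3. Subscribe to: $aws/things/{deviceId}/shadow/update/delta
4. Publish empty message to: $aws/things/{deviceId}/shadow/get
5. Wait for the get/accepted or get/rejected response (timeout: 5 seconds)
6. Compare the received reported state with the current device state (log any drift), and apply the response's "delta" section, if present, to the device
7. Publish full reported state to: $aws/things/{deviceId}/shadow/update
8. Start normal telemetry and shadow monitoring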
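The numbered steps above can be sketched in Python against a hypothetical `Transport` interface that wraps whatever MQTT client the agent uses. `Transport`, `sync_on_connect` and `FakeTransport` are illustrative names, not a real library API. The topic strings follow the AWS IoT shadow naming, where deltas arrive on `$aws/things/<thing>/shadow/update/delta`. Retries are left out here. Subscriptions happen before the get is published so the response cannot slip past.

```python
from __future__ import annotations

import json
from typing import Callable, Optional, Protocol


class Transport(Protocol):
    """Hypothetical wrapper around your MQTT client; not a real library API."""
    def subscribe(self, topic: str) -> None: ...
    def publish(self, topic: str, payload: str) -> None: ...
    def wait_for(self, topics: set[str], timeout: float) -> Optional[tuple[str, str]]: ...


def sync_on_connect(t: Transport, thing: str,
                    read_local_state: Callable[[], dict],
                    apply_desired: Callable[[dict], None],
                    timeout: float = 5.0) -> dict:
    base = f"$aws/things/{thing}/shadow"
    accepted, rejected = f"{base}/get/accepted", f"{base}/get/rejected"
    for topic in (accepted, rejected, f"{base}/update/delta"):
        t.subscribe(topic)
    t.publish(f"{base}/get", "")  # an empty payload requests the whole document
    reply = t.wait_for({accepted, rejected}, timeout)
    if reply is not None and reply[0] == accepted:
        delta = json.loads(reply[1]).get("state", {}).get("delta")
        if delta:
            apply_desired(delta)  # desired changes made while the device was offline
    # On get/rejected (e.g. no shadow exists yet) or a timeout, fall through:
    # the device's own state is authoritative, so report it either way.
    update = {"state": {"reported": read_local_state()}}
    t.publish(f"{base}/update", json.dumps(update))
    return update


class FakeTransport:
    """In-memory stand-in used to exercise sync_on_connect without a broker."""
    def __init__(self, reply: Optional[tuple[str, str]] = None) -> None:
        self.reply = reply
        self.subscribed: list[str] = []
        self.published: list[tuple[str, str]] = []

    def subscribe(self, topic: str) -> None:
        self.subscribed.append(topic)

    def publish(self, topic: str, payload: str) -> None:
        self.published.append((topic, payload))

    def wait_for(self, topics: set[str], timeout: float) -> Optional[tuple[str, str]]:
        return self.reply if self.reply and self.reply[0] in topics else None
```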
The shadow get request retrieves the platform’s view of device state, allowing your agent to detect any differences and publish the authoritative current state.
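MQTT Retained Messages: Retained messages are unlikely to be the cause here. The Device Shadow service does not deliver shadow documents as retained messages, so subscribing to a shadow topic after a reboot does not replay the last document; the device has to ask for it. Even where retained messages are available, they are a poor basis for synchronization because:
- A retained message holds only the last payload published on its topic, which may be stale if the shadow was updated while the device was offline
- Network issues can cause missed retained message delivery
- Some MQTT brokers do not persist retained messages across a broker restart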
Instead, use the request/response pattern with shadow/get and shadow/update topics. This provides reliable synchronization regardless of retained message behavior.
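Shadow response topics are shared by every client subscribed to that thing. A get/accepted message the device receives may therefore answer a request made by a cloud application. The shadow service echoes a `clientToken` from the request into the response (up to 64 bytes), so the agent can match responses to its own requests. `PendingRequests` is a hypothetical helper, sketched here as an assumption about how the agent might track them.

```python
from __future__ import annotations

import json
import uuid
from typing import Optional


class PendingRequests:
    """Correlates shadow responses with requests via clientToken (hypothetical helper)."""

    def __init__(self) -> None:
        self._pending: dict[str, str] = {}  # clientToken -> operation ("get", "update")

    def new_request(self, operation: str, body: Optional[dict] = None) -> str:
        """Return a JSON payload carrying a fresh clientToken and remember it."""
        token = uuid.uuid4().hex  # 32 hex characters, well under the 64-byte limit
        self._pending[token] = operation
        payload = dict(body or {})
        payload["clientToken"] = token
        return json.dumps(payload)

    def match(self, payload: str) -> Optional[tuple[str, dict]]:
        """Return (operation, document) for our own responses, else None."""
        doc = json.loads(payload)
        operation = self._pending.pop(doc.get("clientToken"), None)
        if operation is None:
            return None  # another client's response, a duplicate, or no token
        return operation, doc
```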
For the shadow update payload, use this structure:
{
  "state": {
    "reported": {
      "firmware_version": "2.1.3",
      "config": {
        "sampling_rate": 1000,
        "reporting_interval": 60
      },
      "connectivity": {
        "signal_strength": -67,
        "network_type": "LTE"
      },
      "timestamp": 1717502834
    }
  }
}
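Include all configuration parameters that define device state, not just changed values. Shadow updates are merged into the existing document. A field you omit keeps its previous value rather than being cleared, so a partial report can leave stale values in place. To remove a field from the shadow, set it to null.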
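One way to guarantee that every report carries the full configuration is to build the payload from a typed model of device state. Every field is then required, and nothing can be silently omitted. The dataclass and function names below are hypothetical; the fields mirror the example payload above.

```python
from __future__ import annotations

import json
import time
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SamplingConfig:
    sampling_rate: int
    reporting_interval: int


@dataclass
class Connectivity:
    signal_strength: int
    network_type: str


@dataclass
class DeviceState:
    firmware_version: str
    config: SamplingConfig
    connectivity: Connectivity


def build_reported_update(state: DeviceState, now: Optional[int] = None) -> str:
    """Serialize the complete device state as a shadow/update payload."""
    reported = asdict(state)  # asdict recurses into the nested dataclasses
    reported["timestamp"] = int(time.time()) if now is None else now
    return json.dumps({"state": {"reported": reported}})
```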
Additional Best Practices:
- Delta Handling: Subscribe to the shadow/update/delta topic to receive desired state changes from the platform. When a delta arrives, apply the configuration change and publish the updated reported state.
- Error Handling: If a shadow/get request times out, retry with exponential backoff (1s, 2s, 4s). After 3 failures, publish the current state anyway so the platform has recent data.
- Versions and Timestamps: Include a timestamp in the reported state for your own diagnostics, but the Device Shadow service does not use it to resolve conflicts. Instead, the service gives each shadow document a version number that increments on every update. Include "version" in an update request to have it rejected if the shadow changed since you read it. Ignore delta messages whose version is not newer than the last one you applied, since messages can arrive out of order or be redelivered.
- Monitoring: Log shadow synchronization events (get requests, update publications, delta receipts) for troubleshooting. Track metrics such as time-to-sync after reboot and shadow update failures.
- Testing: Implement a shadow sync test that reboots a device, waits 30 seconds, and verifies the platform shadow matches device state. Run this test regularly in your staging environment.
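The error-handling advice can be sketched as a small retry wrapper. This version reads "3 failures" as three failed retries after the initial request, giving waits of 1 s, 2 s and 4 s. `request_once` and `publish_state` are hypothetical callables supplied by the agent, and `sleep` is injectable so the schedule can be tested without waiting.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


def sync_with_retry(request_once: Callable[[], Optional[dict]],
                    publish_state: Callable[[], None],
                    max_retries: int = 3,
                    base_delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Request the shadow with exponential backoff, then report local state regardless.

    `request_once` is a hypothetical callable that publishes one shadow/get and
    returns the get/accepted document, or None on timeout or rejection.
    """
    doc: Optional[dict] = None
    for attempt in range(1 + max_retries):  # initial request plus retries
        doc = request_once()
        if doc is not None:
            break
        if attempt < max_retries:
            sleep(base_delay_s * 2 ** attempt)  # 1 s, 2 s, 4 s with the defaults
    publish_state()  # the platform gets fresh data even if every get failed
    return doc
```

In production you may also want to add random jitter to the delays, so that a fleet rebooting together does not retry in lockstep.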
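Delta handling and version checks fit together in one handler. The sketch below assumes delta messages carry a "state" object and a "version" number, as AWS IoT shadow delta documents do. `DeltaHandler` and its callbacks are hypothetical. Seeding `last_version` from the "version" in the initial get/accepted response stops a delayed, older delta from undoing a newer setting.

```python
from __future__ import annotations

import json
from typing import Callable


class DeltaHandler:
    """Applies shadow/update/delta messages, skipping stale ones (hypothetical helper)."""

    def __init__(self, apply_desired: Callable[[dict], dict],
                 report: Callable[[str], None], last_version: int = -1) -> None:
        self.apply_desired = apply_desired  # applies settings, returns resulting values
        self.report = report                # publishes a payload to .../shadow/update
        self.last_version = last_version    # seed from the get/accepted "version"

    def on_delta(self, payload: str) -> bool:
        msg = json.loads(payload)
        version = msg.get("version", 0)
        if version <= self.last_version:
            return False  # out-of-order or redelivered delta: already superseded
        self.last_version = version
        applied = self.apply_desired(msg.get("state", {}))
        # Reporting the values actually applied clears the delta once they match desired.
        self.report(json.dumps({"state": {"reported": applied}}))
        return True
```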
After implementing these changes, devices should report their state within seconds of reconnecting. In the worst case, this happens after the get timeout and retries have elapsed. That closes the post-reboot window in which the shadow shows stale values. The periodic reconciliation catches edge cases where the initial synchronization fails.
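Also verify that your MQTT client connects with clean session set to false and a stable client ID. With clean session true, the broker discards the device’s subscriptions and any queued QoS 1 messages when it disconnects. That includes the shadow delta subscription, so deltas published while the device is offline are lost. Persistent sessions in AWS IoT Core also expire (after one hour by default), so the agent should still resubscribe and issue a shadow/get on every connect rather than relying on the session alone.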