We had the exact same issue during our v3.4 to v3.6 firmware migration. Here’s what worked for us:
- Create a backward-compatible monitoring rule that handles both formats
- Use Watson IoT Platform’s rule versioning to maintain historical data
- Implement proper alert timeout adjustments based on new heartbeat intervals
Here’s the complete solution:
Step 1: Capture Both Firmware Heartbeat Formats
Legacy v3.2 format:
{"heartbeat": {"status": "alive", "timestamp": 1234567890}}
New v3.5 format:
{"device": {"health": {"state": "online", "last_seen": 1234567890}}}
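If you also process heartbeats outside the rules engine (for example in a small consumer service), it can help to normalize both payloads into one shape first. The following is a minimal sketch under that assumption; the `Heartbeat` type and `extract_heartbeat` function are hypothetical names for illustration, not part of Watson IoT Platform:

```python
from typing import Any, NamedTuple, Optional


class Heartbeat(NamedTuple):
    healthy: bool    # True when the device reports "alive" (legacy) or "online" (new)
    timestamp: int   # epoch seconds taken from "timestamp" or "last_seen"


def extract_heartbeat(payload: dict[str, Any]) -> Optional[Heartbeat]:
    """Return a normalized heartbeat from either firmware format, or None."""
    # Legacy format: {"heartbeat": {"status": ..., "timestamp": ...}}
    legacy = payload.get("heartbeat")
    if isinstance(legacy, dict) and "timestamp" in legacy:
        return Heartbeat(legacy.get("status") == "alive", int(legacy["timestamp"]))

    # New format: {"device": {"health": {"state": ..., "last_seen": ...}}}
    device = payload.get("device")
    health = device.get("health") if isinstance(device, dict) else None
    if isinstance(health, dict) and "last_seen" in health:
        return Heartbeat(health.get("state") == "online", int(health["last_seen"]))

    # Neither format matched
    return None
```

Returning `None` for unrecognized payloads keeps malformed messages from being counted as a valid heartbeat.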
Step 2: Update Monitoring Rule Configuration
Navigate to Watson IoT Platform Dashboard:
- Rules Engine > Device Heartbeat Monitor
- Click “Create New Version” (preserves historical alerts)
- Update JSONPath expressions to support both formats:
Rule Name: `Device_Heartbeat_v2_Compatibility`
Condition Logic:
($.heartbeat.status == "alive" OR $.device.health.state == "online")
AND
($.heartbeat.timestamp > (current_time - 300) OR $.device.health.last_seen > (current_time - 300))
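To sanity-check the condition before deploying it, you can reproduce it offline against sample payloads. This is only a sketch of the same logic in Python, not the rule engine's own syntax; `is_healthy` and `MAX_AGE_SECONDS` are hypothetical names, and `now` stands in for `current_time`. Unlike the rule as written, the sketch pairs each status field with the timestamp from the same format, which is an assumption about the intended behavior:

```python
import time
from typing import Any, Optional

MAX_AGE_SECONDS = 300  # same 300-second window as the rule condition


def _section(value: Any) -> dict:
    """Treat missing or non-object sections as empty."""
    return value if isinstance(value, dict) else {}


def is_healthy(payload: dict[str, Any], now: Optional[float] = None) -> bool:
    """Evaluate the compatibility condition for one message."""
    now = time.time() if now is None else now
    cutoff = now - MAX_AGE_SECONDS

    legacy = _section(payload.get("heartbeat"))
    health = _section(_section(payload.get("device")).get("health"))

    # Each format must report a good status AND a fresh timestamp.
    legacy_ok = legacy.get("status") == "alive" and legacy.get("timestamp", 0) > cutoff
    new_ok = health.get("state") == "online" and health.get("last_seen", 0) > cutoff
    return legacy_ok or new_ok
```

Passing `now` explicitly makes the check deterministic, so the same sample payloads can be replayed in a test.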
Step 3: Alert Logic Update
The critical change is to alert on elapsed time, regardless of what the status field reports:
Alert Trigger Conditions:
- If no heartbeat message received for 5 minutes (300 seconds), trigger alert
- If the `last_seen` or `timestamp` field is stale (>5 minutes), trigger alert even if status shows “alive” or “online”
- Separate alert severity levels:
  - Warning: No heartbeat for 5-10 minutes
  - Critical: No heartbeat for >10 minutes
Alert Configuration:
alert.trigger.timeout=300
alert.trigger.field.path=["$.heartbeat.timestamp", "$.device.health.last_seen"]
alert.trigger.condition=max_age_exceeded
alert.severity.warning=300
alert.severity.critical=600
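The severity thresholds above boil down to a comparison on heartbeat age. A minimal sketch of that mapping follows, assuming timestamps are epoch seconds; `classify_severity` and the constant names are hypothetical and only mirror the 300 and 600 second values from the configuration:

```python
from typing import Optional

WARNING_AFTER_SECONDS = 300   # mirrors alert.severity.warning
CRITICAL_AFTER_SECONDS = 600  # mirrors alert.severity.critical


def classify_severity(last_heartbeat: float, now: float) -> Optional[str]:
    """Map heartbeat age to a severity level, ignoring any reported status."""
    age = now - last_heartbeat
    if age > CRITICAL_AFTER_SECONDS:
        return "critical"   # no heartbeat for more than 10 minutes
    if age >= WARNING_AFTER_SECONDS:
        return "warning"    # no heartbeat for 5 to 10 minutes
    return None             # heartbeat is fresh, no alert
```

Because the function looks only at age, a device that keeps reporting “alive” with a stale timestamp is still flagged.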
Step 4: Firmware Heartbeat Interval Adjustment
Check your firmware release notes - v3.5 changed the heartbeat interval:
- v3.2: 60-second intervals
- v3.5: 120-second intervals (to reduce battery consumption)
Update your monitoring thresholds:
- Alert timeout: 300 seconds (accommodates 2 missed heartbeats + network latency)
- Dashboard “last seen” threshold: 180 seconds (1.5x heartbeat interval)
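The arithmetic behind these two thresholds can be written out so it is easy to recompute if the heartbeat interval changes again. This is a sketch only: the function names are hypothetical, and the 60-second latency allowance is an assumption inferred from 300 − 2 × 120:

```python
def alert_timeout(interval_s: int, missed_heartbeats: int = 2,
                  latency_allowance_s: int = 60) -> int:
    """Seconds of silence tolerated before alerting."""
    return interval_s * missed_heartbeats + latency_allowance_s


def dashboard_threshold(interval_s: int, factor: float = 1.5) -> int:
    """Seconds after which the dashboard marks a device as not recently seen."""
    return int(interval_s * factor)


# v3.5 firmware sends a heartbeat every 120 seconds
print(alert_timeout(120))        # 300
print(dashboard_threshold(120))  # 180
```

During the transition, sizing the thresholds for the slower 120-second interval avoids false alerts from v3.5 devices, at the cost of slower detection for devices still on v3.2.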
Step 5: Testing and Validation
- Deploy the new rule version to a test device group
- Manually disconnect devices and verify alerts trigger within 5 minutes
- Monitor for false positives over 48 hours
- Gradually roll out to production device groups
Additional Recommendations:
- Enable rule audit logging to track when devices switch between firmware versions
- Set up dashboard widgets to show firmware version distribution across your fleet
- Create separate monitoring rules for critical vs. non-critical device groups
- Implement graduated alert escalation (email at 5 min, SMS at 10 min, PagerDuty at 15 min)
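The graduated escalation in the last recommendation can be modeled as a list of thresholds. The sketch below only decides which channels are due for a given period of silence; `ESCALATION_STEPS` and `channels_due` are hypothetical names, and actually sending the notifications is left to whatever integration you use:

```python
# (seconds of silence, channel), in escalation order
ESCALATION_STEPS = [
    (300, "email"),      # 5 minutes
    (600, "sms"),        # 10 minutes
    (900, "pagerduty"),  # 15 minutes
]


def channels_due(silence_seconds: float) -> list[str]:
    """Return every channel whose threshold has been reached."""
    return [channel for threshold, channel in ESCALATION_STEPS
            if silence_seconds >= threshold]
```

For example, `channels_due(650)` returns `["email", "sms"]`, so a device that has been silent for just under 11 minutes has triggered the first two steps but not PagerDuty.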
This approach eliminated our missed alerts and reduced false positives by 85%. The backward-compatible rule handles the transition period gracefully while you complete the firmware rollout.