Greengrass firmware updates: comparing OTA vs local push for edge reliability

We’re designing our firmware update strategy for 10,000 Greengrass edge devices deployed across remote locations with varying network reliability. Debating between OTA updates via AWS IoT Jobs versus local push updates using physical access.

OTA updates are attractive for automation, but we're concerned about network reliability at remote sites. Some locations have intermittent connectivity that could interrupt large firmware downloads. Local push updates guarantee completion but don't scale well and require field-technician visits.

What experiences have others had with OTA vs local push methods? How do you handle update automation when network reliability varies significantly across deployment sites? Looking for real-world insights on balancing automation benefits against reliability requirements.

From a field operations perspective, local push updates are extremely expensive at scale. Each site visit costs $500-1000 in technician time and travel, so for 10k devices that's $5-10M for a single update cycle. OTA updates have infrastructure costs, but the per-device cost is essentially zero after initial setup. Even with a 5% failure rate requiring site visits, OTA is vastly more cost-effective.

Let me provide a comprehensive comparison based on our multi-year experience with both approaches:

OTA vs Local Push Analysis:

OTA Updates (AWS IoT Jobs + Greengrass):

Advantages:

  • Full automation capability - zero manual intervention for successful updates
  • Progressive rollout support - staged deployment with automatic pause on failures
  • Built-in resume/retry - handles intermittent connectivity gracefully
  • A/B partition support - automatic rollback on health check failures
  • Real-time monitoring - CloudWatch metrics for update progress and success rates
  • Centralized control - manage 10k devices from single console
  • Cost-effective at scale - near-zero marginal cost per device

Disadvantages:

  • Network dependency - requires sufficient bandwidth and connectivity windows
  • Initial setup complexity - requires proper device partitioning and health checks
  • Security considerations - need secure firmware distribution and verification
  • Monitoring overhead - must track update status across large fleet

Local Push Updates:

Advantages:

  • Network independent - works regardless of connectivity
  • Guaranteed completion - technician verifies successful update
  • Physical verification - can inspect device state directly
  • Immediate rollback - technician can restore from backup on-site
  • No cloud dependency - works even if AWS connection is down

Disadvantages:

  • Doesn’t scale - requires site visits for every device
  • High cost - $500-1000 per site visit × 10k devices = $5-10M per update
  • Slow deployment - months to complete full fleet update
  • Human error risk - incorrect firmware versions or procedures
  • No centralized visibility - hard to track completion status
  • Update fatigue - frequent updates become operationally infeasible

Network Reliability Considerations:

For sites with varying connectivity, implement tiered strategies:

  1. High Reliability Sites (>95% uptime):

    • Full OTA automation
    • Aggressive update schedules
    • Minimal monitoring required
  2. Medium Reliability Sites (80-95% uptime):

    • OTA with extended timeouts
    • Longer download windows (24-72 hours)
    • Staged rollouts per site
    • Automated retry logic
  3. Low Reliability Sites (<80% uptime):

    • OTA with pre-staging during connectivity windows
    • Update execution during next connection
    • Fallback to local push only if OTA fails after 3 attempts
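The tiering above can be sketched as a small classifier. The uptime thresholds come from the tiers listed; the download windows, retry counts, and function names are illustrative assumptions, not AWS defaults:

```python
# Sketch: map a site's observed connectivity uptime (%) to an OTA strategy.
# Thresholds mirror the three tiers above; timeout/retry values are
# illustrative assumptions you would tune from your own telemetry.

def classify_site(uptime_pct: float) -> dict:
    """Return the update-strategy parameters for a site."""
    if uptime_pct > 95:
        return {"tier": "high", "download_window_h": 4,
                "max_ota_attempts": 2, "pre_stage": False}
    if uptime_pct >= 80:
        return {"tier": "medium", "download_window_h": 72,
                "max_ota_attempts": 3, "pre_stage": False}
    # Low-reliability sites: pre-stage the firmware image during
    # connectivity windows, execute the update on the next connection.
    return {"tier": "low", "download_window_h": 168,
            "max_ota_attempts": 3, "pre_stage": True}

def needs_local_push(uptime_pct: float, failed_attempts: int) -> bool:
    """Schedule a technician visit only after OTA retries are exhausted."""
    return failed_attempts >= classify_site(uptime_pct)["max_ota_attempts"]
```

A fleet-management service would run `needs_local_push` per device after each failed job execution, so site visits are batched only for the residual failures rather than planned for the whole fleet.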

Update Automation Best Practices:

Our production implementation:

  1. Pre-Update Validation:

    • Device health check before starting update
    • Verify sufficient storage space
    • Check battery level (for battery-powered devices)
    • Confirm network bandwidth availability
  2. Progressive Rollout Strategy:

    
    Stage 1: Canary group (1% of fleet, 50-100 devices)
    - Wait 24 hours, monitor metrics
    
    Stage 2: Early adopters (10% of fleet)
    - Wait 48 hours, analyze telemetry
    
    Stage 3: Broad deployment (50% of fleet)
    - Wait 24 hours, verify stability
    
    Stage 4: Remaining devices (39% of fleet)
    - Complete rollout
    
  3. Health Check Implementation:

    • Post-update device reboot
    • Automatic connectivity test
    • Application-level health verification
    • Automatic rollback if any check fails
    • Report status to IoT Jobs
  4. Rollback Automation:

    • Maintain previous firmware on separate partition
    • Automatic revert on boot failure (3 consecutive failures)
    • Manual rollback trigger via IoT Jobs
    • Preserve device configuration across rollbacks
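The health-check and rollback rules in steps 3 and 4 reduce to a small decision function. This is a sketch of the logic only; the field names are illustrative, and on a real Greengrass device this would run in a post-boot hook and report the resulting status to IoT Jobs:

```python
# Sketch of the post-update health gate and A/B rollback decision described
# above. "3 consecutive boot failures" is the threshold stated in the text.

ROLLBACK_AFTER_FAILURES = 3  # consecutive boot failures before auto-revert

def health_check(device: dict) -> bool:
    """Post-reboot checks: connectivity test plus app-level health."""
    return device.get("connected", False) and device.get("app_healthy", False)

def next_action(device: dict, boot_failures: int) -> str:
    """Decide what the updater should do after a reboot."""
    if boot_failures >= ROLLBACK_AFTER_FAILURES:
        return "rollback"        # revert to the previous firmware partition
    if health_check(device):
        return "report_success"  # mark the IoT Jobs execution SUCCEEDED
    return "retry_boot"          # increment the counter and reboot again
```

The key design point is that the rollback decision is made on-device: it must work even when the update broke connectivity, which is exactly when the cloud can't reach the device.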

Cost-Benefit Analysis:

For 10,000 devices with quarterly updates:

OTA Approach:

  • Initial setup: $50k one-time (engineering + infrastructure)
  • Per-update cost: $2k (monitoring + AWS services)
  • First-year cost: $58k (setup + 4 update cycles)
  • 95% OTA success rate; the remaining 5% (500 devices × $500/visit) ≈ $250k in site visits
  • Total annual cost: $308k

Local Push Approach:

  • Per-update cost: $5M (10k devices × $500 per visit)
  • Annual cost: $20M (4 updates/year)
  • 100% success rate (by definition)
  • Total annual cost: $20M

Savings with OTA: $19.7M annually (98.5% cost reduction)
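The arithmetic above can be reproduced directly. One assumption implicit in the figures is made explicit here: the 5% of devices that fail OTA are remediated in a single annual sweep at $500/visit (the low end of the quoted $500-1000 range):

```python
# Reproduces the cost-benefit figures above for a 10k-device fleet with
# quarterly updates. Assumption: OTA failures are batched into one annual
# remediation round at $500 per visit.

FLEET = 10_000
UPDATES_PER_YEAR = 4
VISIT_COST = 500  # USD, low end of the quoted range

# OTA: one-time setup + per-update monitoring/AWS costs + failure remediation
ota_setup = 50_000
ota_per_update = 2_000
ota_failure_rate = 0.05
ota_annual = (ota_setup + UPDATES_PER_YEAR * ota_per_update
              + int(FLEET * ota_failure_rate) * VISIT_COST)

# Local push: every device visited on every update cycle
local_annual = FLEET * VISIT_COST * UPDATES_PER_YEAR

savings = local_annual - ota_annual
reduction_pct = round(100 * savings / local_annual, 1)
print(ota_annual, local_annual, savings, reduction_pct)
```

Note how sensitive the local-push side is to update cadence: halving the number of update cycles halves its cost, while the OTA side barely moves.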

Recommended Strategy:

Implement OTA as primary method with local push as fallback:

  1. Deploy OTA infrastructure with proper health checks and rollback
  2. Classify sites by network reliability
  3. Use progressive rollouts starting with high-reliability sites
  4. Monitor update success rates per site
  5. Schedule local push only for repeated OTA failures
  6. Continuously improve OTA success rates based on telemetry
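Steps 1 and 3-4 map onto the rollout and abort configuration of AWS IoT Jobs. The field names below follow the IoT `CreateJob` API; the specific rates and thresholds are illustrative assumptions to be tuned per fleet:

```python
# Sketch of an IoT Jobs configuration implementing staged rollout with an
# automatic pause on failures. Field names follow the AWS IoT CreateJob API;
# numeric values are illustrative, not recommendations.

def build_job_config(max_per_minute: int = 50,
                     abort_failure_pct: float = 10.0) -> dict:
    return {
        "jobExecutionsRolloutConfig": {
            "maximumPerMinute": max_per_minute,
            "exponentialRate": {
                "baseRatePerMinute": 5,    # start slowly, canary-style
                "incrementFactor": 2.0,    # ramp up as devices succeed
                "rateIncreaseCriteria": {"numberOfSucceededThings": 100},
            },
        },
        "abortConfig": {
            "criteriaList": [{
                "failureType": "FAILED",
                "action": "CANCEL",        # auto-pause the whole rollout
                "thresholdPercentage": abort_failure_pct,
                "minNumberOfExecutedThings": 50,
            }],
        },
        # Generous in-progress timeout for low-connectivity sites
        "timeoutConfig": {"inProgressTimeoutInMinutes": 72 * 60},
    }
```

This dict would be passed (spread into the corresponding parameters) to a `create_job` call; the abort criteria give you the "pause on failures" behavior without any operator in the loop.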

This approach captures the large majority of the automation benefit (90%+ of updates fully automated) while maintaining reliability guarantees through strategic use of local push for the small percentage of problematic updates.

The data clearly shows OTA updates are superior for fleet management at scale, with network reliability concerns being addressable through proper implementation of timeouts, retries, and progressive rollouts.

For update automation, we built a progressive rollout system using IoT Jobs dynamic groups. Start with 1% of devices (canary group), monitor for 24 hours, then 10%, then 50%, then remaining devices. Each stage waits for health metrics before proceeding. This catches firmware issues early while still being fully automated. We can pause or rollback at any stage. Local push can’t match this level of control and safety.
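The stage sizing described above (1% canary, then 10%, 50%, and the remainder) is simple to compute per fleet. A minimal sketch, with the percentages taken from the text and the function name being illustrative:

```python
# Sketch: compute per-stage device counts for the progressive rollout
# described above. The final stage absorbs whatever remains of the fleet.

def rollout_stages(fleet_size: int,
                   fractions=(0.01, 0.10, 0.50)) -> list[int]:
    """Return the number of devices targeted at each rollout stage."""
    counts = [int(fleet_size * f) for f in fractions]
    counts.append(fleet_size - sum(counts))  # remainder (39% for 10k)
    return counts
```

Each stage's device count would become the membership of a dynamic thing group targeted by the job, with the health-metric wait gate deciding when to promote to the next group.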