Integrating Greengrass v2 edge ML with ERP for predictive maintenance automation

We’ve successfully automated our predictive maintenance workflow by integrating Greengrass v2 edge ML inference with our ERP system. This has reduced equipment downtime by 40% and eliminated manual work order creation.

Sensors on our manufacturing equipment feed ML models that run locally on Greengrass cores for real-time vibration analysis and failure prediction. When a model detects a potential failure, we automatically create a maintenance work order in our ERP system through API integration.

The architecture uses Greengrass Stream Manager to buffer predictions locally, Lambda functions to process and enrich the data with equipment metadata from our asset registry, and direct API calls to our ERP’s maintenance module. The system handles network interruptions gracefully - predictions are queued locally and sync when connectivity returns.

Key outcomes: maintenance work orders are created within minutes of a prediction (versus 2-3 days with the manual process), prediction accuracy for bearing failures is 95%, and the maintenance team can schedule work proactively instead of responding to breakdowns. The edge ML approach means predictions happen even during network outages, which is critical for our remote facilities.

From an operational perspective, how did your maintenance team adapt to this automated workflow? Did you face any resistance to having ML models create work orders directly? We’re concerned our team might not trust automated predictions initially.

This is exactly what we’re trying to implement. How do you handle authentication for the ERP API calls? Are you using service accounts, or do you have a dedicated integration user? Also, what happens if the ERP system is down when a prediction comes in?

Let me detail the complete implementation across the three focus areas:

Greengrass Stream Manager: This is the foundation of our resilient architecture. Stream Manager runs as a system component on each Greengrass core and provides persistent message queuing with automatic retry. We configured each stream with a 7-day retention policy and a 10 MB size limit. Each manufacturing line has a dedicated stream named predictions-line-{lineId}, and the ML inference component publishes predictions to it using the Stream Manager SDK. The configuration includes automatic export to IoT Core when connectivity is available, but the critical feature is local persistence: if cloud connectivity drops, predictions accumulate locally and sync when the network returns. As long as an outage stays within the retention window and size limit, no predictions are lost, which matters because network outages are common in our factory environments.
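To make the stream settings concrete, here is a minimal sketch of how the per-line stream name, retention, and size limit could be derived. StreamSettings and settings_for_line are hypothetical helpers for illustration, not part of the Stream Manager SDK, and treating 10 MB as 10 MiB is an assumption. The resulting values would then be passed to whatever stream definition call the SDK version in use provides.

```python
from dataclasses import dataclass

RETENTION_DAYS = 7                    # 7-day retention policy from the post
MAX_STREAM_BYTES = 10 * 1024 * 1024   # 10 MB limit, interpreted as 10 MiB (assumption)


@dataclass(frozen=True)
class StreamSettings:
    """Hypothetical container for one line's stream settings."""
    name: str
    max_size_bytes: int
    time_to_live_ms: int


def settings_for_line(line_id: str) -> StreamSettings:
    """Build the settings for one manufacturing line's prediction stream."""
    return StreamSettings(
        name=f"predictions-line-{line_id}",
        max_size_bytes=MAX_STREAM_BYTES,
        # Retention expressed in milliseconds: days * hours * minutes * seconds * 1000.
        time_to_live_ms=RETENTION_DAYS * 24 * 60 * 60 * 1000,
    )
```

For example, settings_for_line("A3") yields a stream named predictions-line-A3 with a time-to-live of 604,800,000 ms.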

Lambda Integration: We use two Lambda functions in the workflow. The first Lambda subscribes to IoT Core topics where Stream Manager exports predictions. It enriches the prediction data by querying DynamoDB for equipment metadata (asset ID, location, maintenance history, criticality level). The enrichment adds context that the ERP system needs to properly route and prioritize work orders. The second Lambda handles the actual ERP integration. It applies business rules to filter predictions (severity threshold, deduplication, persistence check), constructs the work order payload according to our ERP’s API schema, and makes the REST API call with OAuth authentication. Both Lambdas include comprehensive error handling with CloudWatch alarms for failed invocations. The Lambda functions are idempotent - they check if a work order already exists for the equipment before creating a new one.
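The enrichment and business-rule steps could look roughly like the sketch below. It uses in-memory dicts and sets in place of DynamoDB so it runs standalone. The threshold value, the "three consecutive hits" persistence rule, and all names (Prediction, enrich, should_create_work_order) are assumptions for illustration, since the post does not give the actual rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

SEVERITY_THRESHOLD = 0.5   # assumed cutoff; the post does not state the real value
MIN_CONSECUTIVE_HITS = 3   # assumed persistence rule: N high-severity predictions in a row


@dataclass
class Prediction:
    equipment_id: str
    failure_type: str
    severity: float


def enrich(pred: Prediction, registry: Dict[str, dict]) -> dict:
    """Attach asset metadata (a stand-in for the DynamoDB lookup) to a prediction."""
    meta = registry.get(pred.equipment_id, {})
    return {
        "equipment_id": pred.equipment_id,
        "failure_type": pred.failure_type,
        "severity": pred.severity,
        "location": meta.get("location"),
        "criticality": meta.get("criticality"),
    }


def should_create_work_order(
    pred: Prediction, history: List[Prediction], open_orders: Set[str]
) -> bool:
    """Apply the severity, deduplication, and persistence checks in that order."""
    if pred.severity < SEVERITY_THRESHOLD:
        return False  # below the severity threshold
    if pred.equipment_id in open_orders:
        return False  # a work order already exists for this equipment
    needed = MIN_CONSECUTIVE_HITS - 1
    same = [p for p in history if p.equipment_id == pred.equipment_id]
    recent = same[-needed:] if needed else []
    # Persistence: the preceding predictions for this equipment must also be severe.
    return len(recent) == needed and all(
        p.severity >= SEVERITY_THRESHOLD for p in recent
    )
```

Checking open_orders before creating anything is what keeps the function safe to re-run on a replayed message.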

ERP API Automation: Our ERP exposes a REST API for work order creation. The integration Lambda constructs a JSON payload that includes equipment ID, predicted failure type, severity score, recommended maintenance action, and estimated time to failure. The API call uses OAuth 2.0 client credentials flow - the Lambda retrieves credentials from Secrets Manager and caches tokens for the 1-hour validity period. For resilience, we implemented exponential backoff retry with jitter (3 attempts over 15 minutes). If all retries fail, the message goes to a DLQ for manual review. The ERP API returns a work order ID which we store in DynamoDB to prevent duplicate creation. We also update the device shadow with the work order ID, allowing technicians to see which work orders are associated with each piece of equipment.
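A minimal sketch of the retry behaviour described here: three attempts, exponential backoff with full jitter, then a dead-letter fallback. The delay constants are assumptions chosen so the worst-case wait adds up to 15 minutes (300 s + 600 s). The send callable and dead_letters list stand in for the real ERP call and DLQ. The sleep function is injectable because a real deployment may schedule the delay through a queue rather than blocking.

```python
import random
import time
from typing import Any, Callable, List, Optional

MAX_ATTEMPTS = 3
BASE_DELAY_S = 300.0   # assumed: ceiling of the first wait
MAX_DELAY_S = 600.0    # assumed: cap on any single wait


def backoff_delay(attempt: int, rng: random.Random) -> float:
    """Full jitter: a uniform delay in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
    return rng.uniform(0, ceiling)


def create_with_retry(
    send: Callable[[dict], Any],
    payload: dict,
    dead_letters: List[dict],
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> Optional[Any]:
    """Try the ERP call up to MAX_ATTEMPTS times, then park the payload for review."""
    rng = rng or random.Random()
    for attempt in range(MAX_ATTEMPTS):
        try:
            return send(payload)  # e.g. returns the new work order ID
        except Exception:
            # Wait before the next attempt, but not after the final one.
            if attempt < MAX_ATTEMPTS - 1:
                sleep(backoff_delay(attempt, rng))
    dead_letters.append(payload)  # all attempts failed: hand off to manual review
    return None
```

The jitter spreads retries from many cores over time, which helps avoid hitting the ERP with a synchronized burst when it recovers.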

For operational adoption, we ran a 30-day pilot where automated work orders were created but flagged as ‘AI-Generated’ requiring supervisor approval. This built trust as the team saw the prediction accuracy. After the pilot, we moved to automatic creation for high-confidence predictions (score > 0.8) while medium-confidence predictions (0.5-0.8) still require approval. The maintenance team now reviews a dashboard showing prediction accuracy trends, which has improved their confidence in the system.
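The confidence tiers can be expressed as a small routing function. This is a sketch: the Route enum and function name are hypothetical, and what happens to scores below 0.5 is an assumption (treated here as "no action"), since the post only describes the two upper tiers.

```python
from enum import Enum

AUTO_THRESHOLD = 0.8       # scores strictly above this are created automatically
APPROVAL_THRESHOLD = 0.5   # scores from here up to 0.8 need supervisor approval


class Route(Enum):
    AUTO_CREATE = "auto_create"
    NEEDS_APPROVAL = "needs_approval"
    NO_ACTION = "no_action"   # assumed handling for scores below 0.5


def route_prediction(score: float) -> Route:
    """Map a model confidence score to a work order handling path."""
    if score > AUTO_THRESHOLD:
        return Route.AUTO_CREATE
    if score >= APPROVAL_THRESHOLD:
        return Route.NEEDS_APPROVAL
    return Route.NO_ACTION
```

A score of exactly 0.8 falls into the approval tier here, following the "score > 0.8" wording.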

Performance metrics: average end-to-end latency from prediction to work order creation is 45 seconds (edge inference 5s, Stream Manager export 10s, Lambda processing 15s, ERP API call 15s). During a recent 8-hour network outage, the system queued 127 predictions locally and successfully synced them all when connectivity returned, with no manual intervention required.

We use OAuth 2.0 with a dedicated service account that has permissions only for the maintenance module. The Lambda function refreshes tokens automatically. For ERP downtime, Stream Manager is key - it queues messages locally on the Greengrass core with configurable retention (we use 7 days). When the ERP comes back online, the Lambda processes the backlog in order. We’ve also implemented exponential backoff retry logic to avoid overwhelming the ERP during recovery.
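The automatic token refresh could be handled with a small cache like the sketch below. TokenCache is a hypothetical helper. fetch_token stands in for the client-credentials request and is assumed to return the token and its lifetime in seconds. The 60-second refresh margin is also an assumption.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(
        self,
        fetch_token: Callable[[], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
        refresh_margin_s: float = 60.0,
    ) -> None:
        self._fetch = fetch_token       # returns (token, lifetime_in_seconds)
        self._clock = clock             # injectable so the cache can be tested
        self._margin = refresh_margin_s
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime_s = self._fetch()
            self._token = token
            self._expires_at = now + lifetime_s
        return self._token
```

Creating the cache outside the Lambda handler lets warm invocations reuse the token for most of its 1-hour validity instead of requesting a new one on every call.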

How granular are your predictions? Are you creating a work order for every single prediction, or do you aggregate multiple predictions before triggering ERP integration? I’m concerned about generating too many work orders for minor issues.