I want to add the infrastructure perspective since network reliability was mentioned. We’ve found that hybrid integration strategies work best when you have clear event classification. Here’s what we’ve learned across multiple implementations:
Network Reliability Considerations:
Real-time integration requires consistent network availability - aim for 99.5% uptime minimum. If your plant networks can’t meet this, real-time will cause more problems than it solves. Key factors:
• Wireless networks are problematic for real-time - latency spikes during interference
• Wired networks are more predictable but still need quality of service (QoS) configuration
• Segment your network - don’t mix machine data traffic with office traffic
• Monitor packet loss - anything above 1% will cause issues with real-time protocols
Latency Requirements by Use Case:
Different scenarios have different tolerance:
• Machine downtime alerts: <5 seconds acceptable, real-time justified
• Quality failures: <30 seconds acceptable, near-real-time sufficient
• Production counts: <2 minutes acceptable, batch processing fine
• Temperature/pressure readings: <5 minutes acceptable, batch processing preferred (reduces noise)
• Tool wear indicators: <10 minutes acceptable, batch definitely sufficient
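These tolerance tiers map naturally onto a small lookup table that routes each event type to a transport tier. A minimal sketch in Python (the event names and thresholds are illustrative, mirroring the list above):

```python
# Map event category -> (latency budget in seconds, transport tier).
# Categories and budgets mirror the tolerances listed above; names are illustrative.
LATENCY_BUDGET = {
    "machine_down":  (5,   "real-time"),
    "quality_fail":  (30,  "near-real-time"),
    "prod_count":    (120, "batch"),
    "temp_pressure": (300, "batch"),
    "tool_wear":     (600, "batch"),
}

def transport_for(event_type: str) -> str:
    """Return the transport tier for an event, defaulting to batch
    so unclassified events never load the real-time path."""
    _, tier = LATENCY_BUDGET.get(event_type, (600, "batch"))
    return tier
```

Defaulting unknown events to batch is deliberate: misclassifying a routine event as critical overloads the real-time path, while the reverse just delays one reading.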
Hybrid Integration Strategy Framework:
The approach you and Susan mentioned works well. Here’s how to structure it:
• Event Classification: Define three tiers - Critical (real-time), Important (near-real-time), Routine (batch)
• Transport Layer: Use message queuing for real-time events (ensures delivery even if MES is briefly unavailable) and REST APIs for batch uploads
• Database Impact: Real-time events write to a hot table with short retention (24 hours); batch processes write to the main tables. This reduces database contention.
• Failover Logic: When the real-time connection fails, queue events locally and switch to batch mode automatically; resume real-time once the connection is restored.
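The failover step is the piece teams most often get wrong, so here is a minimal sketch of the queue-and-degrade logic. `send_realtime` and `send_batch` are stand-ins for your actual transport calls (message-queue publish and REST bulk upload respectively); all names here are illustrative, not a specific product API:

```python
import time
from collections import deque

class HybridSender:
    """Send events real-time when possible; queue locally and fall
    back to batch mode when the real-time connection fails."""

    def __init__(self, send_realtime, send_batch, retry_interval=30):
        self.send_realtime = send_realtime   # e.g. MQ publish
        self.send_batch = send_batch         # e.g. REST bulk upload
        self.retry_interval = retry_interval # seconds between reconnect probes
        self.local_queue = deque()
        self.realtime_ok = True
        self._last_retry = 0.0

    def send(self, event):
        if self.realtime_ok:
            try:
                self.send_realtime(event)
                return
            except ConnectionError:
                self.realtime_ok = False     # degrade to batch mode
        self.local_queue.append(event)       # event is never dropped

    def flush(self):
        """Call periodically: drain the local queue via batch upload and
        probe the real-time path so it resumes once it recovers."""
        if self.local_queue:
            self.send_batch(list(self.local_queue))
            self.local_queue.clear()
        now = time.monotonic()
        if not self.realtime_ok and now - self._last_retry >= self.retry_interval:
            self._last_retry = now
            try:
                self.send_realtime({"type": "probe"})
                self.realtime_ok = True      # connection restored
            except ConnectionError:
                pass                         # stay in batch mode
```

The key property: an event is either delivered real-time or queued, never lost, and the switch back to real-time is automatic once a probe succeeds.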
Performance Optimization:
If you implement hybrid integration:
• Batch your routine updates but vary the batch timing across machines (don’t have 200 machines all sending updates at :00, :05, :10)
• Use delta updates for batch processing - only send values that changed
• Compress batch payloads if sending large datasets
• Implement backpressure handling - if MES is slow to respond, increase batch intervals temporarily
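The first two points above can be sketched together: derive each machine's batch offset deterministically from its ID (so 200 machines spread evenly across the interval instead of clustering at :00/:05/:10), and diff against the previous payload so only changed values go over the wire. The machine IDs and field names are illustrative:

```python
import hashlib

BATCH_INTERVAL = 300  # seconds; a 5-minute routine batch cycle

def batch_offset(machine_id: str, interval: int = BATCH_INTERVAL) -> int:
    """Deterministic per-machine jitter: hash the machine ID to an
    offset within the batch interval, spreading send times evenly."""
    digest = hashlib.sha256(machine_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % interval

def delta_update(previous: dict, current: dict) -> dict:
    """Return only the fields whose values changed since the last batch."""
    return {k: v for k, v in current.items() if previous.get(k) != v}
```

Hashing (rather than random jitter) keeps each machine's schedule stable across restarts, which makes gaps in the data easy to spot.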
The sweet spot we’ve found is real-time for about 10-15% of events (the critical ones), near-real-time (30-60 second batches) for another 20%, and regular batching (2-5 minutes) for the remaining 65-70%. This balances visibility with system load effectively.