I've designed several monitoring systems using both approaches. Here is a comparison of the two:
Event Subscription Architecture:
Advantages:
- Sub-second alert delivery (typically 100-300ms from event to alert)
- Efficient resource usage (no unnecessary polling traffic)
- Scalable to thousands of monitored entities
- True real-time responsiveness
- Lower network and server load
Disadvantages:
- Complex setup requiring message broker infrastructure
- Connection management overhead (reconnection logic, health checks)
- Potential event loss during connection outages (without proper queuing)
- Subscription lifecycle management (expiration, renewal)
- Debugging is more difficult (events are ephemeral)
Missed Event Risks:
- WebSocket connection drops (network issues, server restarts)
- Subscription expiration if not renewed properly
- Event queue overflow during high-volume bursts
- Message broker failures (if not highly available)
- Client processing delays causing backpressure
Polling API Architecture:
Advantages:
- Simple implementation (just periodic API calls)
- Reliable retrieval of current state (each successful poll returns the latest state, so the monitor catches up even after a failed poll)
- Easy debugging (query history available)
- Predictable load patterns
- No connection management complexity
Disadvantages:
- Inherent latency (average = polling interval / 2, worst case = one full polling interval)
- Cannot guarantee sub-second alert delivery at practical polling intervals
- Higher network and server load (continuous polling)
- Misses transient events between poll intervals
- Inefficient for large-scale monitoring (N devices × polling frequency)
Polling Interval Tradeoffs:
- 1-second polling: ~500ms average latency, high server load
- 5-second polling: ~2.5s average latency, moderate load, acceptable for most monitoring
- 30-second polling: ~15s average latency, low load, suitable for non-critical status checks
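The figures above follow from simple arithmetic, sketched below. This is an illustration only: it assumes events arrive uniformly at random within a poll interval and ignores request round-trip time, and `polling_profile` is a hypothetical helper name.

```python
def polling_profile(interval_s: float, n_devices: int = 1) -> dict:
    """Estimate detection latency and request rate for fixed-interval polling.

    Assumes events arrive uniformly at random within a poll interval and
    ignores request round-trip time.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return {
        "avg_latency_s": interval_s / 2,           # expected wait until the next poll
        "worst_latency_s": interval_s,             # event lands just after a poll
        "requests_per_s": n_devices / interval_s,  # one request per device per interval
    }


if __name__ == "__main__":
    for interval in (1, 5, 30):
        print(interval, polling_profile(interval, n_devices=1000))
```

Note that halving the interval halves the average latency but doubles the request rate, which is why very short intervals get expensive as the device count grows.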
Recommended Architecture for Compliance:
Implement a dual-path monitoring system:
1. Primary Path: Event Subscriptions
- Use MQTT with QoS 2 (exactly once delivery)
- Configure persistent sessions so the broker queues messages for clients that are temporarily offline
- Implement automatic reconnection with exponential backoff
- Monitor subscription health continuously
- Buffer events during connection outages
- Delivers real-time alerts with sub-second latency
2. Secondary Path: Polling Verification
- Poll every 30-60 seconds for status verification
- Compare polled data timestamps with event timestamps
- Detect any gaps indicating missed events
- Trigger alerts if discrepancies are found
- Provides compliance audit trail
3. Event Completeness Verification
- Every event includes a sequence number
- Monitoring system tracks sequence and detects gaps
- Poll API to retrieve missed events by sequence range
- Ensures zero event loss for compliance
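Sequence-gap detection can be sketched in a few lines. This is a minimal example, assuming each source (device) numbers its events with consecutive integers; `SequenceTracker` and its method names are hypothetical, not part of any particular platform's API.

```python
from typing import Dict, Optional, Tuple


class SequenceTracker:
    """Tracks the highest sequence number seen per source and reports gaps."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}

    def observe(self, source_id: str, seq: int) -> Optional[Tuple[int, int]]:
        """Record an event.

        Returns the inclusive (start, end) range of skipped sequence numbers,
        or None if nothing new is missing.
        """
        last = self._last_seen.get(source_id)
        if last is None:
            # First event from this source: nothing to compare against yet.
            self._last_seen[source_id] = seq
            return None
        if seq <= last:
            # Duplicate or late out-of-order arrival; no new gap.
            return None
        self._last_seen[source_id] = seq
        if seq == last + 1:
            return None
        return (last + 1, seq - 1)
```

When `observe` returns a range, the monitor would pass it to whatever history endpoint the platform offers for fetching events by sequence range.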
Implementation Guidelines:
Event Subscription Setup:
// Pseudocode for robust subscription:
1. Establish MQTT connection with persistent session
2. Subscribe to event topics with QoS 2
3. Implement message handler with sequence tracking
4. On connection loss: buffer outgoing alerts locally
5. On reconnection: request missed events by sequence number
6. Monitor subscription health, alert on failures
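The reconnection step is where most of the subtle bugs tend to hide, so here is a rough, transport-agnostic sketch of reconnecting with capped exponential backoff. The `connect` callable, the default delays, and the function names are assumptions for illustration; a real MQTT client library may already provide its own reconnect handling.

```python
import random
import time
from typing import Callable, Iterator


def backoff_delays(base_s: float = 0.5, cap_s: float = 30.0,
                   factor: float = 2.0) -> Iterator[float]:
    """Yield an unbounded sequence of exponentially growing delays, capped at cap_s."""
    delay = base_s
    while True:
        yield min(delay, cap_s)
        delay = min(delay * factor, cap_s)


def reconnect(connect: Callable[[], bool],
              max_attempts: int = 10,
              sleep: Callable[[float], None] = time.sleep,
              jitter: Callable[[float], float] = lambda d: random.uniform(0, d)) -> int:
    """Call `connect` until it returns True; return the number of attempts used.

    Sleeps a jittered backoff delay between attempts and raises
    ConnectionError after `max_attempts` failures.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        if connect():
            return attempt
        if attempt < max_attempts:
            # Randomized delay so many clients do not retry in lockstep.
            sleep(jitter(next(delays)))
    raise ConnectionError(f"gave up after {max_attempts} attempts")
```

After `reconnect` returns, the client would resubscribe if needed and request any events missed since the last sequence number it saw (step 5).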
Polling Verification:
// Pseudocode for polling verification:
1. Every 60 seconds: Poll device status API
2. Compare last event timestamp with polled timestamp
3. If gap > 5 seconds: Query event history for missing events
4. Process any missed events and trigger alerts
5. Log verification results for compliance audit
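One verification pass for a single device might look like the sketch below. The status and history lookups are injected as callables because the actual API is not specified here; `poll_status_ts`, `fetch_history`, and `handle_event` are hypothetical names, and the 5-second threshold is taken from step 3.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VerificationResult:
    device_id: str
    gap_s: float     # polled timestamp minus last event timestamp
    recovered: int   # number of missed events fetched from history


def verify_device(device_id: str,
                  last_event_ts: float,
                  poll_status_ts: Callable[[str], float],
                  fetch_history: Callable[[str, float, float], List[dict]],
                  handle_event: Callable[[dict], None],
                  max_gap_s: float = 5.0) -> VerificationResult:
    """Run one polling verification pass and return a result for the audit log."""
    polled_ts = poll_status_ts(device_id)       # step 1: poll the status API
    gap = polled_ts - last_event_ts             # step 2: compare timestamps
    recovered = 0
    if gap > max_gap_s:                         # step 3: gap too large, query history
        for event in fetch_history(device_id, last_event_ts, polled_ts):
            handle_event(event)                 # step 4: process each missed event
            recovered += 1
    return VerificationResult(device_id, gap, recovered)  # step 5: caller logs this
```

A scheduler would call `verify_device` for each device every 60 seconds and write each `VerificationResult` to the audit log.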
Alert Delivery Reliability:
To ensure zero missed alerts:
- Event subscriptions provide primary real-time delivery
- Persistent message queuing prevents loss during outages
- Polling verification catches any subscription failures
- Sequence number tracking detects event gaps
- Audit logging proves compliance with alert delivery SLA
Missed Event Risk Mitigation:
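- Use persistent MQTT sessions rather than a bare WebSocket connection, which keeps no delivery state across disconnects
- Configure the broker to hold queued messages for disconnected clients (24 hours minimum)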
- Implement client-side event buffering
- Monitor subscription health with heartbeat events
- Automatic fallback to polling if subscription fails
- Periodic gap detection via polling verification
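The heartbeat check and automatic fallback can be sketched as a small state holder. This is a minimal example, assuming the subscription delivers a periodic heartbeat event; the class name, the 15-second default timeout, and the mode strings are all illustrative choices.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Decides whether to trust the subscription or fall back to polling."""

    def __init__(self, timeout_s: float = 15.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_beat = clock()  # treat construction time as the first heartbeat

    def beat(self) -> None:
        """Call whenever a heartbeat (or any event) arrives on the subscription."""
        self._last_beat = self._clock()

    def mode(self) -> str:
        """Return 'subscription' while heartbeats are fresh, otherwise 'polling'."""
        stale = self._clock() - self._last_beat > self._timeout_s
        return "polling" if stale else "subscription"
```

The monitoring loop would check `mode()` on each tick and shorten its polling interval while the result is `"polling"`, then return to the slower verification interval once heartbeats resume.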
Performance Comparison:
For 1000 monitored devices:
Event Subscriptions:
- Alert latency: 100-300ms
- Server load: Low (event-driven)
- Network traffic: Minimal (events only)
- Missed event risk: <0.1% with proper infrastructure
Polling (5-second interval):
- Alert latency: 2.5s average
- Server load: High (200 requests/second)
- Network traffic: Continuous (even when no events)
- Missed event risk: Transient events between polls
Conclusion:
For your sub-second alert requirement with zero missed events:
- Primary: Event subscriptions with MQTT QoS 2
- Backup: Polling verification every 60 seconds
- Monitoring: Subscription health and sequence gap detection
- Compliance: Audit logging of all events and alerts
This architecture delivers real-time alerts while ensuring compliance through redundant verification. Pure polling cannot meet sub-second requirements, and pure subscriptions without verification pose compliance risks.