Based on implementing billing ingestion for multiple aziot-25 deployments, here’s a comprehensive approach addressing batch processing, parallelization, and data validation:
Batch Processing Strategy:
Implement tiered batching based on use case. For billing calculations, use 5-15 minute batches to reduce storage operations while maintaining reasonable freshness. Configure Stream Analytics with tumbling windows:
- Real-time dashboard: 1-minute windows for immediate visibility
- Billing aggregation: 15-minute windows for cost efficiency
- Reconciliation: Hourly windows for audit compliance
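The tiering above can be sketched in plain Python. This is not Stream Analytics itself, just an illustration of the tumbling-window guarantee the three tiers rely on: windows are contiguous and non-overlapping, so every event maps to exactly one window per tier. The tier names and window sizes mirror the list above.

```python
from datetime import datetime, timezone

# Window sizes for the three tiers above (seconds).
WINDOWS = {"dashboard": 60, "billing": 900, "reconciliation": 3600}

def window_start(ts: datetime, size_s: int) -> datetime:
    """Floor a timestamp to the start of its tumbling window.

    Tumbling windows partition time into fixed, back-to-back buckets,
    so each event belongs to exactly one window per tier.
    """
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % size_s, tz=timezone.utc)

ts = datetime(2024, 1, 1, 10, 7, 42, tzinfo=timezone.utc)
buckets = {tier: window_start(ts, size) for tier, size in WINDOWS.items()}
```

An event at 10:07:42 lands in the 10:07 dashboard window but the 10:00 billing and reconciliation windows, which is why the billing path can lag the dashboard by up to a window length.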
Use Event Hub capture feature to automatically archive raw events to Blob Storage every 5 minutes. This provides the audit trail without custom dual-write logic. Configure capture with Avro format for efficient storage and downstream processing.
Parallelization Architecture:
Scale Event Hubs to 32 partitions for 1000+ devices. Partition by customer/tenant ID rather than device ID to ensure per-customer ordering. Configure consumer groups:
- Real-time processing: High priority, 32 consumer instances
- Billing aggregation: Medium priority, 16 consumer instances
- Audit/archive: Low priority, 8 consumer instances
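To see why keying by tenant rather than device preserves per-customer ordering, here is a minimal sketch of stable partition assignment. Event Hubs performs this hashing internally when you set a partition key on an event; the point of the sketch is only that every device belonging to the same tenant hashes to the same partition. The use of SHA-256 (rather than Python's process-randomized `hash()`) keeps the mapping consistent across processes, and the tenant ID format is hypothetical.

```python
import hashlib

PARTITION_COUNT = 32  # matches the Event Hub configuration above

def partition_for(tenant_id: str, partitions: int = PARTITION_COUNT) -> int:
    """Map a tenant ID to a stable partition index.

    All events carrying the same partition key land on the same
    partition, so a customer's events stay in order regardless of
    which device produced them.
    """
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % partitions

# Two devices of the same tenant -> same partition, ordered stream:
p1 = partition_for("tenant-042")
p2 = partition_for("tenant-042")
```

The trade-off is that a single very large tenant becomes a hot partition; keying by device would balance load better but sacrifices cross-device ordering per customer.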
Implement auto-scaling for consumer instances based on Event Hub lag metrics. Scale up when lag exceeds 1 million events or 5 minutes of data. Use Azure Monitor metrics to trigger scaling actions.
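The scaling decision can be captured as a small pure function. The scale-up thresholds (1M events, 5 minutes) come from the text; the scale-down band, doubling/halving step, and instance bounds are assumptions added for hysteresis, so the consumer count doesn't oscillate. In practice the lag inputs would come from Azure Monitor metrics.

```python
def desired_instances(lag_events: int, lag_seconds: float,
                      current: int, min_n: int = 4, max_n: int = 32) -> int:
    """Decide the consumer instance count from Event Hub lag.

    Scale up when lag exceeds 1M events or 5 minutes of data;
    scale down only when comfortably under both thresholds
    (hysteresis band is an assumption, not from the source text).
    """
    if lag_events > 1_000_000 or lag_seconds > 300:
        return min(current * 2, max_n)
    if lag_events < 100_000 and lag_seconds < 60:
        return max(current // 2, min_n)
    return current
```

Capping instances at the partition count (32 here) matters because extra consumers beyond one per partition sit idle.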
Data Validation Framework:
Implement three-tier validation:
- Device validation: Basic schema and range checks before sending
- Ingestion validation: Schema enforcement and duplicate detection at Event Hub
- Business validation: Usage pattern analysis and anomaly detection post-aggregation
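The first two tiers boil down to declarative schema-and-range checks. A minimal sketch, assuming a hypothetical telemetry schema (field names, types, and ranges are illustrative, not from the source):

```python
# Hypothetical telemetry schema: field -> (type, inclusive min, inclusive max).
SCHEMA = {
    "device_id": (str, None, None),
    "kwh": (float, 0.0, 10_000.0),
    "timestamp": (int, 0, 2**63 - 1),
}

def validate_event(event: dict) -> list:
    """Return a list of validation errors; an empty list means valid.

    The same check runs on-device before sending (tier 1) and again
    at ingestion (tier 2), so a compromised or buggy device cannot
    inject malformed events into the billing path.
    """
    errors = []
    for field, (ftype, lo, hi) in SCHEMA.items():
        if field not in event:
            errors.append(f"missing field: {field}")
            continue
        value = event[field]
        if not isinstance(value, ftype):
            errors.append(f"{field}: expected {ftype.__name__}")
        elif lo is not None and not (lo <= value <= hi):
            errors.append(f"{field}: {value} outside [{lo}, {hi}]")
    return errors
```

Running the identical check at both tiers is deliberate: device-side validation saves bandwidth and ingestion-side validation is the one you actually trust.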
Use Azure Functions with Durable Entities to maintain per-device metering state. Track expected usage patterns and flag anomalies for manual review before billing. Implement circuit breakers that pause billing for devices showing suspicious patterns until validated.
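The circuit-breaker transition logic might look like the following sketch. In the architecture above this per-device state would live inside a Durable Entity; plain Python is used here to show just the state machine. The anomaly threshold and the usage-vs-expected-max comparison are illustrative assumptions.

```python
import enum

class State(enum.Enum):
    CLOSED = "billing"  # normal billing
    OPEN = "paused"     # billing paused pending manual review

class DeviceBillingBreaker:
    """Per-device billing circuit breaker (hypothetical thresholds)."""

    def __init__(self, anomaly_threshold: int = 3):
        self.state = State.CLOSED
        self.anomalies = 0
        self.threshold = anomaly_threshold

    def record(self, usage: float, expected_max: float) -> State:
        # Consecutive out-of-pattern readings trip the breaker;
        # a single normal reading resets the count.
        if usage > expected_max:
            self.anomalies += 1
            if self.anomalies >= self.threshold:
                self.state = State.OPEN
        else:
            self.anomalies = 0
        return self.state

    def mark_validated(self) -> None:
        # Manual review cleared the device; resume billing.
        self.state, self.anomalies = State.CLOSED, 0
```

Requiring several consecutive anomalies before opening avoids pausing billing on a single legitimate usage spike.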
Billing Accuracy Safeguards:
Implement reconciliation jobs that compare three data sources:
- Real-time aggregates (Stream Analytics output)
- Archived raw events (Event Hub capture)
- Device-reported totals (periodic checksum messages)
Run reconciliation hourly for critical customers, daily for standard customers. Auto-correct discrepancies under 1%, flag larger differences for investigation. Maintain immutable audit log of all billing adjustments.
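The reconciliation rule above can be expressed as a small comparison function. This sketch treats the archived raw events as the reference total, which is an assumption (the capture archive is the only source immune to aggregation bugs); the return shape and naming are illustrative.

```python
def reconcile(stream_total: float, archive_total: float,
              device_total: float, tolerance: float = 0.01):
    """Compare the three billing sources against the archive.

    Returns (corrected_total, needs_investigation, adjustments):
    drift within `tolerance` (1%) is auto-corrected and logged as an
    adjustment; larger drift is flagged for manual investigation.
    """
    reference = archive_total
    adjustments = []       # audit-log entries for this run
    needs_investigation = False
    for name, total in (("stream", stream_total), ("device", device_total)):
        if reference == 0:
            continue
        drift = abs(total - reference) / reference
        if drift == 0:
            continue
        if drift <= tolerance:
            adjustments.append((name, total, reference))  # auto-corrected
        else:
            needs_investigation = True
    return reference, needs_investigation, adjustments
```

Whatever the outcome, every adjustment tuple should be appended to the immutable audit log before the corrected total is written back.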
Cost Optimization:
Batch processing reduces database writes by 90-95%. With 1000 devices sending one event every 60 seconds, that is 1.44M raw writes/day; aggregating into 15-minute windows (one write per device per window) cuts this to 96K writes/day, a roughly 93% reduction. This translates to significant Cosmos DB RU savings. Use Event Hubs Standard tier with 32 partitions and 4 throughput units as a baseline, and enable auto-inflate to 8 TUs for peak handling.
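The write-reduction arithmetic, assuming one raw event per device per minute and one aggregated write per device per 15-minute window:

```python
DEVICES = 1000
EVENT_INTERVAL_S = 60   # one raw event per device per minute
WINDOW_S = 900          # 15-minute billing windows

raw_writes_per_day = DEVICES * (86_400 // EVENT_INTERVAL_S)   # 1,440,000
batched_writes_per_day = DEVICES * (86_400 // WINDOW_S)       # 96,000
reduction = 1 - batched_writes_per_day / raw_writes_per_day   # ~0.93
```

Since Cosmos DB charges RUs per operation, the RU bill for writes shrinks by roughly the same factor.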
Implementation Recommendations:
Start with 16 Event Hub partitions and 2 throughput units, monitor for 2 weeks, then scale based on actual patterns. Implement the real-time dashboard path first using simple aggregation, then add sophisticated validation and reconciliation. This allows customers to see usage immediately while you build billing accuracy safeguards. Deploy billing calculation as a separate pipeline from real-time display - never let dashboard performance issues impact billing accuracy.
The key insight from production deployments is that billing and real-time display have fundamentally different requirements. Optimize each path independently rather than trying to use a single pipeline for both purposes.