Looking for insights on optimizing device data processing performance in sapiot-23. We’re handling data from 2,800 devices sending telemetry every 15 seconds, which creates significant processing load. Interested in hearing about approaches for data streaming vs. batch processing, effective data filtering strategies, and any performance tuning you’ve done to handle high-volume device data efficiently.
Don’t overlook the value of batch processing for non-time-critical workloads. We process 80% of our analytics workloads in batch mode during off-peak hours, reserving streaming for the 20% that requires real-time response. Batch processing is significantly more efficient for aggregations, joins with reference data, and complex analytics. Our approach: stream critical data for immediate action, buffer everything else for batch processing every 15 minutes. This hybrid approach reduced our processing costs by 55% compared to pure streaming while maintaining acceptable latency for most use cases.
For steady-state visibility, we implemented a heartbeat mechanism - devices send full telemetry every 5 minutes regardless of change thresholds, plus change-triggered messages in between. This ensures we never lose track of device state even during long stable periods. The filtering logic is configured in the gateway, not devices, making it easier to adjust thresholds without firmware updates. We also maintain a local buffer at the gateway that can be queried for raw data on-demand, giving us the best of both worlds - reduced data transmission but raw data available when needed for troubleshooting or detailed analysis.
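The gateway-side logic described above (change-triggered messages, a periodic heartbeat, and a queryable raw buffer) can be sketched roughly as follows. The 0.5-unit threshold and 5-minute heartbeat mirror the values mentioned in this thread; the class and method names are illustrative, not an SAP API:

```python
class GatewayFilter:
    """Forward a reading when it changes by more than `threshold`, or when
    `heartbeat_s` seconds have passed since the last forwarded message.
    Every raw reading is kept in a local buffer for on-demand queries."""

    def __init__(self, threshold=0.5, heartbeat_s=300):
        self.threshold = threshold      # change threshold (e.g. 0.5 degC)
        self.heartbeat_s = heartbeat_s  # full telemetry every 5 minutes
        self.last_sent_value = None
        self.last_sent_time = None
        self.raw_buffer = []            # queryable raw history

    def ingest(self, value, now):
        """Return True if the reading should be forwarded upstream."""
        self.raw_buffer.append((now, value))
        changed = (self.last_sent_value is None
                   or abs(value - self.last_sent_value) > self.threshold)
        heartbeat_due = (self.last_sent_time is None
                         or now - self.last_sent_time >= self.heartbeat_s)
        if changed or heartbeat_due:
            self.last_sent_value = value
            self.last_sent_time = now
            return True
        return False
```

Keeping the threshold in a gateway object rather than in device firmware is what makes it adjustable without firmware updates, as the post notes.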
Having optimized numerous high-volume IoT deployments, I can offer a comprehensive performance-optimization framework:
Data Streaming vs. Batch Processing:
The optimal approach is hybrid, not either/or. Categorize your data processing needs:
Real-Time Streaming (< 1 second latency):
- Safety-critical alerts (equipment malfunction, threshold violations)
- Real-time dashboards and operator displays
- Immediate control actions (actuator commands based on sensor readings)
- Volume: Typically 5-10% of total data
Near-Real-Time Streaming (1-30 second latency):
- Operational monitoring and KPIs
- Trend detection and anomaly identification
- Cross-device correlation analysis
- Volume: Typically 15-20% of total data
Micro-Batch Processing (1-15 minute intervals):
- Statistical aggregations (min/max/avg over time windows)
- Data enrichment with reference data
- Multi-device analytics and comparisons
- Volume: Typically 60-70% of total data
Batch Processing (hourly/daily):
- Historical reporting and compliance reports
- Machine learning model training
- Long-term trend analysis and forecasting
- Data archival and compression
- Volume: All historical data
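One way to make the four tiers operational is a simple routing table that maps a message category to its processing tier. The category names here are illustrative assumptions, not fields from any particular platform:

```python
# Map message categories to the four processing tiers described above.
TIER_BY_CATEGORY = {
    "safety_alert":   "real_time",        # < 1 second latency
    "control_action": "real_time",
    "kpi":            "near_real_time",   # 1-30 seconds
    "anomaly":        "near_real_time",
    "aggregation":    "micro_batch",      # 1-15 minute intervals
    "enrichment":     "micro_batch",
    "report":         "batch",            # hourly/daily
    "ml_training":    "batch",
}

def tier_for(category):
    """Default unknown categories to micro-batch, the bulk (60-70%) tier."""
    return TIER_BY_CATEGORY.get(category, "micro_batch")
```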
Data Filtering Strategies:
Implement multi-tier filtering to reduce processing load:
Tier 1 - Device/Edge Filtering:
- Change-Based Transmission: Only send when a value change exceeds a threshold
  - Example: temperature sensor sends only when the change is > 0.5°C
  - Reduces data volume by 60-80% for slowly changing values
  - Configure per sensor type based on expected variability
- Deadband Filtering: Suppress minor fluctuations around a setpoint
  - Example: pressure sensor with ±2% deadband around the target
  - Eliminates noise from stable process conditions
  - Typical volume reduction: 40-60%
- Heartbeat Mechanism: Periodic full telemetry regardless of changes
  - Ensures device connectivity and health visibility
  - Recommended interval: 5-10 minutes
  - Prevents “silent failure” scenarios
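The deadband rule can be sketched in a few lines. The setpoint of 100 and ±2% band below are taken from the example; the class name is illustrative:

```python
class DeadbandFilter:
    """Suppress readings within a ±band around the setpoint; forward
    anything that leaves the band."""

    def __init__(self, setpoint, band_pct=2.0):
        self.setpoint = setpoint
        self.band = abs(setpoint) * band_pct / 100.0  # e.g. ±2 for setpoint 100

    def passes(self, value):
        """True if the reading falls outside the deadband."""
        return abs(value - self.setpoint) > self.band
```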
Tier 2 - Gateway Filtering:
- Data Quality Filtering: Remove invalid/out-of-range values
- Duplicate Detection: Suppress repeated identical messages
- Rate Limiting: Prevent device malfunctions from overwhelming system
- Aggregation: Pre-compute statistics before transmission to cloud
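Two of the Tier 2 steps, duplicate detection and pre-aggregation, can be sketched as follows (a minimal sketch; a real gateway would key both by device ID):

```python
from statistics import mean

def aggregate_window(readings):
    """Pre-compute window statistics at the gateway so one summary
    message replaces many raw readings sent to the cloud."""
    vals = sorted(readings)
    return {"count": len(vals), "min": vals[0],
            "max": vals[-1], "avg": mean(vals)}

def dedupe(messages):
    """Suppress consecutive identical payloads (duplicate detection)."""
    out = []
    for m in messages:
        if not out or m != out[-1]:
            out.append(m)
    return out
```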
Tier 3 - Platform Filtering:
- Subscription-Based Routing: Only process data for active subscribers
- Priority-Based Processing: Critical data processed before non-critical
- Sampling: Reduce granularity for non-critical analytics (keep every Nth message)
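Priority-based processing and sampling at the platform tier might look like this sketch; the priority values and type names are assumptions for illustration:

```python
import heapq

def sample_every_nth(messages, n):
    """Keep every Nth message to reduce granularity for non-critical analytics."""
    return messages[::n]

class PriorityProcessor:
    """Dequeue critical messages before non-critical ones,
    preserving FIFO order within a priority level."""
    PRIORITY = {"alert": 0, "kpi": 1, "telemetry": 2}  # lower = more urgent

    def __init__(self):
        self._heap, self._seq = [], 0

    def submit(self, message):
        prio = self.PRIORITY.get(message["type"], 9)
        heapq.heappush(self._heap, (prio, self._seq, message))
        self._seq += 1  # unique tie-breaker keeps FIFO within a priority

    def next(self):
        return heapq.heappop(self._heap)[2]
```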
Performance Optimization Techniques:
1. Processing Architecture:
Device Data → Gateway → Message Broker (MQTT) ──→ Stream Processor → Time-Series DB
                                              └─→ Batch Processor → Data Lake
2. Stream Processing Optimization:
- Use windowing for time-based aggregations (tumbling windows for non-overlapping aggregates)
- Implement stateful processing only where necessary (stateless is 10x faster)
- Partition streams by device ID for parallel processing
- Use async I/O for database writes (batch commits every 1000 messages or 5 seconds)
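The "batch commits every 1000 messages or 5 seconds" rule above can be sketched as a small buffering writer. `flush_fn` stands in for the real (async) bulk insert into the time-series DB, and the timestamps are passed explicitly to keep the sketch testable:

```python
class BatchedWriter:
    """Buffer records and flush when either `max_batch` records have
    accumulated or `max_wait_s` seconds have passed since the first
    buffered record."""

    def __init__(self, flush_fn, max_batch=1000, max_wait_s=5.0):
        self.flush_fn = flush_fn
        self.max_batch = max_batch
        self.max_wait_s = max_wait_s
        self._buf, self._first_ts = [], None

    def write(self, record, now):
        if not self._buf:
            self._first_ts = now          # start the wait clock
        self._buf.append(record)
        if (len(self._buf) >= self.max_batch
                or now - self._first_ts >= self.max_wait_s):
            self.flush()

    def flush(self):
        if self._buf:
            self.flush_fn(self._buf)      # one bulk commit
            self._buf, self._first_ts = [], None
```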
3. Batch Processing Optimization:
- Process data in time-partitioned chunks (hourly or daily partitions)
- Use columnar storage format (Parquet) for analytics workloads (5-10x faster queries)
- Implement incremental processing (only process new data, not full dataset)
- Schedule heavy batch jobs during off-peak hours
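Incremental processing boils down to tracking a watermark and processing only records past it. A minimal sketch (in production the watermark would be persisted between runs):

```python
def process_new_records(records, watermark):
    """Return only records newer than the watermark, plus the advanced
    watermark, so each batch run touches new data rather than the
    full dataset. `records` are (timestamp, payload) tuples."""
    fresh = [(ts, payload) for ts, payload in records if ts > watermark]
    new_watermark = max((ts for ts, _ in fresh), default=watermark)
    return fresh, new_watermark
```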
4. Database Optimization:
- Use time-series optimized storage (SAP HANA time-series tables)
- Implement data partitioning by time (daily or weekly partitions)
- Create appropriate indexes (device_id + timestamp compound index)
- Use compression for historical data (60-80% space savings)
Practical Configuration for 2,800 Devices @ 15-second intervals:
Baseline Load:
- Messages/second: 2,800 ÷ 15 = ~187 msg/sec
- Daily messages: 187 × 86,400 = 16.2 million msg/day
- Monthly data: ~486 million messages
After Optimization:
- Edge filtering (70% reduction): 56 msg/sec, 4.8M msg/day
- Gateway aggregation (5-min windows): 9.3 msg/sec, 800K msg/day
- Result: 95% reduction in platform processing load
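As a sanity check, the sizing above can be reproduced with straightforward arithmetic (30-day month, 5-minute gateway windows; the small differences from the figures above come from rounding to 187 msg/sec before multiplying):

```python
devices, interval_s = 2800, 15

baseline_mps = devices / interval_s        # 186.7 msg/sec (~187)
daily = baseline_mps * 86_400              # 16,128,000 ~ 16.1M msg/day
monthly = daily * 30                       # ~484M msg/month

edge_mps = baseline_mps * 0.30             # 70% edge filtering -> 56 msg/sec
agg_mps = devices / 300                    # one aggregate per 5 min -> 9.3 msg/sec
reduction = 1 - agg_mps / baseline_mps     # 0.95 -> 95% less platform load
```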
Implementation Recommendations:
- Start with edge filtering - highest impact, lowest infrastructure cost
- Implement gateway aggregation for devices in the same physical location
- Separate hot/cold paths in platform processing
- Monitor processing latency per path and optimize bottlenecks
- Use adaptive thresholds that adjust based on recent data patterns
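One common way to implement adaptive thresholds is to scale them with recent variability; the sketch below uses k times the standard deviation of a sliding window, floored at a minimum. The window size, k=2, and floor are assumptions for illustration:

```python
from collections import deque
from statistics import pstdev

class AdaptiveThreshold:
    """Derive the change-transmission threshold from recent data:
    k * stddev of the last `window` readings, floored at `min_threshold`
    so a perfectly flat signal still has a usable threshold."""

    def __init__(self, window=100, k=2.0, min_threshold=0.1):
        self.readings = deque(maxlen=window)   # sliding window of readings
        self.k = k
        self.min_threshold = min_threshold

    def update(self, value):
        self.readings.append(value)

    def threshold(self):
        if len(self.readings) < 2:
            return self.min_threshold
        return max(self.min_threshold, self.k * pstdev(self.readings))
```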
Performance Metrics to Monitor:
- End-to-end latency (device → dashboard)
- Processing throughput (messages/second)
- Queue depth (backlog in message broker)
- Database write latency
- CPU/memory utilization of processing nodes
Cost-Performance Trade-offs:
Higher filtering = Lower costs but potential data loss
Lower filtering = Higher costs but complete data fidelity
Recommended approach: Aggressive filtering for non-critical data, minimal filtering for critical safety/compliance data. Store filtered-out data in low-cost archive for potential future analysis.
The key insight: Most IoT data is redundant or low-value. Intelligent filtering and tiered processing ensures you pay for high-performance processing only for data that truly needs it, while still capturing everything for compliance and future analysis.
We optimized our processing pipeline by separating hot path (real-time) and cold path (batch) data flows. Hot path: critical alerts and real-time dashboard data, processed immediately with sub-second latency. Cold path: historical analytics and reporting data, processed in 5-minute micro-batches. This separation allowed us to tune each path independently. The hot path uses in-memory processing with minimal transformations, while the cold path applies complex aggregations and enrichment. Processing efficiency improved by 40% and we reduced infrastructure costs significantly by not over-provisioning for peak real-time loads.
For high-volume device data, we switched from batch processing to stream processing using SAP IoT’s streaming analytics engine. The key was implementing intelligent filtering at the edge - devices only send data when values change beyond a threshold (e.g., temperature change > 0.5°C) rather than on a fixed schedule. This reduced our data volume by 70% while maintaining data quality for analytics. We also implemented data aggregation at the gateway level, sending pre-aggregated metrics every minute instead of raw readings every 15 seconds. The streaming engine handles real-time alerting while batch jobs process historical aggregates for reporting.