Comprehensive solution addressing all three focus areas:
API Rate Limits:
Cumulocity enforces tenant-level rate limits:
- Standard tier: ~100 requests/second
- Enterprise tier: ~200 requests/second (configurable)
- Limits are cumulative across ALL API operations (REST, MQTT processing, etc.)
- 429 responses include Retry-After header indicating when to retry
Your current load:
- 450 events / 5 seconds = 90 requests/second
- Only about 10% below the standard-tier limit, leaving little margin for other operations
- Bursts or concurrent operations easily exceed limits
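The arithmetic above can be checked with a couple of lines. This is only a sketch: the 100 req/sec figure is the approximate standard-tier limit quoted earlier, and the function names are made up for illustration.

```python
def request_rate(events_per_cycle: int, cycle_seconds: float) -> float:
    """Average requests per second when every event is sent as its own request."""
    return events_per_cycle / cycle_seconds


def headroom(rate: float, limit: float) -> float:
    """Fraction of the limit still unused (negative when over the limit)."""
    return (limit - rate) / limit


rate = request_rate(450, 5)   # 450 events every 5 seconds
spare = headroom(rate, 100)   # assumed limit: ~100 req/sec (standard tier)
print(f"{rate:.0f} req/sec, {spare:.0%} headroom")
```

Any other traffic on the tenant (dashboards, other integrations) has to fit into that remaining headroom.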
Rate limits apply to:
- REST API calls (POST, GET, PUT, DELETE)
- MQTT messages processed by platform (each publish = 1 API operation)
- Bulk operations (count as 1 request regardless of batch size)
Event Batching Strategy:
Implement batching at multiple levels:
1. Device-Level Batching (MQTT SmartREST):
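// Illustrative message for a custom multi-measurement SmartREST template (the template ID and field order depend on the template you register)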
211,temperature,25.5,pressure,101.3,vibration,0.05
Single MQTT publish sends 3 measurements = 1 API operation (vs 3 separate operations)
2. Gateway-Level Batching:
If using gateway architecture:
- Gateway collects events from multiple devices
- Batches into groups of 100-500 events
- Posts via bulk events API every 5-10 seconds
POST /event/events/bulk
[
{"type": "c8y_Temperature", "source": {"id": "device1"}, "text": "25.5°C"},
{"type": "c8y_Pressure", "source": {"id": "device1"}, "text": "101.3 kPa"},
... (up to 1000 events)
]
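On the gateway side, splitting a buffer of collected events into request-sized batches could look roughly like this. It is a sketch: `chunked` is a hypothetical helper, the batch size of 200 is arbitrary, and the event shape simply mirrors the example body above.

```python
import json
from typing import Iterator


def chunked(events: list[dict], batch_size: int = 500) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size events."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(events), batch_size):
        yield events[start:start + batch_size]


# 450 placeholder events shaped like the example body above
events = [
    {"type": "c8y_Temperature", "source": {"id": f"device{i}"}, "text": "25.5°C"}
    for i in range(450)
]

# One JSON request body per batch: 200 + 200 + 50 events
bodies = [json.dumps(batch) for batch in chunked(events, batch_size=200)]
print(len(bodies))
```

Each element of `bodies` would then be sent as one POST instead of hundreds of individual ones.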
3. Measurement API Alternative:
For telemetry, use measurements API instead of events:
POST /measurement/measurements/bulk
[
{"source": {"id": "device1"}, "type": "c8y_MultiSensor",
"c8y_Temperature": {"T": {"value": 25.5, "unit": "C"}},
"c8y_Pressure": {"P": {"value": 101.3, "unit": "kPa"}}}
]
Measurements are optimized for time-series data and have better performance characteristics.
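A multi-fragment measurement like the one above can be assembled from raw readings with a small helper. This is a sketch under assumptions: `build_measurement` is a hypothetical name, and the `time` field is added here because time-series data normally needs a timestamp; check the measurement API reference for the exact required fields.

```python
from datetime import datetime, timezone


def build_measurement(device_id, readings, when=None):
    """Build one measurement document with one fragment per reading.

    readings maps fragment name -> (series name, value, unit),
    e.g. {"c8y_Temperature": ("T", 25.5, "C")}.
    """
    when = when or datetime.now(timezone.utc)
    doc = {
        "source": {"id": device_id},
        "type": "c8y_MultiSensor",
        "time": when.isoformat(),  # assumption: ISO 8601 timestamp
    }
    for fragment, (series, value, unit) in readings.items():
        doc[fragment] = {series: {"value": value, "unit": unit}}
    return doc


measurement = build_measurement("device1", {
    "c8y_Temperature": ("T", 25.5, "C"),
    "c8y_Pressure": ("P", 101.3, "kPa"),
})
```

One document per device per cycle keeps all readings taken at the same moment together.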
Device Reporting Interval Optimization:
Optimize intervals based on measurement criticality:
Tiered Reporting Strategy:
1. Critical measurements (temperature): 5s interval
2. Important measurements (pressure): 10s interval
3. Monitoring measurements (vibration): 30s interval
4. Status/diagnostic data: 5min interval
This reduces the load from 450 events/5s (90 req/sec) to:
- Temperature: 150 events/5s (30 req/sec)
- Pressure: 150 events/10s (15 req/sec)
- Vibration: 150 events/30s (5 req/sec)
- Total: 50 req/sec (44% reduction)
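The totals above follow directly from the event counts and intervals. A short check, using only the numbers from the list (the status/diagnostic tier is left out, as in the totals above):

```python
# Events per reporting cycle and cycle length in seconds, from the tiers above
tiers = {
    "temperature": (150, 5),
    "pressure": (150, 10),
    "vibration": (150, 30),
}

rates = {name: count / interval for name, (count, interval) in tiers.items()}
total = sum(rates.values())
baseline = 450 / 5                 # current load: 90 req/sec
reduction = 1 - total / baseline

print(rates)
print(f"total {total:.0f} req/sec, {reduction:.0%} lower than {baseline:.0f}")
```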
Dynamic Interval Adjustment:
# Adaptive reporting: pick the interval (in seconds) from the current device state
def reporting_interval(in_alert, near_threshold):
    if in_alert:
        return 5   # Real-time reporting
    if near_threshold:
        return 10  # Increased monitoring
    return 30      # Slow reporting while readings are in the normal range
Edge Aggregation Pattern:
Implement edge aggregator for fleet management:
1. Devices → Edge Gateway (local network, high frequency)
2. Edge Gateway batches events (every 5-10s)
3. Gateway → Cloud (bulk API, low request count)
4. Cloud processes batch as single transaction
Benefits:
- Reduces cloud API calls by 100-500x
- Maintains local real-time monitoring
- Resilient to network interruptions (local buffering)
- Stays well under rate limits
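The batching step on the gateway can be sketched as a buffer that flushes when it is full or when its oldest event is too old. Everything here is illustrative: `BatchBuffer`, its parameters, and the `send` callback are hypothetical names, and a real gateway would also need a timer to flush during quiet periods and a retry path if `send` fails.

```python
import time
from typing import Callable


class BatchBuffer:
    """Collects events and flushes them when the batch is full or too old."""

    def __init__(self, send: Callable[[list], None], max_size: int = 500,
                 max_age: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.send = send          # called with one list of events per flush
        self.max_size = max_size
        self.max_age = max_age    # seconds the oldest event may wait
        self.clock = clock
        self.items: list = []
        self.first_at = 0.0       # arrival time of the oldest buffered event

    def add(self, event) -> None:
        if not self.items:
            self.first_at = self.clock()
        self.items.append(event)
        too_old = self.clock() - self.first_at >= self.max_age
        if len(self.items) >= self.max_size or too_old:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered as one batch; does nothing when empty."""
        if self.items:
            batch, self.items = self.items, []
            self.send(batch)


sent = []
buf = BatchBuffer(send=sent.append, max_size=3)
for n in range(7):
    buf.add({"n": n})
buf.flush()  # push the final partial batch
print([len(batch) for batch in sent])
```

Note that the age check only runs when a new event arrives, which is why the explicit `flush()` at the end is needed.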
Implementation Recommendations:
Immediate Fix (reduce by over 99%):
- Switch from individual POST to bulk events API
- Batch 450 events into single POST every 5s
- Request rate: 90 req/sec → 0.2 req/sec
Short-term Optimization (about 67% fewer messages, over 99% once batched):
- Implement SmartREST templates for MQTT devices
- Combine 3 measurements per device into 1 message
- Request rate: 90 req/sec → 30 req/sec
- Then apply bulk API: 30 req/sec → 0.1-0.2 req/sec (one bulk POST every 5-10s)
Long-term Architecture:
- Deploy edge gateway/aggregator
- Implement tiered reporting intervals
- Use measurements API for telemetry (not events API)
- Reserve events API for state changes and alerts
- Monitor rate limit headers in responses
Rate Limit Handling Code:
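# Retry a bulk POST when the platform answers 429 (uses the requests library)
import time
import requests

def send_events_with_retry(base_url, events_batch, auth, max_retries=3):
    """Return True if the batch was accepted, False otherwise."""
    for attempt in range(max_retries):
        response = requests.post(base_url + '/event/events/bulk',
                                 json=events_batch, auth=auth, timeout=30)
        if response.status_code == 429:
            # Header values are strings; this assumes the delay-in-seconds form
            retry_after = int(response.headers.get('Retry-After', 60))
            time.sleep(retry_after)
            continue
        if response.ok:  # any 2xx status, e.g. 200 or 201
            return True
        print(f"Request failed: {response.status_code} {response.text}")
        break
    return False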
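One detail worth handling: HTTP allows Retry-After to carry either a number of seconds or an HTTP date. A small parser that accepts both forms could look like this; `retry_after_seconds` is a hypothetical helper name, and the 60-second default is the same fallback used above.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(header_value, default=60.0, now=None):
    """Convert a Retry-After header value to a non-negative delay in seconds."""
    if header_value is None:
        return default
    value = header_value.strip()
    if value.isdigit():
        return float(value)                 # delay-seconds form, e.g. "30"
    try:
        target = parsedate_to_datetime(value)   # HTTP-date form
    except (TypeError, ValueError):
        return default                      # unparseable: fall back
    if target.tzinfo is None:
        target = target.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (target - now).total_seconds())


print(retry_after_seconds("30"))
print(retry_after_seconds(None))
```

The result can be passed straight to `time.sleep` in the retry loop.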
Your 429 errors come from sending each event as its own API call: at 90 req/sec you sit just under the tenant rate limit, so any burst or concurrent operation pushes you over it. Implement event batching using bulk APIs to reduce the request rate by about 99% while maintaining data throughput. Combine with optimized device reporting intervals for maximum efficiency.