Complete solution for handling REST API rate limiting at scale:
Exponential Backoff Strategy:
Implement proper retry logic with exponential backoff and jitter:
import random
import time

def publish_with_backoff(device_id, telemetry_data):
    # publish_telemetry() is your existing single-message publish call
    max_retries = 5
    base_delay = 2  # seconds
    for attempt in range(max_retries):
        response = publish_telemetry(device_id, telemetry_data)
        if response.status_code == 200:
            return True
        elif response.status_code == 429:
            if attempt == max_retries - 1:
                return False
            # Exponential backoff capped at 60s, plus up to 30% random jitter
            # so retries from many devices don't synchronize
            delay = min(base_delay * (2 ** attempt), 60)
            jitter = random.uniform(0, delay * 0.3)
            time.sleep(delay + jitter)
        else:
            # Non-retryable error
            return False
    return False
API Quota Management:
Implement client-side rate limiting using token bucket algorithm:
# Pseudocode - Token bucket rate limiter:
1. Initialize bucket with max_tokens (e.g., 100 requests)
2. Refill bucket at rate of tokens_per_second (e.g., 10/sec)
3. Before each API call, check if bucket has tokens
4. If tokens available, consume one and make request
5. If bucket empty, wait until next refill cycle
6. Monitor X-RateLimit-Remaining header and adjust refill rate dynamically
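The steps above can be sketched as a small class (a minimal in-process sketch; the class and method names are illustrative, not from any library):

```python
import time

class TokenBucket:
    """Client-side rate limiter implementing the steps above."""

    def __init__(self, max_tokens=100, tokens_per_second=10):
        self.max_tokens = max_tokens
        self.refill_rate = float(tokens_per_second)
        self.tokens = float(max_tokens)       # step 1: start full
        self.last_refill = time.monotonic()

    def _refill(self):
        # Step 2: refill continuously at tokens_per_second, capped at max
        now = time.monotonic()
        self.tokens = min(self.max_tokens,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now

    def acquire(self):
        """Steps 3-5: block until a token is available, then consume one."""
        while True:
            self._refill()
            if self.tokens >= 1:
                self.tokens -= 1
                return
            # Sleep roughly until one token has accumulated
            time.sleep((1 - self.tokens) / self.refill_rate)

    def adjust_rate(self, remaining, limit):
        # Step 6: slow the refill when X-RateLimit-Remaining runs low
        if limit and remaining / limit < 0.1:
            self.refill_rate = max(1, self.refill_rate / 2)
```

Call `bucket.acquire()` immediately before each API request, and `bucket.adjust_rate(...)` after parsing the response headers.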
Request Batching:
Batch telemetry messages to reduce API call frequency:
import json
from google.cloud import iot_v1

def batch_publish_telemetry(project, location, registry, device_id, telemetry_list):
    client = iot_v1.DeviceManagerClient()
    device_path = client.device_path(project, location, registry, device_id)
    # Batch up to 50 messages per request
    batch_size = 50
    for i in range(0, len(telemetry_list), batch_size):
        batch = telemetry_list[i:i + batch_size]
        payload = json.dumps({'messages': batch}).encode('utf-8')
        client.modify_cloud_to_device_config(
            request={'name': device_path, 'binary_data': payload}
        )
Rate Limit Headers:
Proactively monitor and respect rate limit headers:
- Parse X-RateLimit-Remaining from every response
- Calculate current consumption rate: requests_made / time_elapsed
- If remaining < 10% of limit, reduce request rate by 50%
- Use X-RateLimit-Reset to schedule request resumption
- Implement circuit breaker: if 3 consecutive 429s, pause for reset period
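One way to wire these rules together (a sketch; the `RateLimitGuard` class is illustrative, and `X-RateLimit-*` header names vary by provider, so adjust to what your API actually returns):

```python
import time

class RateLimitGuard:
    """Tracks rate-limit headers and trips a circuit breaker on repeated 429s."""

    def __init__(self, pause_fallback=60):
        self.consecutive_429s = 0
        self.paused_until = 0.0
        self.pause_fallback = pause_fallback  # seconds, used if no Reset header

    def allow_request(self):
        return time.time() >= self.paused_until

    def record(self, status_code, headers):
        if status_code == 429:
            self.consecutive_429s += 1
            if self.consecutive_429s >= 3:
                # Circuit breaker: pause until the reported reset time
                reset = headers.get('X-RateLimit-Reset')
                self.paused_until = (float(reset) if reset
                                     else time.time() + self.pause_fallback)
        else:
            self.consecutive_429s = 0

    def throttle_factor(self, headers):
        # Halve the request rate when under 10% of the quota remains
        remaining = headers.get('X-RateLimit-Remaining')
        limit = headers.get('X-RateLimit-Limit')
        if remaining is not None and limit:
            if int(remaining) / int(limit) < 0.1:
                return 0.5
        return 1.0
```

Check `allow_request()` before each call, feed every response into `record(...)`, and multiply your send rate by `throttle_factor(...)`.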
Device-Side Optimizations:
- Stagger device sync times: Add random offset (0-60 seconds) to each device’s sync schedule based on deviceId hash
- Local buffering: Queue up to 100 telemetry points locally, publish in batches
- Adaptive sampling: Reduce telemetry frequency during rate limit events
- Priority queuing: Mark critical telemetry (alarms, alerts) for immediate delivery
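The hash-based stagger from the first bullet might look like this (a small sketch; the function name and 60-second window are illustrative):

```python
import hashlib

def sync_offset_seconds(device_id, window=60):
    """Deterministic 0..window-1 second offset derived from the device ID,
    so each device gets a stable slot and the fleet's syncs spread out."""
    digest = hashlib.sha256(device_id.encode('utf-8')).digest()
    return int.from_bytes(digest[:4], 'big') % window
```

Each device adds its offset to the shared sync schedule; because the offset is a pure function of the ID, no coordination is needed.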
Infrastructure Configuration:
- Request API quota increase from Google Cloud support for production workloads
- Use separate device registries for different device classes to isolate rate limits
- Implement regional failover: if one region hits limits, route to alternate region
- Monitor quota utilization in Cloud Monitoring, alert at 70% threshold
Advanced Patterns:
- Use Cloud Pub/Sub as an intermediate buffer: devices publish to Pub/Sub, backend consumes at controlled rate
- Implement priority lanes: separate API clients for high-priority vs normal telemetry
- Deploy API gateway with rate limiting in front of IoT Core for finer control
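For the priority-lane idea, the ordering logic can be sketched with a heap (an in-process illustration only; the class name and priority levels are made up, and in production the lanes would be separate API clients or Pub/Sub topics):

```python
import heapq
import itertools

class PriorityTelemetryQueue:
    """Alarms and alerts jump ahead of routine telemetry;
    FIFO order is preserved within each priority level."""
    PRIORITY = {'alarm': 0, 'alert': 1, 'normal': 2}

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def push(self, message, kind='normal'):
        heapq.heappush(self._heap, (self.PRIORITY[kind], next(self._counter), message))

    def pop(self):
        return heapq.heappop(self._heap)[2]
```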
Monitoring & Alerts:
Metrics to track:
- API request rate (requests/second)
- 429 error rate
- Retry success rate
- Average backoff delay
- Data loss percentage
- Rate limit headroom (remaining/total)
After implementing exponential backoff with jitter, request batching, and proactive rate limit monitoring, our data loss dropped from 25% to under 1%, and we successfully scaled to 1,200+ devices without hitting rate limits.