Let me provide a comprehensive solution addressing all three critical areas:
Heartbeat Interval Optimization:
Your 60-second heartbeat is appropriate for connection health monitoring, but it’s decoupled from location update processing. Keep heartbeat at 60 seconds for TCP connection management, but implement a separate acknowledgment mechanism for location updates:
heartbeat.interval=60000
location.ack.timeout=5000
connection.retry.backoff=2000
With this in place, each location update must be acknowledged within 5 seconds or it is flagged as unacknowledged and can be resent, while heartbeats continue to maintain connection health on their own 60-second schedule.
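To make the separation concrete, here is a minimal sketch of an acknowledgment tracker that runs independently of the heartbeat. The class name `AckTracker` and its methods are hypothetical, not part of any particular tracking platform; only the 5-second timeout comes from the settings above.

```python
import time


class AckTracker:
    """Tracks location updates awaiting acknowledgment (hypothetical helper)."""

    def __init__(self, ack_timeout_s=5.0, clock=time.monotonic):
        self.ack_timeout_s = ack_timeout_s  # mirrors location.ack.timeout=5000
        self.clock = clock                  # injectable so the logic can be tested
        self.pending = {}                   # update_id -> time the update was sent

    def sent(self, update_id):
        # Record the send time of an update that now awaits an ack.
        self.pending[update_id] = self.clock()

    def acked(self, update_id):
        # Returns True if the ack matched a pending update, False otherwise.
        return self.pending.pop(update_id, None) is not None

    def overdue(self):
        # Updates that have waited longer than the ack timeout.
        now = self.clock()
        return [uid for uid, sent_at in self.pending.items()
                if now - sent_at > self.ack_timeout_s]
```

A periodic task could call `overdue()` every second or so and resend or log whatever it returns, without touching the heartbeat timer at all.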
Location Update Frequency Alignment:
Your device-side 15-second update frequency is fine, but the server must process updates at least as fast as they arrive. Reduce the server polling interval to at most 10 seconds and, more importantly, apply these changes:
server.polling.interval=10000
location.update.priority=high
update.processing.threads=8
queue.processing.mode=parallel
The parallel processing mode with 8 threads allows queued updates to be processed simultaneously rather than as one sequential batch. This is crucial at your inflow of roughly 167 updates/sec (10,000 per minute).
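As a rough illustration of what parallel mode means in practice, the sketch below fans a batch of updates out across 8 worker threads using Python's standard library. The function names and the shape of an update (a dict with `device_id`, `lat`, `lon`) are assumptions for the example, not your platform's actual API.

```python
from concurrent.futures import ThreadPoolExecutor


def process_update(update):
    # Placeholder for the real per-update work (validation, persistence, ...).
    return (update["device_id"], update["lat"], update["lon"])


def process_batch_parallel(updates, threads=8):
    # threads=8 mirrors update.processing.threads=8.
    # pool.map returns results in input order while the work runs concurrently.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(process_update, updates))
```

If the order of updates per device matters, one common option is to partition the batch by device ID so that each device's updates stay on a single worker.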
Polling Interval and Batch Processing:
Your current configuration creates a processing bottleneck. With 30-second polling and 100-item batches, you're processing only 200 updates per minute (2 polls x 100 items) while receiving 10,000 (2,500 devices x 4 updates per minute). Implement these critical changes:
server.polling.interval=10000
max.batch.updates=500
batch.processing.timeout=8000
queue.max.size=15000
queue.overflow.strategy=priority
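The increased batch size (500) combined with 10-second polling gives a single worker capacity for 3,000 updates per minute (6 polls x 500 items). On its own that is still below your 10,000 updates per minute, so the parallel processing mode is what closes the gap: if each of the 8 threads drains its own batch per polling cycle, the ceiling rises to about 24,000 updates per minute, which leaves headroom for your current load. The priority overflow strategy ensures recent location updates take precedence if queue limits are reached.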
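The throughput figures are easy to check with a few lines of arithmetic. This sketch assumes each worker drains one full batch per polling cycle, which may not match how your server actually schedules batches; the function names are illustrative.

```python
def inflow_per_minute(devices, update_interval_s):
    # Updates arriving per minute from all devices.
    return devices * 60 / update_interval_s


def polling_capacity_per_minute(polling_interval_s, batch_size, workers=1):
    # Assumes each worker drains one full batch per polling cycle.
    return 60 / polling_interval_s * batch_size * workers


inflow = inflow_per_minute(2500, 15)                              # 10000.0
current = polling_capacity_per_minute(30, 100)                    # 200.0
tuned_single = polling_capacity_per_minute(10, 500)               # 3000.0
tuned_parallel = polling_capacity_per_minute(10, 500, workers=8)  # 24000.0
```

Whenever capacity is below inflow the queue grows without bound, so the value to watch is the parallel figure relative to 10,000 updates per minute.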
Immediate Action Plan:
- Reduce polling interval from 30s to 10s
- Increase batch size from 100 to 500
- Enable parallel processing with 8 threads
- Monitor queue depth - should stay below 2,000 items
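For the queue-depth check, a bounded queue that discards its oldest entries on overflow is one simple way to approximate "recent updates take precedence". The sketch below is an assumption about how such a strategy could behave, not a description of what `queue.overflow.strategy=priority` does internally; `LocationQueue` is a hypothetical name.

```python
from collections import deque


class LocationQueue:
    """Bounded queue that favors recent updates (hypothetical helper)."""

    def __init__(self, max_size=15000, warn_depth=2000):
        # A full deque with maxlen discards its oldest entry on append.
        self.items = deque(maxlen=max_size)
        self.warn_depth = warn_depth
        self.dropped = 0  # count of stale updates discarded on overflow

    def put(self, update):
        if len(self.items) == self.items.maxlen:
            self.dropped += 1
        self.items.append(update)

    def take_batch(self, batch_size=500):
        # Remove and return up to batch_size of the oldest queued updates.
        n = min(batch_size, len(self.items))
        return [self.items.popleft() for _ in range(n)]

    def depth_ok(self):
        # True while the queue stays below the monitoring threshold.
        return len(self.items) < self.warn_depth
```

Alerting when `depth_ok()` turns false, or when `dropped` starts climbing, gives early warning that processing is falling behind inflow.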
Long-term Architecture:
As others suggested, migrate to an event-driven push architecture using MQTT or WebSocket. This eliminates polling latency entirely and can reduce server load by an estimated 60-70%. For your 2,500 devices, MQTT would deliver each location update in under a second with minimal infrastructure overhead.
With these polling optimizations, expect sync delays to drop from 3-5 minutes to under 30 seconds. A full push architecture would keep sync delays consistently under 10 seconds.