We’re losing WebSocket connections while ingesting telemetry from 300+ sensors that each send data every 5 seconds. The WebSocket buffer appears to overflow during peak loads, causing disconnects and data loss. Our reconnect logic is basic: it retries immediately with no backoff, which sometimes makes the problem worse. We haven’t implemented data compression yet.
WebSocket buffer: default (64KB)
Message rate: 60 msg/sec per connection
Disconnect frequency: 15-20 times/hour
Data loss: ~5% of measurements
Is there a recommended buffer size for high-throughput scenarios? How should we handle reconnection to avoid overwhelming the server?
A 64KB buffer is definitely too small for your volume. We use 512KB buffers for similar loads. You also need proper backpressure handling: if the buffer fills up, either drop the oldest messages or queue them locally on the client side.
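To make the "drop oldest" option concrete, here is a minimal sketch of a bounded client-side queue in Python. The class name `DropOldestQueue`, the `max_items` parameter, and the `dropped` counter are made up for illustration; they are not part of any WebSocket library, and the right capacity depends on your message sizes.

```python
from collections import deque
from typing import Deque, Optional


class DropOldestQueue:
    """Bounded FIFO that discards the oldest message when full (hypothetical helper)."""

    def __init__(self, max_items: int) -> None:
        if max_items <= 0:
            raise ValueError("max_items must be positive")
        # A deque with maxlen evicts from the left (oldest) on append when full.
        self._items: Deque[bytes] = deque(maxlen=max_items)
        self.dropped = 0  # how many messages were discarded so far

    def push(self, message: bytes) -> None:
        """Queue a message, evicting the oldest one if the queue is full."""
        if len(self._items) == self._items.maxlen:
            self.dropped += 1
        self._items.append(message)

    def pop(self) -> Optional[bytes]:
        """Return the oldest queued message, or None if the queue is empty."""
        return self._items.popleft() if self._items else None

    def __len__(self) -> int:
        return len(self._items)
```

The sender would `push()` each reading and a separate loop would `pop()` and write to the socket only when it is writable. Tracking `dropped` lets you report how much data was shed instead of losing it silently.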
Good point about backoff. What about data compression? Would enabling compression help reduce the buffer pressure, or does it just add CPU overhead without much benefit?
Your immediate reconnect strategy is causing connection storms. Implement exponential backoff: start with a 1-second delay and double it on each failed attempt, up to a maximum of 60 seconds. This gives the server time to recover and prevents cascading failures. Also add jitter to prevent a thundering herd when multiple clients reconnect simultaneously.
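A rough sketch of that backoff schedule in Python, using only the standard library. The function names, the `connect` callable, and `max_attempts` are assumptions for illustration. The jitter here is "full jitter" (a uniform random wait between 0 and the capped delay), which is one common choice rather than the only one.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

BASE_DELAY_S = 1.0   # first delay, per the advice above
MAX_DELAY_S = 60.0   # ceiling on the delay


def capped_delay(attempt: int) -> float:
    """Un-jittered delay after the Nth failure (0-based): 1, 2, 4, ... capped at 60."""
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    return min(MAX_DELAY_S, BASE_DELAY_S * 2 ** min(attempt, 32))


def jittered_delay(attempt: int, rand: Callable[[], float] = random.random) -> float:
    """Full jitter: a random wait in [0, capped_delay(attempt))."""
    return capped_delay(attempt) * rand()


def reconnect_with_backoff(
    connect: Callable[[], T],          # hypothetical: opens the socket, raises OSError on failure
    max_attempts: int = 10,            # assumed limit, tune for your deployment
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, sleeping a jittered delay between failures."""
    last_error: Optional[BaseException] = None
    for attempt in range(max_attempts):
        try:
            return connect()
        except OSError as exc:  # ConnectionError is a subclass of OSError
            last_error = exc
            if attempt < max_attempts - 1:
                sleep(jittered_delay(attempt))
    raise ConnectionError(f"gave up after {max_attempts} attempts") from last_error
```

Once a connection has stayed up for a while, start the next reconnect sequence from attempt 0 again so a single later disconnect does not inherit a long delay.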