MQTT bridge integration with Azure IoT Hub disconnects intermittently during high message throughput

We’re running IBM Watson IoT Platform 25.x with an MQTT bridge to Azure IoT Hub for hybrid cloud telemetry. The bridge works initially but disconnects after 2-4 hours of operation, causing message loss and delayed delivery.

Our current configuration:


mqtt.bridge.keepalive=60
mqtt.bridge.reconnect.delay=5000
telemetry.batch.size=100

We’re seeing connection drops during high-volume periods (500+ devices, each sending data every 30 seconds). I suspect the MQTT keepalive settings or Azure IoT Hub throttling limits, but we haven’t found the root cause. We’ve tried batching telemetry messages to reduce load, but the disconnects still occur. Any insights on tuning these parameters, or alternative approaches?
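For anyone reading along, a quick back-of-envelope calculation of the steady-state load described above (assuming the stated 500 devices at one message per 30 seconds, before batching):

```python
# Estimate steady-state message volume for the fleet described above.
devices = 500
interval_s = 30  # each device publishes every 30 seconds

msgs_per_second = devices / interval_s           # ~16.7 msg/s sustained
msgs_per_day = devices * (86_400 // interval_s)  # 500 * 2880 = 1,440,000 msg/day

print(f"{msgs_per_second:.1f} msg/s, {msgs_per_day:,} msg/day")
```

That sustained daily volume is the number to compare against the IoT Hub tier's per-unit quota when investigating throttling.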

One thing that helped us was implementing device-side message aggregation before sending to Watson IoT. Instead of 500 devices sending every 30 seconds, we aggregated readings locally on edge gateways and sent every 2-3 minutes. This reduced our message volume by 75% while maintaining acceptable latency for our use case. Not ideal for real-time scenarios, but works well for monitoring and analytics workloads.
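The gateway-side aggregation described above can be sketched roughly like this. This is an illustrative Python sketch, not actual Watson IoT Platform code; the class name, flush interval, and payload shape are all assumptions:

```python
import json
import time

class EdgeAggregator:
    """Buffer per-device readings on an edge gateway and flush them
    as one combined payload, instead of one publish per device.

    Hypothetical sketch of the aggregation approach described above.
    """

    def __init__(self, flush_interval_s=150):
        self.flush_interval_s = flush_interval_s  # e.g. 2.5 minutes
        self.buffer = {}                          # device_id -> list of readings
        self.last_flush = time.monotonic()

    def add_reading(self, device_id, reading):
        """Queue a reading locally; nothing is sent yet."""
        self.buffer.setdefault(device_id, []).append(reading)

    def should_flush(self):
        return time.monotonic() - self.last_flush >= self.flush_interval_s

    def flush(self):
        """Return one aggregated JSON payload and reset the buffer."""
        payload = json.dumps({"ts": time.time(), "devices": self.buffer})
        self.buffer = {}
        self.last_flush = time.monotonic()
        return payload
```

With a 150-second flush interval, each gateway publishes once per interval instead of each device publishing every 30 seconds, which is where the large reduction in message count comes from.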

Thanks for the suggestions. I checked our Azure IoT Hub tier: we’re on S1, which maxes out at 400,000 messages/day per unit. We’re definitely hitting throttling during peak hours. I’ll work on upgrading to S2, but in the meantime, are there Watson IoT Platform settings to handle backpressure more gracefully when Azure throttles? Currently the bridge just drops the connection.
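I'm not aware of a bridge setting that does this out of the box, but the general client-side pattern is a bounded local queue plus exponential backoff when the remote end throttles. A minimal sketch, with all names illustrative (none of these are Watson IoT Platform settings):

```python
import random
from collections import deque

class ThrottleAwareSender:
    """Sketch of client-side backpressure: buffer outgoing messages in a
    bounded queue and back off exponentially while sends are failing,
    rather than tearing down the connection.
    """

    def __init__(self, send_fn, max_queue=10_000,
                 base_delay_s=1.0, max_delay_s=60.0):
        self.send_fn = send_fn                # callable returning True on success
        self.queue = deque(maxlen=max_queue)  # oldest messages drop when full
        self.base_delay_s = base_delay_s
        self.max_delay_s = max_delay_s
        self.failures = 0                     # consecutive failed sends

    def enqueue(self, msg):
        self.queue.append(msg)

    def next_delay(self):
        """Exponential backoff with full jitter, capped at max_delay_s."""
        cap = min(self.max_delay_s, self.base_delay_s * (2 ** self.failures))
        return random.uniform(0, cap)

    def drain_once(self):
        """Try to send one queued message; track failures for backoff."""
        if not self.queue:
            return True
        if self.send_fn(self.queue[0]):
            self.queue.popleft()
            self.failures = 0
            return True
        self.failures += 1
        return False
```

A caller would sleep for `next_delay()` between failed `drain_once()` attempts instead of reconnecting immediately, which keeps the bridge alive through transient 429-style throttling windows.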

The keepalive=60 combined with high volume is definitely problematic. While the bridge is busy flushing batched messages it can be late sending its PINGREQ, and an MQTT broker will close a connection that stays silent for 1.5x the keepalive interval. Try increasing keepalive to 180-240 seconds and enable TCP keepalive at the OS level as a backup. Your batch size of 100 also seems high; Azure IoT Hub handles smaller, more frequent batches better. Try batch.size=20-30 with a shorter batch.interval to smooth out the traffic pattern.
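Putting those suggestions together against the configuration posted earlier, something like the following (the key names mirror that config; `telemetry.batch.interval` is an assumed key, so verify it against your bridge's documentation before relying on it):

```properties
# Tuned values per the advice above; telemetry.batch.interval is assumed.
mqtt.bridge.keepalive=180        # was 60; tolerates busy send loops
mqtt.bridge.reconnect.delay=5000
telemetry.batch.size=25          # was 100; smaller, more frequent batches
telemetry.batch.interval=5000    # flush every 5 s to smooth traffic
```

For the OS-level TCP keepalive fallback on Linux, the relevant sysctls are `net.ipv4.tcp_keepalive_time`, `net.ipv4.tcp_keepalive_intvl`, and `net.ipv4.tcp_keepalive_probes`.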