Data stream module drops messages during brief network outages on Azure IoT Edge, causing data loss

During short network interruptions (30-60 seconds), our aziot-25 data stream module is dropping telemetry messages instead of buffering them for later transmission. Once connectivity resumes, we see a gap in the message timeline at Azure IoT Hub.

I’ve verified the buffering configuration in the module settings, and there should be sufficient local storage for messages. The persistent storage path exists and has adequate disk space. The module logs show “message buffer full” warnings during outages, but our calculated message rate shouldn’t fill the buffer in under a minute.

Is there a recommended approach for handling network outage scenarios in the data stream module? Should we increase buffer size, or is there a better strategy for message persistence during connectivity gaps?

I checked the storage path permissions - they’re correct (iotedge user has write access). I wasn’t aware the edgeHub module has its own store-and-forward capability. Should the data stream module be routing through edgeHub instead of direct IoT Hub connections?

At that message rate (400KB/s), a 60-second outage alone produces ~24MB, so you need at least 100MB of buffer to handle back-to-back or longer outages comfortably. But also check whether persistent storage is actually being used - the module might be falling back to in-memory buffering only. Verify the storage path is mounted correctly and the module has write permissions.

You need a multi-layered approach to handle network outages properly:

Buffering Configuration: First, right-size your buffer based on actual message volume. With 200 msg/s at 2KB each:

{
  "properties": {
    "desired": {
      "bufferSizeInMB": 250,
      "messageTTL": 7200,
      "maxRetryAttempts": 5,
      "retryIntervalSeconds": 10,
      "persistentStoragePath": "/app/storage"
    }
  }
}

This gives you ~10 minutes of buffering capacity (250MB / 0.4MB per second). The extended TTL ensures messages don’t expire during longer outages.
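The arithmetic above is easy to sanity-check in a couple of lines. A minimal sketch - the helper name is just for illustration; the rates and sizes come from this thread:

```python
def required_buffer_mb(msgs_per_sec, msg_kb, outage_secs, headroom=2.0):
    """Minimum buffer size in MB to ride out an outage, with a safety factor."""
    return msgs_per_sec * msg_kb * outage_secs / 1000 * headroom

# 200 msg/s * 2 KB = 400 KB/s -> 24 MB for a 60 s outage, 48 MB with 2x headroom
print(required_buffer_mb(200, 2, 60))  # 48.0
# 250 MB covers ~625 s at this rate, i.e. the ~10 minutes quoted above
print(required_buffer_mb(200, 2, 625, headroom=1.0))  # 250.0
```

Run this against your real message rate before settling on a bufferSizeInMB value.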

Persistent Storage for Messages: Ensure the data stream module uses disk-backed storage, not just memory:

{
  "HostConfig": {
    "Binds": [
      "/var/lib/iotedge/storage/datastream:/app/storage"
    ]
  },
  "Env": [
    "STORAGE_MODE=persistent",
    "STORAGE_PATH=/app/storage"
  ]
}

Verify storage is working:

ls -lah /var/lib/iotedge/storage/datastream/
# Should show .db files during buffering

Network Outage Handling: Integrate with edgeHub’s store-and-forward for better resilience. Route messages through edgeHub instead of direct IoT Hub connections:

{
  "routes": {
    "dataStreamToHub": "FROM /messages/modules/datastream/* INTO $upstream"
  }
}

Configure edgeHub’s store-and-forward:

{
  "properties.desired": {
    "storeAndForwardConfiguration": {
      "timeToLiveSecs": 7200
    },
    "schemaVersion": "1.2",
    "routes": {
      "dataStreamToHub": "FROM /messages/modules/datastream/* INTO $upstream"
    }
  }
}
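Before deploying, it's worth sanity-checking the edgeHub twin fragment programmatically - a minimal sketch, assuming you keep the desired properties in a JSON file or string:

```python
import json

# The edgeHub desired-properties fragment from the deployment manifest
edgehub_desired = json.loads("""
{
  "schemaVersion": "1.2",
  "storeAndForwardConfiguration": { "timeToLiveSecs": 7200 },
  "routes": {
    "dataStreamToHub": "FROM /messages/modules/datastream/* INTO $upstream"
  }
}
""")

# The TTL should comfortably exceed the longest outage you expect to survive
ttl = edgehub_desired["storeAndForwardConfiguration"]["timeToLiveSecs"]
assert ttl >= 3600, "store-and-forward TTL shorter than one hour"
print("edgeHub store-and-forward TTL:", ttl, "s")
```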

Modify your data stream module to send messages to edgeHub:

from azure.iot.device.aio import IoTHubModuleClient  # async client, required for await

client = IoTHubModuleClient.create_from_edge_environment()
await client.connect()

# Send to edgeHub's output instead of direct to IoT Hub; the route
# above forwards it upstream
await client.send_message_to_output(
    message,
    output_name="telemetryOutput"
)

This leverages edgeHub’s built-in buffering and retry logic, which is more robust than custom module buffering.
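edgeHub handles upstream retries, but the module's own send to edgeHub can still fail transiently (e.g. while edgeHub restarts). A small backoff wrapper keeps that path robust - this is SDK-agnostic and the names are mine, not part of the Azure SDK:

```python
import asyncio
import random

async def send_with_retry(send_fn, message, attempts=5, base_delay=1.0):
    """Retry an awaitable sender with exponential backoff and jitter.

    send_fn is any coroutine function, e.g. a wrapper around
    client.send_message_to_output(message, "telemetryOutput").
    """
    for attempt in range(attempts):
        try:
            return await send_fn(message)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts, let the caller buffer the message
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            await asyncio.sleep(delay)
```

Messages that exhaust all attempts should go back into the local buffer rather than being dropped.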

Additional optimizations:

  1. Message Compression: Reduce storage requirements:
import gzip
import json
from azure.iot.device import Message

payload = json.dumps(telemetry_data)
compressed = gzip.compress(payload.encode("utf-8"))
message = Message(compressed)
message.custom_properties["compressed"] = "true"
  2. Priority Queuing: Implement message priorities:
if message.priority == "high":
    buffer.high_priority_queue.append(message)
else:
    buffer.normal_queue.append(message)
  3. Storage Monitoring: Add alerts for buffer usage:
buffer_usage = get_buffer_size() / max_buffer_size
if buffer_usage > 0.8:
    send_alert("Buffer 80% full - network issues?")
  4. Graceful Degradation: During extended outages, implement sampling:
if buffer_usage > 0.9:
    # Keep every 10th message instead of all of them
    if message_count % 10 == 0:
        buffer.append(message)
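The priority and degradation points above can be combined into one buffering policy. A minimal in-memory sketch - the real module would back this with disk, and the class name and thresholds are illustrative:

```python
from collections import deque

class TelemetryBuffer:
    """High-priority messages are always kept; normal messages are
    sampled (every Nth) once the buffer passes a fullness threshold."""

    def __init__(self, max_messages, sample_every=10, degrade_at=0.9):
        self.max_messages = max_messages
        self.sample_every = sample_every
        self.degrade_at = degrade_at
        self.high = deque()
        self.normal = deque()
        self._seen = 0

    def usage(self):
        return (len(self.high) + len(self.normal)) / self.max_messages

    def append(self, message, priority="normal"):
        """Returns True if the message was kept, False if sampled out."""
        self._seen += 1
        if priority == "high":
            self.high.append(message)
            return True
        if self.usage() >= self.degrade_at and self._seen % self.sample_every != 0:
            return False  # degraded mode: drop this normal message
        self.normal.append(message)
        return True
```

On reconnect, drain `high` before `normal` so alarms arrive first.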

Monitor the solution:

# Watch buffer usage
watch -n 5 'du -sh /var/lib/iotedge/storage/datastream/'

# Monitor edgeHub queue
iotedge logs edgeHub --tail 50 | grep -i queue

The root cause is insufficient buffer capacity combined with direct IoT Hub connections that bypass edgeHub’s store-and-forward mechanism. By routing through edgeHub and properly configuring both module and edgeHub buffering, you’ll achieve reliable message delivery during network interruptions.

The “buffer full” warning suggests your buffer size is too small for your message rate. What’s your current buffer configuration? The default is often 10MB, which fills quickly with high-frequency telemetry. Check your module twin’s bufferSizeInMB property and message TTL settings.

Current config shows:

{
  "bufferSizeInMB": 10,
  "messageTTL": 3600,
  "persistentStoragePath": "/var/lib/iotedge/storage"
}

We generate about 200 messages/second at 2KB each. That’s 400KB/s, so 10MB fills in 25 seconds. I need to increase the buffer significantly.