Monitoring alerts not triggering when IoT device goes offline despite correct Pub/Sub subscription setup

nicoleexpert · October 16, 2025, 11:19am

We’ve configured Cloud Monitoring alert policies to notify us when devices disconnect from our IoT Core registry, but alerts aren’t firing consistently. Our fleet has 500+ industrial sensors sending telemetry every 30 seconds. When devices go offline (power loss, network issues), we expect immediate alerts but often discover outages hours later through manual checks.

Our current setup uses the device connection state metric with a threshold condition. The Pub/Sub topic for telemetry data is linked to the registry, and we have a subscription processing messages. However, the alert policy doesn’t seem to detect when connection state changes from CONNECTED to DISCONNECTED. We’ve verified the metric exists in Cloud Monitoring but the policy remains silent during actual device failures.

Is there a specific way to configure alert policies for IoT device connection states? Are we missing linkage between the Pub/Sub subscription and the monitoring metric?

andrew_coder · October 24, 2025, 10:12pm

That’s interesting about absence-based alerts. So instead of monitoring the connection state metric directly, I should monitor the Pub/Sub message flow? How would I configure that in Cloud Monitoring?

andrew_coder · October 17, 2025, 8:19pm

The connection state metric alone might not be sufficient. IoT Core tracks device connection state differently from regular telemetry metrics. You need to make sure you’re monitoring the right signal, typically iot.googleapis.com/device/connected_state, but this metric has a sampling interval that might not catch brief disconnections. Have you checked whether your alert policy is using the correct metric name and aggregation window?

michael_wizard · November 16, 2025, 7:51am

Adding to the excellent suggestions above: your alert policy configuration needs to address all three components systematically.

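Cloud Monitoring Alert Policy Configuration: First, create a metric-based alert policy specifically for device connectivity. Base it on a Pub/Sub throughput metric such as pubsub.googleapis.com/topic/send_message_operation_count rather than pubsub.googleapis.com/subscription/num_unacked_messages_by_region: the latter is a backlog gauge, so it does not reflect message flow and cannot be combined with ALIGN_RATE. Set the alignment period to 1 minute with an aligner of ALIGN_RATE to detect when the message rate drops to zero. Keep in mind that Pub/Sub metrics are reported per topic or subscription, not per device, so per-device detection depends on the per-device filtering and custom metrics covered in the next two parts.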
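
As a rough sketch of what such a policy could look like, the snippet below builds an alert policy body as a plain Python dict that can be dumped to JSON and loaded through the Cloud Monitoring API or console. It assumes an absence condition on the topic-level metric pubsub.googleapis.com/topic/send_message_operation_count, which is swapped in here because it is a delta metric that ALIGN_RATE accepts. The function name, display names, and the 180-second duration are illustrative choices, not values from this thread. A topic-level series only goes silent when the whole fleet stops publishing, so this catches fleet-wide outages rather than a single device.

```python
import json


def build_message_flow_policy(topic_id, absent_for_seconds=180):
    """Build an AlertPolicy body that fires when a topic stops receiving publishes.

    Hypothetical helper: the names and the default duration are assumptions.
    """
    metric_filter = (
        'resource.type = "pubsub_topic" AND '
        f'resource.labels.topic_id = "{topic_id}" AND '
        'metric.type = "pubsub.googleapis.com/topic/send_message_operation_count"'
    )
    return {
        "displayName": f"No telemetry on {topic_id}",
        "combiner": "OR",
        "conditions": [
            {
                "displayName": "Telemetry absent",
                # Absence condition: fires when no data points arrive for `duration`.
                "conditionAbsent": {
                    "filter": metric_filter,
                    "duration": f"{absent_for_seconds}s",
                    "aggregations": [
                        {
                            "alignmentPeriod": "60s",
                            "perSeriesAligner": "ALIGN_RATE",
                        }
                    ],
                },
            }
        ],
    }


policy = build_message_flow_policy("telemetry-topic")
print(json.dumps(policy, indent=2))
```

The printed JSON still needs notification channels attached before it is useful as a real policy.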

Pub/Sub Topic and Subscription Linkage: Your IoT Core device registry’s telemetry topic must have an active subscription with proper configuration:

Set ackDeadlineSeconds to 60 seconds minimum
Enable retainAckedMessages for 10 minutes to allow alert policy evaluation
Use message filtering on the subscription level with attributes like deviceId to enable per-device monitoring
Create a separate subscription dedicated to monitoring (don’t rely on your processing subscription)
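
To make the subscription settings above concrete, here is a minimal sketch that builds the request body for creating a dedicated monitoring subscription through the Pub/Sub REST API. The field names (ackDeadlineSeconds, retainAckedMessages, messageRetentionDuration, filter) are the REST resource fields; the function name, project, topic, and device ID are hypothetical placeholders.

```python
import json


def build_monitoring_subscription(project_id, topic_id, device_id):
    """Build a Pub/Sub Subscription body for a per-device monitoring subscription.

    Hypothetical helper: the arguments are placeholders, not values from the thread.
    """
    return {
        "topic": f"projects/{project_id}/topics/{topic_id}",
        # Give the monitoring consumer 60 seconds to acknowledge each message.
        "ackDeadlineSeconds": 60,
        # Keep acknowledged messages for 10 minutes.
        "retainAckedMessages": True,
        "messageRetentionDuration": "600s",
        # Only deliver messages whose deviceId attribute matches this device.
        "filter": f'attributes.deviceId = "{device_id}"',
    }


body = build_monitoring_subscription("my-project", "telemetry-topic", "sensor-001")
print(json.dumps(body, indent=2))
```

A subscription filter cannot be changed after creation, so per-device filtering means one subscription per device. For 500+ devices that adds up quickly, which is worth weighing against the custom-metric approach.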

Device Connection State Metrics: While IoT Core does expose connection state, it’s not real-time. Instead, implement a custom metric approach:

Create a log-based metric from Cloud Logging that captures device authentication events
Filter for `resource.type="cloudiot_device"` and `protoPayload.methodName="google.cloud.iot.v1.DeviceManager.SendCommandToDevice"`
Extract device ID as a label: `labels.device_id = EXTRACT(resource.labels.device_id)`
Set up an alert policy on this custom metric with a threshold condition: if metric value = 0 for > 2 minutes, trigger alert
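
The log-based metric described in these steps could be expressed roughly as follows. This sketch builds a metric definition as a plain dict in the shape the Cloud Logging API uses for log-based metrics (filter, metricDescriptor, labelExtractors). The filter and the label extractor are copied from the steps above as written; the metric name and function name are assumptions made for illustration.

```python
import json


def build_device_log_metric(name, log_filter):
    """Build a counter-type log-based metric body with a device_id label.

    Hypothetical helper: `name` is a placeholder chosen by the caller.
    """
    return {
        "name": name,
        "description": "Counts matching IoT device log entries per device",
        "filter": log_filter,
        "metricDescriptor": {
            "metricKind": "DELTA",
            "valueType": "INT64",
            "labels": [{"key": "device_id", "valueType": "STRING"}],
        },
        # Label extractor as given in the post.
        "labelExtractors": {
            "device_id": "EXTRACT(resource.labels.device_id)",
        },
    }


# Filter string taken from the post; adjust it to the log entries you actually see.
POST_FILTER = (
    'resource.type="cloudiot_device" AND '
    'protoPayload.methodName='
    '"google.cloud.iot.v1.DeviceManager.SendCommandToDevice"'
)

metric = build_device_log_metric("device_activity_count", POST_FILTER)
print(json.dumps(metric, indent=2))
```

One thing to check when alerting on a counter like this: if no log entries match, the metric may simply have no data points rather than a value of 0, in which case an absence condition is likely a better fit than a threshold of 0.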

For your 500+ device fleet, consider implementing a Cloud Function that periodically (every 5 minutes) queries the device registry for connection states and publishes custom metrics to Cloud Monitoring. This gives you more control over alert timing and can include device-specific metadata in alert notifications.
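
The core of such a function is a staleness check, sketched below without any cloud calls so it can be run locally. It assumes you have already fetched a mapping from device ID to the time each device was last seen (for example from the registry listing). The function name, the input shape, and the 2-minute cutoff are illustrative assumptions. With telemetry every 30 seconds, 2 minutes corresponds to four missed messages.

```python
from datetime import datetime, timedelta, timezone


def find_stale_devices(last_seen, now, max_silence=timedelta(minutes=2)):
    """Return sorted IDs of devices silent for longer than max_silence.

    last_seen maps device ID -> timezone-aware datetime of the last event,
    or None if the device has never reported.
    """
    return sorted(
        device_id
        for device_id, seen in last_seen.items()
        if seen is None or now - seen > max_silence
    )


now = datetime(2025, 10, 16, 11, 19, tzinfo=timezone.utc)
fleet = {
    "sensor-001": now - timedelta(seconds=30),  # reporting normally
    "sensor-002": now - timedelta(minutes=5),   # silent for 5 minutes
    "sensor-003": None,                         # never reported
}
print(find_stale_devices(fleet, now))
```

The resulting list can then be written as a custom metric or sent straight to a notification channel, with whatever device metadata you want included.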

The key issue with your current setup is likely that you’re monitoring a lagging indicator (connection state) rather than the actual data flow (Pub/Sub messages). Switch to message-based monitoring and you’ll get alerts within 2-3 minutes of device disconnection rather than hours later.
