Monitoring SDK alert rule not triggering on threshold breach in aziot-24 monitoring module

Critical incident response delays are occurring because alert rules aren’t firing. We’ve configured threshold-based alerts via the Azure IoT SDK (aziot-24) to trigger when device temperature exceeds 85°C, but notifications aren’t being sent even when telemetry clearly shows breaches.

Alert rule configuration:

alertRule.condition = {
  metric: 'temperature',
  threshold: 85,
  operator: 'greaterThan'
};

Telemetry logs show multiple devices reporting 92-95°C for extended periods, but no alerts were triggered. The telemetry mapping between our device schema and the monitoring system seems correct. We upgraded from aziot-23 to aziot-24 last month. Could the SDK version update have changed alert rule behavior?

The aggregation setting is in the condition object. You need to add aggregation: 'Maximum' as a property alongside metric and threshold. Also verify your telemetry field name exactly matches - aziot-24 is case-sensitive now. If your device sends ‘Temperature’ but your rule checks ‘temperature’, it won’t match.

Also check if your alert rule is actually enabled. After SDK upgrades, sometimes rules get disabled by default and need to be explicitly re-enabled. Use the SDK’s listAlertRules() method to verify the enabled status of all your rules.
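A minimal sketch of that check, assuming listAlertRules() resolves to an array of rule objects with name and enabled fields (the return shape is an assumption, and every rule name below except HighTemperatureAlert is made up for illustration):

```javascript
// Return the names of rules that cannot fire because they are not enabled.
// Assumes each rule object carries `name` and `enabled` fields.
function findDisabledRules(rules) {
  return rules
    .filter((rule) => rule.enabled !== true) // a missing flag is treated as disabled
    .map((rule) => rule.name);
}

// Stand-in for the array that `await client.listAlertRules()` is assumed to return.
const sampleRules = [
  { name: 'HighTemperatureAlert', enabled: false },
  { name: 'LowBatteryAlert', enabled: true }, // hypothetical rule
  { name: 'HumidityAlert' },                  // hypothetical rule with no flag at all
];

console.log('Disabled rules:', findDisabledRules(sampleRules));
```

Treating a missing enabled flag as disabled is deliberate here, since the thread reports that rules no longer default to enabled.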

There’s also a telemetry aggregation change in aziot-24. Alert rules now evaluate against the average value within the evaluation window, not the raw telemetry points. If your 95°C spike lasts only 30 seconds within a 5-minute window, the average might still be below 85°C. Switch to ‘Maximum’ aggregation instead of ‘Average’ for threshold alerts on spikes.
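To make the arithmetic concrete, here is a small standalone sketch that does not call the SDK. It assumes one reading every 30 seconds, so a 5-minute window holds ten samples; the sample values are invented for illustration.

```javascript
// Aggregate one evaluation window of temperature samples.
// 'Average' and 'Maximum' mirror the aggregation names used in this thread.
function aggregate(samples, aggregation) {
  if (samples.length === 0) {
    throw new Error('window contains no samples');
  }
  if (aggregation === 'Maximum') {
    return Math.max(...samples);
  }
  if (aggregation === 'Average') {
    const sum = samples.reduce((total, value) => total + value, 0);
    return sum / samples.length;
  }
  throw new Error(`unsupported aggregation: ${aggregation}`);
}

// Nine readings at 80°C and a single 30-second spike at 95°C.
const windowSamples = [80, 80, 80, 80, 95, 80, 80, 80, 80, 80];
const threshold = 85;

const average = aggregate(windowSamples, 'Average'); // 815 / 10 = 81.5
const maximum = aggregate(windowSamples, 'Maximum'); // 95

console.log(`Average ${average} breaches threshold: ${average > threshold}`);
console.log(`Maximum ${maximum} breaches threshold: ${maximum > threshold}`);
```

The same window stays under the threshold when averaged and exceeds it when the maximum is used, which is the behaviour difference described above.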

Your alert rule issues stem from multiple breaking changes in aziot-24. Here’s a complete fix covering rule configuration, telemetry mapping, and the SDK migration:

1. Alert Rule Configuration (Core Fix):

Update your alert rule with all required aziot-24 properties:

const alertRule = {
  name: 'HighTemperatureAlert',
  enabled: true,
  condition: {
    metric: 'temperature',
    threshold: 85,
    operator: 'greaterThan',
    aggregation: 'Maximum',
    unit: 'Celsius'
  },
  evaluationFrequency: 60,
  windowSize: 300,
  severity: 'Critical'
};

Key changes from aziot-23:

  • aggregation is now mandatory (default changed from ‘Maximum’ to ‘Average’)
  • unit must be specified for numeric metrics
  • evaluationFrequency default increased from 60s to 300s
  • enabled must be explicitly set (no longer defaults to true)
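As a rough pre-flight check based only on the list above, a rule object can be inspected before it is submitted. This is a sketch: checkRuleForAziot24 is a hypothetical helper, not an SDK function, and it only looks for the fields named in this thread.

```javascript
// Flag fields that the list above says must be explicit in aziot-24.
// Returns human-readable problems; an empty array means nothing was flagged.
function checkRuleForAziot24(rule) {
  const problems = [];
  const condition = rule.condition || {};
  if (rule.enabled !== true) {
    problems.push('enabled is not explicitly true');
  }
  if (!condition.aggregation) {
    problems.push('condition.aggregation is missing');
  }
  if (!condition.unit) {
    problems.push('condition.unit is missing');
  }
  if (typeof rule.evaluationFrequency !== 'number') {
    problems.push('evaluationFrequency is not set (the default is reported to be 300s)');
  }
  return problems;
}

// The rule from the original question, which only sets the condition basics.
const originalRule = {
  condition: { metric: 'temperature', threshold: 85, operator: 'greaterThan' },
};

console.log(checkRuleForAziot24(originalRule));
```

Run against the rule from the question, this reports all four gaps, which matches the changes listed above.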

2. Telemetry Mapping (Schema Alignment):

Aziot-24 introduced strict schema validation. Ensure your device telemetry matches the alert rule metric name exactly:

// Device telemetry must use exact field names
const telemetry = {
  temperature: 92.5,  // lowercase to match alert rule
  temperatureUnit: 'Celsius',
  timestamp: Date.now()
};

The SDK now performs case-sensitive matching. If your devices send ‘Temperature’ (capitalized) but your rule checks ‘temperature’, alerts won’t trigger. Update either your device code or alert rule to match.

Telemetry Mapping Validation: Query the metric metadata to verify correct mapping:

const metrics = await iotClient.getAvailableMetrics(deviceId);
console.log('Available metrics:', metrics);
// Verify 'temperature' appears in the list
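Casing mismatches are easy to miss by eye, so a small comparison can help. The sketch below assumes the metric list is available as an array of field-name strings (the actual return shape of getAvailableMetrics may differ); diagnoseMetricName is a hypothetical helper and 'humidity' is an invented field.

```javascript
// Compare the metric name used in a rule against the field names devices report.
// Distinguishes an exact match, a match that differs only by case, and no match.
function diagnoseMetricName(ruleMetric, fieldNames) {
  if (fieldNames.includes(ruleMetric)) {
    return { status: 'match', suggestion: null };
  }
  const nearMiss = fieldNames.find(
    (name) => name.toLowerCase() === ruleMetric.toLowerCase()
  );
  return nearMiss
    ? { status: 'case-mismatch', suggestion: nearMiss }
    : { status: 'missing', suggestion: null };
}

// Device reports 'Temperature' while the rule checks 'temperature'.
console.log(diagnoseMetricName('temperature', ['Temperature', 'humidity']));
```

A 'case-mismatch' result points at the exact spelling the devices use, so either the rule or the device code can be aligned to it.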

3. SDK Version Update (Migration Steps):

Aziot-24 changed alert rule persistence. Existing rules from aziot-23 need migration:

  1. Export existing rules before upgrade
  2. After upgrade, rules are disabled by default
  3. Re-create rules with new schema
  4. Test each rule individually

// Migration script. inferUnit() is your own helper that maps a metric name to its unit.
const oldRules = await client.listAlertRules();
for (const rule of oldRules) {
  const updated = {
    ...rule,
    condition: {
      ...rule.condition,
      aggregation: 'Maximum',
      unit: inferUnit(rule.condition.metric)
    },
    enabled: true
  };
  await client.updateAlertRule(rule.id, updated);
}

Additional Configuration Best Practices:

  • Window size: Set to 5x evaluation frequency (300s window with 60s evaluation)
  • Consecutive breaches: Add consecutiveBreaches: 2 to reduce false positives
  • Alert actions: Configure notification channels explicitly (email/webhook)
  • Metric units: Standardize on Celsius/Fahrenheit across all devices
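To show what consecutiveBreaches: 2 means in practice, here is a standalone sketch of the logic, not the SDK's evaluator. It assumes the rule fires once the aggregated value has exceeded the threshold in that many evaluations in a row; shouldFire is a hypothetical name.

```javascript
// Decide whether an alert fires, given the aggregated value from each
// evaluation in chronological order.
function shouldFire(aggregatedValues, threshold, consecutiveBreaches) {
  let streak = 0;
  for (const value of aggregatedValues) {
    // Any evaluation at or below the threshold resets the streak.
    streak = value > threshold ? streak + 1 : 0;
    if (streak >= consecutiveBreaches) {
      return true;
    }
  }
  return false;
}

// Two isolated breaches never line up, so nothing fires.
console.log(shouldFire([80, 95, 80, 96], 85, 2));
// Two breaches in a row fire the alert.
console.log(shouldFire([80, 92, 95], 85, 2));
```

The trade-off is latency: with a 60-second evaluation frequency, requiring two consecutive breaches delays the alert by roughly one extra evaluation.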

Testing and Validation:

  1. Use SDK debug mode to see alert evaluation logs
  2. Manually trigger test alerts with simulated telemetry
  3. Verify alert history shows evaluation attempts
  4. Monitor alert rule metrics dashboard for evaluation count
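For step 2, simulated telemetry can be generated with a short helper like the one below. This is a sketch that only builds the payloads; how they are sent depends on your device client. simulateTelemetry and its parameters are hypothetical, and the payload fields follow the telemetry example shown earlier.

```javascript
// Build simulated readings: one every `intervalSeconds`, at `baseline` except
// between `spikeStart` (inclusive) and `spikeEnd` (exclusive), in seconds.
function simulateTelemetry({ durationSeconds, intervalSeconds, baseline, spikeValue, spikeStart, spikeEnd }) {
  const readings = [];
  for (let t = 0; t < durationSeconds; t += intervalSeconds) {
    const inSpike = t >= spikeStart && t < spikeEnd;
    readings.push({
      temperature: inSpike ? spikeValue : baseline,
      temperatureUnit: 'Celsius',
      timestamp: t * 1000, // milliseconds from the start of the test run
    });
  }
  return readings;
}

// A 5-minute run with a 30-second spike to 95°C starting at the 2-minute mark.
const testReadings = simulateTelemetry({
  durationSeconds: 300,
  intervalSeconds: 30,
  baseline: 80,
  spikeValue: 95,
  spikeStart: 120,
  spikeEnd: 150,
});

console.log(testReadings.map((reading) => reading.temperature));
```

Widening the spike or raising the baseline lets you test both the brief-spike and the sustained-breach cases against the same rule.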

Performance Impact: Evaluation frequency of 60s with Maximum aggregation increases compute load by ~20% compared to 300s/Average. Monitor your IoT Hub throttling metrics. If you hit limits, consider:

  • Raising evaluationFrequency to 120 seconds (evaluating less often) for non-critical alerts
  • Using Average aggregation for gradual threshold breaches
  • Implementing device-side pre-filtering for extreme values
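One possible reading of device-side pre-filtering is to always forward readings above the alert threshold and thin out the routine ones. The sketch below illustrates that idea under those assumptions; createPreFilter is a hypothetical helper and the numbers are examples.

```javascript
// Create a filter that always forwards readings above `urgentThreshold`
// and forwards only every `keepEveryNth` reading otherwise.
function createPreFilter(urgentThreshold, keepEveryNth) {
  let seen = 0;
  return function shouldSend(reading) {
    seen += 1;
    if (reading > urgentThreshold) {
      return true; // never drop a potential breach
    }
    return seen % keepEveryNth === 0; // thin out routine readings
  };
}

const shouldSend = createPreFilter(85, 5);
const readings = [70, 71, 92, 72, 73, 74];
const sent = readings.filter((reading) => shouldSend(reading));

console.log(sent);
```

Note that dropping routine readings skews an Average aggregation upward, so this approach fits best with rules that use Maximum.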

With these changes, your alert rules should trigger on threshold breaches. Maximum aggregation with a 60-second evaluation frequency catches brief temperature spikes that the aziot-24 defaults (Average aggregation, 300-second evaluation) would miss.

Check your alert rule evaluation frequency. In aziot-24, the default changed from 1 minute to 5 minutes. If your temperature spikes are brief, they might not be captured during evaluation windows. You need to explicitly set evaluationFrequency to 60 seconds in your rule configuration.