Sharing our implementation of proactive incident management that reduced MTTR by 67% and improved SLA compliance from 94% to 99.2%. We integrated Azure Monitor with Application Insights and ServiceNow to create an automated anomaly detection and response system.
The challenge was reactive firefighting: incidents were detected only when users reported issues, and by the time our team investigated, SLA breaches had already occurred. We needed automated anomaly detection that could flag performance degradation before it impacted users, plus seamless ITSM integration to route incidents to the right teams immediately.
Key components: smart detection rules in Application Insights, custom metric alerts in Azure Monitor, and Logic Apps for ServiceNow automation. The system now detects anomalies in response times, failure rates, and dependency failures, automatically creates prioritized incidents with full diagnostic context, and tracks everything against our SLA commitments.
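To make the incident-creation step concrete, here's a minimal sketch of how an Azure Monitor alert (common alert schema) can be shaped into a ServiceNow Table API incident body (POST /api/now/table/incident). The severity-to-urgency mapping and category are illustrative assumptions, not our exact production config:

```python
# Hypothetical mapping from Azure Monitor alert severity (Sev0-Sev4)
# to ServiceNow urgency/impact values (1 = high, 3 = low).
SEVERITY_MAP = {
    "Sev0": {"urgency": "1", "impact": "1"},
    "Sev1": {"urgency": "1", "impact": "2"},
    "Sev2": {"urgency": "2", "impact": "2"},
    "Sev3": {"urgency": "3", "impact": "3"},
}

def build_incident_payload(alert: dict) -> dict:
    """Shape an Azure Monitor common-alert-schema payload into a
    ServiceNow incident body for the Table API."""
    essentials = alert["data"]["essentials"]
    sev = essentials.get("severity", "Sev3")
    return {
        "short_description": essentials["alertRule"],
        "description": (
            f"Alert: {essentials['alertRule']}\n"
            f"Resource: {', '.join(essentials.get('alertTargetIDs', []))}\n"
            f"Fired at: {essentials['firedDateTime']}"
        ),
        "category": "software",  # assumed default, adjust per your CMDB
        **SEVERITY_MAP.get(sev, SEVERITY_MAP["Sev3"]),
    }
```

In practice the Logic App posts this JSON to the ServiceNow instance with basic auth or OAuth; the diagnostic context (topology, deployment history) would be appended to the description field.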
Great question. We implemented a three-tier filtering approach in Logic Apps before ServiceNow ticket creation. First tier: severity-based routing where only high and critical alerts create incidents immediately. Medium severity alerts aggregate over 15-minute windows. Second tier: correlation logic that groups related alerts from the same application component into single incidents. Third tier: suppression rules during deployment windows and scheduled maintenance. This reduced ticket volume by 73% while maintaining comprehensive coverage. The Logic App also enriches incidents with application topology from Azure Resource Graph and recent deployment history from Azure DevOps.
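The three tiers above can be sketched as a single filter function. This is an illustration of the decision logic, not the actual Logic App definition; the names and the per-component aggregation state are assumptions:

```python
from datetime import datetime, timedelta

AGGREGATION_WINDOW = timedelta(minutes=15)

def should_create_incident(alert, pending, maintenance_windows, now):
    """Sketch of the three-tier filter:
    suppression -> severity routing -> aggregation/correlation."""
    # Tier 3: suppress everything inside a deployment/maintenance window.
    if any(start <= now <= end for start, end in maintenance_windows):
        return False
    sev = alert["severity"]
    # Tier 1: high/critical severity opens an incident immediately.
    if sev in ("Sev0", "Sev1"):
        return True
    # Tier 2: medium severity aggregates per component; only the first
    # alert in a 15-minute window opens an incident, later ones are
    # correlated into it.
    if sev == "Sev2":
        comp = alert["component"]
        last = pending.get(comp)
        pending[comp] = now
        return last is None or now - last > AGGREGATION_WINDOW
    # Low severity: log only, no ticket.
    return False
```

The real correlation tier also groups by application component from Azure Resource Graph topology, which a dict keyed by component ID approximates here.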
We rely primarily on Application Insights’ built-in smart detection for anomaly patterns. It uses adaptive machine learning that learns normal behavior over time and alerts on deviations. For custom metrics, we implemented dynamic thresholds in Azure Monitor that adjust based on historical patterns and time-of-day variations. This eliminated 80% of the false positives we had seen with static thresholds. The key is tuning sensitivity settings during the learning period and excluding known maintenance windows.
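Azure's dynamic-threshold model is proprietary, but the time-of-day idea it captures can be illustrated with a toy baseline: per-hour history defines a band, and a tunable sensitivity widens or narrows it (wider band = fewer alerts). Everything here is a simplified sketch, not the actual algorithm:

```python
import statistics

def dynamic_band(history_by_hour, hour, sensitivity=2.0):
    """Illustrative stand-in for a dynamic threshold: the expected
    range for this hour of day is mean +/- sensitivity * stdev of
    past samples observed at the same hour."""
    samples = history_by_hour[hour]
    mean = statistics.fmean(samples)
    stdev = statistics.stdev(samples)
    return mean - sensitivity * stdev, mean + sensitivity * stdev

def is_anomalous(value, history_by_hour, hour):
    """Flag a sample that falls outside the hour's expected band."""
    low, high = dynamic_band(history_by_hour, hour)
    return value < low or value > high
```

This is also why excluding maintenance windows from the learning period matters: samples taken during deployments would inflate the stdev and silently widen the band.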
How did you handle the ServiceNow integration? We’re looking at similar ITSM automation but concerned about alert noise creating too many tickets. Do you have any filtering or aggregation logic before incidents are created?
This is exactly what we need. Can you share more details on the availability calculation methodology and how you’re measuring MTTR improvements? Also curious about the cost implications of this monitoring setup.