Azure Blob Storage monitoring alerts not triggering for large file uploads in production workloads

We have Azure Monitor alerts configured to notify us when files larger than 500MB are uploaded to our production Blob Storage container. The alerts worked fine initially, but we’ve noticed they’re not triggering consistently anymore, especially during high-volume periods when multiple large files arrive simultaneously.

We’re using BlobCreated events with Event Grid subscriptions to trigger the alerts, but I’m starting to wonder if we’re hitting some kind of subscription limit or if there’s an issue with our alert rule configuration. The diagnostic logs show the uploads are completing successfully, so the files are definitely reaching the container.

This is causing us to miss critical ingestion events that downstream processes depend on. Has anyone experienced similar issues with Event Grid subscriptions or Azure Monitor alerts not firing reliably for Blob Storage events?

Good point about the retry policy. Our webhook endpoint does take 3-4 seconds to respond sometimes. Could that be causing the backlog?

Based on the symptoms you’re describing, you have three interconnected issues that need to be addressed systematically:

1. BlobCreated Event Diagnostics: First, enable detailed diagnostic logging on your storage account. Navigate to your storage account → Diagnostic settings → Add diagnostic setting. Enable the ‘StorageWrite’ log category (uploads are write operations; ‘StorageRead’ is optional here) and send the logs to Log Analytics. This will help you verify that BlobCreated events are actually being generated for all uploads. Query the logs with:


StorageBlobLogs
| where TimeGenerated > ago(24h)
// SDKs commit large files via PutBlockList rather than a single PutBlob,
// and BlobCreated events fire on either commit operation.
| where OperationName in ("PutBlob", "PutBlockList") and StatusText == "Success"
| where ContentLength > 524288000  // 500 MB

This confirms whether the issue is event generation or event delivery.

2. Event Grid Subscription Limits: The 150-200 dropped events per hour you’re seeing are the key signal. In Event Grid metrics, a dropped event means the delivery retry policy was exhausted (or the event expired), which fits a slow or intermittently failing webhook endpoint more than publish-side throttling; that said, also verify you aren’t approaching the publish limit of roughly 5,000 events per second per topic during bursts. To reduce the drops:

  • Implement event batching by configuring your Event Grid subscription to use batch delivery (max 5000 events per batch)
  • Consider splitting your workload across multiple Event Grid topics by container or file size category
  • Increase the ‘max events per batch’ and ‘preferred batch size in kilobytes’ settings in your subscription configuration
  • Enable dead-lettering to a separate storage container so you can recover dropped events
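
The batching and dead-lettering bullets above can be applied with the Azure CLI. This is a sketch, not a drop-in command: the subscription name `blob-created-sub`, the resource IDs, and the `deadletter-events` container are placeholders you’d replace with your own.

```shell
# Placeholder resource ID for the storage account that is the event source.
STORAGE_ID="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<account>"

# Update the existing subscription: deliver in batches (up to 5000 events or
# 1024 KB per batch) and dead-letter undeliverable events to a blob container
# so dropped events can be recovered and replayed later.
az eventgrid event-subscription update \
  --name blob-created-sub \
  --source-resource-id "$STORAGE_ID" \
  --max-events-per-batch 5000 \
  --preferred-batch-size-in-kilobytes 1024 \
  --deadletter-endpoint "$STORAGE_ID/blobServices/default/containers/deadletter-events"
```

Note that batch delivery only helps if your endpoint is prepared to receive an array of events in a single POST and acknowledge them together.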

3. Alert Rule Configuration: Your alert rule needs optimization for high-volume scenarios:

  • Change the aggregation granularity to 5 minutes minimum (not 1 minute)
  • Use ‘Total’ aggregation instead of ‘Count’ for the metric evaluation
  • Set the evaluation frequency to match your aggregation window to prevent evaluation gaps
  • Verify your action group isn’t being throttled by checking Action Group metrics in Azure Monitor
  • Consider using Log Analytics-based alerts instead of metric alerts for better handling of high-cardinality data

The 3-4 second webhook response time is definitely contributing to the problem. Event Grid allows up to 30 seconds for an acknowledgment before it queues the event for retry, but consistently slow responses cut your effective delivery throughput and let a backlog build up during bursts. Implement an asynchronous processing pattern where your webhook immediately returns 200 OK and processes the event in the background.
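
That pattern can be sketched with nothing but the Python standard library. This is an illustrative skeleton, not production code: the handler acknowledges immediately and hands the payload to a background worker queue, and it also answers the Event Grid subscription-validation handshake, which any webhook endpoint must do. The processing step is a placeholder for your real alert pipeline.

```python
import json
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Events are queued here and drained by a background worker, so the HTTP
# response never waits on the slow 3-4 second processing path.
event_queue: "queue.Queue" = queue.Queue()

def worker():
    """Drain the queue; the slow alert/ingestion logic lives here."""
    while True:
        event = event_queue.get()
        # ... hand off to the downstream alert pipeline (placeholder) ...
        event_queue.task_done()

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Event Grid POSTs a JSON array of events.
        length = int(self.headers.get("Content-Length", 0))
        events = json.loads(self.rfile.read(length)) if length else []
        for event in events:
            # Event Grid sends a one-time validation event when the
            # subscription is created; echo the code back to complete it.
            if event.get("eventType") == "Microsoft.EventGrid.SubscriptionValidationEvent":
                body = json.dumps(
                    {"validationResponse": event["data"]["validationCode"]}
                ).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
                return
            event_queue.put(event)  # enqueue only; no inline processing
        self.send_response(200)  # immediate ack; never wait on processing here
        self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the example quiet

def serve(port=8080):
    """Start the background worker and block serving webhook requests."""
    threading.Thread(target=worker, daemon=True).start()
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

The design point is simply that enqueueing is microseconds while processing is seconds; in a real deployment you’d likely replace the in-process queue with a durable one (Service Bus or a storage queue) so queued events survive a restart.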

Implement these changes in order: diagnostics first to confirm event generation, then optimize Event Grid delivery, and finally tune the alert rules. Monitor for 48 hours after each change to measure improvement.

We had a similar problem and discovered our issue was with Event Grid subscription quotas. Each topic has a default limit of 500 event subscriptions. If you’re creating subscriptions dynamically or have a lot of them, you might be hitting that limit. Also check the event delivery retry policy: by default Event Grid retries for up to 24 hours (or 30 delivery attempts, whichever comes first), and if your endpoint is slow to respond, events queue up and are eventually dropped or dead-lettered.

Thanks Mike. I checked the Event Grid metrics and we’re definitely seeing dropped events during peak hours, around 150-200 drops per hour. Our subscription doesn’t have any advanced filters applied, just the basic BlobCreated event type. Should we be implementing batching or splitting across multiple topics?