Azure Monitor storage metrics show gaps in observability for blob access patterns and latency tracking

Azure Monitor isn’t capturing detailed blob access metrics for our storage accounts. We’ve enabled diagnostic settings and configured a Log Analytics workspace, but there are significant gaps in the observability data. Specifically, we’re missing per-blob access patterns, request latencies for individual operations, and caller IP information.

The diagnostic logs show aggregated metrics but don’t provide the granularity needed to troubleshoot performance issues or identify access anomalies. We’ve tried querying with KQL but the StorageBlobLogs table is incomplete:


StorageBlobLogs
| where TimeGenerated > ago(1h)
| where OperationName == "GetBlob"
| summarize count() by CallerIpAddress

This query returns sparse results even though we know there are thousands of blob reads happening. Application Insights integration isn’t helping either. What diagnostic configuration am I missing for complete storage observability?

Here’s the complete picture of Azure Storage observability and how to achieve full visibility:

Understanding Diagnostic Settings Limitations: Azure Storage diagnostic logs are intentionally sampled to balance cost and performance. The sampling rate varies:

  • Low-volume operations (<100 req/min): 100% captured
  • Medium-volume (100-1000 req/min): ~50% sampled
  • High-volume (>1000 req/min): 5-10% sampled

This is by design and cannot be disabled. Microsoft doesn’t expose the sampling configuration because it adjusts dynamically based on backend load.

Diagnostic Settings Configuration: Ensure you’ve enabled ALL log categories in diagnostic settings. The configuration should include:

{
  "logs": [
    {"category": "StorageRead", "enabled": true},
    {"category": "StorageWrite", "enabled": true},
    {"category": "StorageDelete", "enabled": true}
  ],
  "metrics": [{"category": "Transaction", "enabled": true}]
}
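Once the setting is in place, a quick way to confirm that all three categories are actually flowing into the workspace (Category is a standard column on the resource log tables) is:

```kql
// Verify StorageRead / StorageWrite / StorageDelete entries are all arriving
StorageBlobLogs
| where TimeGenerated > ago(1h)
| summarize Entries = count() by Category
```

If one category is missing from the results, re-check that specific toggle in the diagnostic setting before assuming sampling is the cause.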

Log Analytics KQL Query Optimization: Your KQL query needs to account for sampling and use the correct tables. StorageBlobLogs is correct, but you should query across multiple time ranges and use aggregation functions that work with sampled data:

// Pseudocode - Key implementation steps:

  1. Query StorageBlobLogs with an extended time range (4h minimum)
  2. Join with AzureMetrics for volume validation
  3. Use summarize with percentile functions for latency analysis
  4. Filter by StatusCode to separate errors from successful operations
  5. Cross-reference CallerIpAddress with known client IPs
  6. Account for the 10-15 minute ingestion lag in time filters
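The steps above translate into a concrete query along these lines (DurationMs and StatusCode are standard StorageBlobLogs columns; the 15-minute exclusion at the end of the window allows for ingestion lag — adjust thresholds to your environment):

```kql
// Latency percentiles and per-caller volumes over an extended window
StorageBlobLogs
| where TimeGenerated between (ago(4h) .. ago(15m))  // skip the most recent, possibly un-ingested, data
| where OperationName == "GetBlob"
| summarize
    Requests = count(),
    p50 = percentile(DurationMs, 50),
    p95 = percentile(DurationMs, 95),
    p99 = percentile(DurationMs, 99),
    Errors = countif(StatusCode >= 400)
  by CallerIpAddress
| order by Requests desc
```

Remember the request counts here are still subject to sampling; use them for relative comparison between callers, not absolute volume.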

Application Insights Integration: For complete observability, implement client-side telemetry:

  1. Add Azure Storage SDK telemetry to your application
  2. Configure Application Insights to capture dependency calls
  3. Use custom events for critical storage operations
  4. Enable distributed tracing to correlate app requests with storage calls
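Once dependency tracking is wired up, client-perceived blob latency becomes queryable on the Application Insights side. Assuming a workspace-based Application Insights resource (dependency telemetry lands in the AppDependencies table; the exact Type value emitted for blob calls can vary by SDK, so verify it in your data first):

```kql
// Client-perceived blob latency from dependency telemetry
AppDependencies
| where TimeGenerated > ago(1h)
| where Type == "Azure blob"   // check the actual Type string your SDK emits
| summarize
    p50 = percentile(DurationMs, 50),
    p95 = percentile(DurationMs, 95),
    Failures = countif(Success == false)
  by Target
```

Because this is recorded by your application, it is not subject to the storage-side sampling described above.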

This gives you 100% visibility from the client perspective, which is often more valuable than server-side logs for troubleshooting.

Hybrid Monitoring Strategy:

  • Storage Metrics: Use for capacity planning, availability trends, and aggregate throughput (always 100% accurate)
  • Diagnostic Logs: Use for error analysis, security auditing, and sampling-acceptable use cases
  • Application Insights: Use for end-to-end request tracing, client-side latency, and critical path monitoring
  • Custom Logging: Implement for specific high-value blobs or containers where you need complete audit trails

Addressing Your Specific Gaps:

  1. Per-blob access patterns: Not available in diagnostic logs due to sampling. Solution: Implement custom Event Grid triggers on blob operations that write to a dedicated tracking table in Log Analytics or Cosmos DB.

  2. Request latencies: Available but sampled. For accurate latency percentiles, use Application Insights dependency tracking which captures client-perceived latency (more relevant than server-side anyway).

  3. Caller IP information: This is logged but heavily sampled for high-volume operations. For security monitoring, enable Storage Analytics hour metrics which provide IP-level aggregates without sampling, or use Azure Firewall/NSG flow logs if IP tracking is security-critical.
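The Event Grid approach from item 1 can be sketched as follows: subscribe a small function to blob events (Event Grid emits event types such as Microsoft.Storage.BlobCreated and Microsoft.Storage.BlobDeleted) and have it write one row per event to a custom Log Analytics table. The table and column names below (BlobAccessAudit_CL, BlobPath_s) are hypothetical — custom log tables use the _CL/_s suffix convention, but you choose the names:

```kql
// Unsampled per-blob activity from a hypothetical custom Event Grid audit table
BlobAccessAudit_CL
| where TimeGenerated > ago(24h)
| summarize EventCount = count(), LastEvent = max(TimeGenerated) by BlobPath_s
| top 20 by EventCount desc
```

Note that Event Grid covers mutation events; for read-heavy audit trails you would still pair this with application-side telemetry.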

Practical Implementation: For your immediate needs, enable Storage Analytics hour and minute metrics in addition to diagnostic logs. These provide different data granularity and are not sampled. Access them via the Azure portal or query the classic metrics tables (e.g. $MetricsHourPrimaryTransactionsBlob) directly in Table storage.

The reality is Azure Storage observability requires a multi-tool approach. No single configuration gives you complete visibility due to the scale and cost implications of logging billions of operations. Design your monitoring strategy based on what you actually need to troubleshoot versus nice-to-have data.

We faced this exact issue last quarter. The sampling rate for storage logs varies based on operation type and volume. High-frequency operations like GetBlob on popular blobs are heavily sampled (sometimes down to 1-5% of actual requests). The solution is multi-layered: use Storage Analytics metrics for volume trends, enable detailed logging only for specific containers you need to monitor closely, and implement custom telemetry in your application for critical paths.

For KQL queries, you also need to account for the sampling rate in your calculations. Microsoft doesn’t document the exact sampling algorithm but it’s definitely not uniform across all operations.

The RBAC permissions were correct, but that’s a good point to verify. I’m going to implement the hybrid approach with Application Insights for critical operations and accept the sampled storage logs for general trends.

Azure Storage diagnostic logs are sampled by default for high-volume operations to manage costs and log volume. There’s no way to disable this sampling completely. For complete observability, you need to implement client-side logging in your application code using Application Insights SDK. This captures every operation from the application perspective before it hits Azure Storage.

Also, ensure you’re querying the correct time range - there can be 5-10 minute ingestion delays for storage logs in Log Analytics.

Check if you have the Storage Blob Data Reader role properly assigned to your Log Analytics workspace managed identity. Without correct RBAC permissions, diagnostic logs can fail silently and you won’t see errors - just missing data. This caught us out for weeks because the diagnostic setting showed as enabled but logs weren’t flowing.

Storage diagnostic settings have multiple log categories. You need to enable all three: StorageRead, StorageWrite, and StorageDelete. By default, only high-level metrics are captured. Also check your Log Analytics workspace retention settings - logs might be getting dropped if you’re hitting quota limits.