Here’s the complete picture of Azure Storage observability and how to achieve full visibility:
Understanding Diagnostic Settings Limitations:
Azure Storage diagnostic logs are intentionally sampled to balance cost and performance. The sampling rate varies:
- Low-volume operations (<100 req/min): 100% captured
- Medium-volume (100-1000 req/min): ~50% sampled
- High-volume (>1000 req/min): 5-10% sampled
This is by design and cannot be disabled. Microsoft doesn’t expose sampling configuration because it’s dynamic based on backend load.
Diagnostic Settings Configuration:
Ensure you’ve enabled ALL log categories in diagnostic settings. The configuration should include:
{
  "logs": [
    { "category": "StorageRead", "enabled": true },
    { "category": "StorageWrite", "enabled": true },
    { "category": "StorageDelete", "enabled": true }
  ],
  "metrics": [
    { "category": "Transaction", "enabled": true }
  ]
}
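It is easy to apply a diagnostic setting that silently misses a category, so it can help to build and sanity-check the payload before handing it to the CLI or an SDK. A minimal sketch (the helper names are hypothetical, not part of any Azure SDK):

```python
# Build the diagnostic-settings payload shown above and verify that every
# required category is present and enabled before applying it.
import json

REQUIRED_LOGS = {"StorageRead", "StorageWrite", "StorageDelete"}
REQUIRED_METRICS = {"Transaction"}

def build_settings() -> dict:
    return {
        "logs": [{"category": c, "enabled": True} for c in sorted(REQUIRED_LOGS)],
        "metrics": [{"category": c, "enabled": True} for c in sorted(REQUIRED_METRICS)],
    }

def missing_categories(settings: dict) -> set:
    """Return any required category that is absent or disabled."""
    enabled = {entry["category"]
               for key in ("logs", "metrics")
               for entry in settings.get(key, [])
               if entry.get("enabled")}
    return (REQUIRED_LOGS | REQUIRED_METRICS) - enabled

settings = build_settings()
print(json.dumps(settings, indent=2))
print(missing_categories(settings))  # set() when all categories are enabled
```

The same check catches configurations that enable only some categories, e.g. `missing_categories({"logs": []})` returns the full required set.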
Log Analytics KQL Query Optimization:
Your KQL query needs to account for sampling and use the correct tables. StorageBlobLogs is correct, but you should query across multiple time ranges and use aggregation functions that work with sampled data:
// StorageBlobLogs ingestion lags by ~10-15 minutes, so query a wide window (4h minimum)
StorageBlobLogs
| where TimeGenerated > ago(4h)
| extend CallerIp = tostring(split(CallerIpAddress, ":")[0])  // strip the port suffix before matching known client IPs
| summarize
    Requests = count(),
    Errors = countif(StatusCode >= 400),  // separate errors from successful operations
    p50 = percentile(DurationMs, 50),
    p95 = percentile(DurationMs, 95)
    by OperationName, CallerIp, bin(TimeGenerated, 15m)
| order by TimeGenerated desc
// Validate request volumes separately against the (unsampled) Transactions metric
Application Insights Integration:
For complete observability, implement client-side telemetry:
- Add Azure Storage SDK telemetry to your application
- Configure Application Insights to capture dependency calls
- Use custom events for critical storage operations
- Enable distributed tracing to correlate app requests with storage calls
This gives you 100% visibility from the client perspective, which is often more valuable than server-side logs for troubleshooting.
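The client-side dependency tracking described above can be sketched without any Azure dependency: wrap each storage call so its duration and outcome are recorded as a telemetry record. This is a framework-free illustration; in a real application you would hand these records to the Application Insights SDK rather than an in-memory list, and the names here (`track_dependency`, `upload_blob`) are hypothetical:

```python
# Time every wrapped storage call and record a dependency-style telemetry
# record, including failures, mimicking App Insights dependency tracking.
import functools
import time

telemetry: list[dict] = []  # stand-in for an Application Insights channel

def track_dependency(name: str):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            success = True
            try:
                return fn(*args, **kwargs)
            except Exception:
                success = False
                raise
            finally:
                # finally runs on both paths, so failed calls are recorded too
                telemetry.append({
                    "dependency": name,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                    "success": success,
                })
        return wrapper
    return decorator

@track_dependency("blob_upload")
def upload_blob(data: bytes) -> int:
    # placeholder for a real blob_client.upload_blob(data) call
    return len(data)

upload_blob(b"hello")
print(telemetry[0]["dependency"], telemetry[0]["success"])
```

Because the wrapper records in a `finally` block, you capture client-perceived latency for every call, including the ones that raise, which is exactly the 100%-visibility property server-side sampled logs cannot give you.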
Hybrid Monitoring Strategy:
- Storage Metrics: Use for capacity planning, availability trends, and aggregate throughput (always 100% accurate)
- Diagnostic Logs: Use for error analysis, security auditing, and sampling-acceptable use cases
- Application Insights: Use for end-to-end request tracing, client-side latency, and critical path monitoring
- Custom Logging: Implement for specific high-value blobs or containers where you need complete audit trails
Addressing Your Specific Gaps:
- Per-blob access patterns: Incomplete in diagnostic logs due to sampling. Solution: implement custom Event Grid triggers on blob operations that write to a dedicated tracking table in Log Analytics or Cosmos DB.
- Request latencies: Available but sampled. For accurate latency percentiles, use Application Insights dependency tracking, which captures client-perceived latency (often more relevant than server-side latency anyway).
- Caller IP information: Logged, but heavily sampled for high-volume operations. If IP tracking is security-critical, supplement storage logs with Azure Firewall or NSG flow logs, which capture network flows independently of storage-log sampling.
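The Event Grid approach for per-blob tracking amounts to parsing each event into a record you persist. One caveat worth designing around: Event Grid fires for blob create/delete (and tier/rename) operations, not for reads, so it complements rather than replaces diagnostic logs. A sketch that follows the documented Event Grid schema for `Microsoft.Storage` events (the `to_tracking_record` helper is hypothetical):

```python
# Parse a Storage blob Event Grid event into a per-blob tracking record
# suitable for writing to Log Analytics or Cosmos DB.

def to_tracking_record(event: dict) -> dict:
    # subject looks like: /blobServices/default/containers/<name>/blobs/<path>
    parts = event["subject"].split("/blobs/", 1)
    container = parts[0].rsplit("/", 1)[-1]
    return {
        "container": container,
        "blob": parts[1] if len(parts) > 1 else "",
        "operation": event["eventType"],  # e.g. Microsoft.Storage.BlobCreated
        "time": event["eventTime"],
        "api": event.get("data", {}).get("api", ""),
    }

sample = {
    "eventType": "Microsoft.Storage.BlobCreated",
    "eventTime": "2024-01-01T00:00:00Z",
    "subject": "/blobServices/default/containers/invoices/blobs/2024/jan.pdf",
    "data": {"api": "PutBlob"},
}
print(to_tracking_record(sample)["blob"])  # 2024/jan.pdf
```

In practice this function would be the body of an Azure Function bound to the Event Grid trigger, with the returned record written to your tracking store.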
Practical Implementation:
For your immediate needs, enable classic Storage Analytics hour and minute metrics in addition to diagnostic logs. These provide a different data granularity and are not sampled. Access them via the Azure portal (Metrics (classic)) or query the $MetricsHourPrimaryTransactionsBlob and $MetricsMinutePrimaryTransactionsBlob tables directly through the Table service endpoint.
The reality is Azure Storage observability requires a multi-tool approach. No single configuration gives you complete visibility due to the scale and cost implications of logging billions of operations. Design your monitoring strategy based on what you actually need to troubleshoot versus nice-to-have data.