Azure Log Analytics query latency spikes during high-volume ingestion

We’re experiencing severe query latency spikes in our Azure Log Analytics workspace during peak data ingestion periods. Our environment processes approximately 500GB daily from multiple sources including Application Insights, VMs, and container logs.

The issue manifests as query timeouts (>180s) when running standard KQL queries that normally complete in 15-20 seconds. This happens specifically during our ETL windows between 02:00-06:00 UTC when ingestion rates spike to 80GB/hour.

Current workspace configuration:

workspace.sku = PerGB2018
retention = 90 days
daily cap = not set
ingestion volume = 500GB/day avg

We’ve noticed the latency correlates with high data ingestion rates, but I’m unsure if this is a workspace scaling issue, query optimization problem, or network bandwidth limitation. Has anyone dealt with similar query performance degradation during high-volume ingestion periods?

I’ve seen this pattern before. During high ingestion, Log Analytics prioritizes write operations over read queries. Your 500GB daily volume is substantial but manageable. First quick check: are you running summarize operations without proper time filters? That forces full table scans, which is especially painful under ingestion pressure.

Good point about time filters. Most of our queries do include specific time ranges (last 24h or 7d), but we have a few dashboard queries that might be scanning larger datasets. I’ll audit those. However, even our well-filtered queries are timing out during the ingestion window. Could workspace capacity be a factor here?

Your ingestion pattern suggests you might benefit from distributed ingestion. Are all 80GB/hour hitting the workspace simultaneously? We spread our high-volume sources across multiple collection endpoints and that helped reduce contention. Also check if you’re hitting any throttling limits - Azure Monitor has ingestion rate limits per workspace that might be causing backpressure.

I’ll break down the three areas you need to address simultaneously:

Query Optimization: Your KQL queries need immediate attention:

  • Add explicit time ranges to ALL queries, putting where TimeGenerated > ago(24h) at the beginning of the query chain, not buried in later operations
  • Use summarize with bin() for time-series data to reduce result sets
  • For your dashboard queries, standardize time windows so results can be served from the query cache
  • Replace any search operators with explicit table references - search scans all tables and is extremely expensive during ingestion
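As a sketch of that pattern (ContainerLog is one of the standard tables for the container logs mentioned in this thread; swap in your own tables and columns):

```kusto
// Good: the time filter leads the chain, so the engine prunes data
// before summarize runs. bin() keeps the result set dashboard-sized.
ContainerLog
| where TimeGenerated > ago(24h)
| summarize Entries = count() by bin(TimeGenerated, 15m), Computer
| order by TimeGenerated asc

// Bad: search fans out across every table in the workspace and only
// filters by time afterwards - avoid it during ingestion windows.
search "timeout"
| where TimeGenerated > ago(24h)
```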

Data Ingestion Strategy: Your 80GB/hour spike is creating resource contention. Implement these changes:

  • Distribute ingestion across time by staggering your ETL jobs into 30-minute batches instead of parallel execution
  • Enable data collection rules (DCR) to filter and transform data at ingestion time, reducing workspace load
  • Set up separate workspaces for different data tiers: hot operational data (30 days) vs cold analytical data (90 days)
  • Configure ingestion-time transformations to drop unnecessary fields and reduce volume by 20-30%
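The ingestion-time transformation in the last bullet is just KQL attached to the DCR. A minimal transformKql sketch - Level, VerbosePayload, and Message are placeholder field names for your stream's schema, and only a restricted KQL subset (where, extend, project-away, and similar) is allowed here:

```kusto
// Runs once per record at ingestion time, before the data is billed or stored.
source
| where Level != "Verbose"                      // drop low-value rows outright
| project-away VerbosePayload                   // remove columns nobody queries
| extend Message = substring(Message, 0, 2048)  // trim oversized payloads
```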

Workspace Scaling: At roughly 45TB retained (500GB/day × 90 days of retention), you’re definitely a candidate for Azure Monitor dedicated clusters (minimum 500GB/day commitment). Benefits include:

  • Reserved capacity that isolates your ingestion from multi-tenant throttling
  • Customer-managed keys for encryption
  • Cross-workspace queries with better performance
  • Ability to link multiple workspaces to the cluster

Immediate action: Enable Basic Logs for high-volume, low-value data sources (like verbose application logs). This cuts ingestion costs substantially - around 65% - and removes those logs from competing with your critical analytics queries. Note that Basic Logs tables support only a limited subset of KQL and shorter interactive retention, so reserve them for data you rarely query. Your container logs are perfect candidates.

For monitoring: Set up alerts on _LogOperation | where Level == "Warning" to catch ingestion throttling early. Track query performance with the LAQueryLogs table (after enabling the query-audit diagnostic setting on the workspace) to identify problematic queries.
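Concretely, those two checks might look like this - note that LAQueryLogs only populates once the workspace's query-audit diagnostic setting is enabled, and the 60-second threshold is an arbitrary starting point:

```kusto
// Ingestion health: surfaces throttling and collection warnings.
_LogOperation
| where TimeGenerated > ago(1h)
| where Level == "Warning"
| summarize Count = count() by Operation, Detail

// Query audit: find the slowest queries over the last day.
LAQueryLogs
| where TimeGenerated > ago(24h)
| where ResponseDurationMs > 60000
| project TimeGenerated, AADEmail, ResponseDurationMs, QueryText
| order by ResponseDurationMs desc
```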

Implement query optimization first (immediate impact), then ingestion distribution (improvements within about a week), and finally evaluate a dedicated cluster based on two weeks of performance metrics. Your current PerGB2018 tier can handle the volume, but not the concurrent ingestion/query load pattern you’re experiencing.

Have you looked at your query patterns during these spikes? We had similar issues and discovered some of our queries were doing cross-workspace joins during peak ingestion. That combination was killer for performance. Moving those queries to off-peak hours or using materialized views helped significantly. Also, your 90-day retention on 500GB daily means you’re managing roughly 45TB of data - that’s definitely workspace scaling territory.

Thanks everyone. I’ve identified several queries doing unnecessary full scans and we do have some cross-workspace operations. The 45TB total volume is eye-opening - I hadn’t calculated that. What’s the recommended approach for workspace scaling at this volume? Should we be looking at dedicated clusters?