Data storage SDK query performance issues cause slow responses in aziot-25

We’re experiencing severe query performance issues with the data storage SDK in aziot-25. Queries against our telemetry data table (50M records) take 45-90 seconds to return results, causing unacceptable delays in our analytics dashboards. Simple time-range queries with device ID filters are timing out or returning extremely slowly.

Typical query pattern:

SELECT * FROM telemetry
WHERE deviceId = 'sensor_1234'
AND timestamp BETWEEN '2025-07-01' AND '2025-07-14'

The query returns ~50K records but takes over a minute. We’ve verified the underlying database has sufficient resources. Query optimization documentation mentions indexing and pagination but doesn’t provide clear guidance on proper implementation. How should we optimize these queries for acceptable performance?

You’re doing a SELECT *, which retrieves all columns for all 50K records - that’s a massive data transfer. First, select only the columns you actually need. Second, you absolutely need indexes on the deviceId and timestamp columns - check whether those indexes exist. Third, returning 50K records in one query is inefficient; implement pagination with LIMIT and OFFSET to fetch data in chunks of 1000-5000 records per request.
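The thread’s snippets are C# and SQL against the aziot-25 SDK; as a minimal, language-agnostic sketch of the same idea (explicit column list plus LIMIT/OFFSET chunking), here is a runnable example using Python’s stdlib sqlite3. Table and column names mirror the question; the data is fabricated for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telemetry "
             "(deviceId TEXT, timestamp TEXT, temperature REAL, humidity REAL)")
conn.executemany(
    "INSERT INTO telemetry VALUES (?, ?, ?, ?)",
    [("sensor_1234", f"2025-07-{d:02d}", 20.0 + d, 50.0 + d) for d in range(1, 15)],
)

# Select only the needed columns and fetch in fixed-size chunks
# instead of pulling the whole result set at once.
PAGE = 5
offset = 0
rows = []
while True:
    batch = conn.execute(
        "SELECT deviceId, timestamp, temperature FROM telemetry "
        "WHERE deviceId = ? AND timestamp BETWEEN ? AND ? "
        "ORDER BY timestamp LIMIT ? OFFSET ?",
        ("sensor_1234", "2025-07-01", "2025-07-14", PAGE, offset),
    ).fetchall()
    if not batch:
        break
    rows.extend(batch)
    offset += PAGE

print(len(rows))  # 14 (fetched in chunks of 5)
```

Note the ORDER BY: OFFSET paging is only stable when the sort order is deterministic. OFFSET also re-scans the skipped rows on later pages, which is why a later answer in this thread recommends keyset pagination instead.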

The data storage SDK in aziot-25 has query optimization features you should enable. Use the query builder API instead of raw SQL - it automatically applies best practices like column selection and pagination. Also enable query result caching for frequently accessed data. We saw a 10x performance improvement by switching from raw SQL to the SDK’s query builder with proper indexing.

Yes, paginate for backend processing too. Processing 50K records in memory at once is inefficient and risky. Use cursor-based pagination or keyset pagination rather than OFFSET-based for better performance. For index verification, query the database metadata tables or use EXPLAIN on your query to see if indexes are being used. The aziot-25 storage SDK has built-in pagination support via the query options parameter.
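The two techniques in this answer (checking the plan with EXPLAIN, and keyset pagination that resumes after the last key seen rather than using OFFSET) can be sketched with Python’s stdlib sqlite3, where EXPLAIN QUERY PLAN plays the role of the database’s EXPLAIN. The SDK’s query-options API is not modeled here; this is a generic illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telemetry (deviceId TEXT, timestamp TEXT, temperature REAL)")
conn.execute("CREATE INDEX idx_device_time ON telemetry(deviceId, timestamp)")
conn.executemany(
    "INSERT INTO telemetry VALUES (?, ?, ?)",
    [("sensor_1234", f"2025-07-{d:02d}", 20.0 + d) for d in range(1, 15)],
)

# 1) Verify the index is actually used: the plan should name the index,
#    not report a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT timestamp, temperature FROM telemetry "
    "WHERE deviceId = ? AND timestamp BETWEEN ? AND ?",
    ("sensor_1234", "2025-07-01", "2025-07-14"),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)  # e.g. "SEARCH telemetry USING INDEX idx_device_time (...)"

# 2) Keyset pagination: remember the last timestamp seen and resume
#    strictly after it - no OFFSET, so later pages stay fast.
last_ts, total = "", 0
while True:
    batch = conn.execute(
        "SELECT timestamp, temperature FROM telemetry "
        "WHERE deviceId = ? AND timestamp > ? ORDER BY timestamp LIMIT 5",
        ("sensor_1234", last_ts),
    ).fetchall()
    if not batch:
        break
    total += len(batch)
    last_ts = batch[-1][0]
```

Keyset pagination requires a unique (or effectively unique) sort key; if two rows can share a timestamp, add a tiebreaker column to both the ORDER BY and the resume predicate.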

Your query performance issues require systematic optimization across all three areas:

Query Optimization: Avoid SELECT * and specify only required columns. Use the SDK’s query builder for automatic optimization:

var query = storageClient.CreateQuery()
    .Select("deviceId", "timestamp", "temperature", "humidity")
    .Where("deviceId", deviceId)
    .WhereBetween("timestamp", startDate, endDate)
    .OrderBy("timestamp")
    .Limit(5000);

Indexing: Create composite indexes optimized for your query patterns. For time-range queries with device filtering:

CREATE INDEX idx_device_time
ON telemetry(deviceId, timestamp DESC)
INCLUDE (temperature, humidity);
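The INCLUDE clause above is SQL Server syntax; not every engine supports it. In SQLite, for instance, the same covering effect comes from appending the extra columns as trailing index columns, and the plan then reports a COVERING INDEX (no lookups back to the table). A small sketch, purely illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE telemetry "
             "(deviceId TEXT, timestamp TEXT, temperature REAL, humidity REAL)")
# SQLite has no INCLUDE clause: add temperature/humidity as trailing
# index columns so the index alone can satisfy the query.
conn.execute("CREATE INDEX idx_device_time "
             "ON telemetry(deviceId, timestamp, temperature, humidity)")
conn.execute("INSERT INTO telemetry VALUES ('sensor_1234', '2025-07-03', 23.0, 53.0)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT deviceId, timestamp, temperature, humidity "
    "FROM telemetry WHERE deviceId = 'sensor_1234' "
    "AND timestamp BETWEEN '2025-07-01' AND '2025-07-14'"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)  # mentions "COVERING INDEX idx_device_time"
```

The trade-off in either dialect is the same: covering indexes speed up reads at the cost of larger index size and slower writes, so include only the handful of columns the hot query actually selects.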

Pagination: Implement cursor-based pagination for efficient large result set handling:

string continuationToken = null;
do {
    var result = await query.ExecuteAsync(continuationToken);
    ProcessBatch(result.Items);
    continuationToken = result.ContinuationToken;
} while (continuationToken != null);

Detailed implementation strategy:

1. Verify index existence using the storage SDK’s metadata API or database EXPLAIN plans. Your query should show “Index Seek” on idx_device_time, not “Table Scan”. If indexes are missing, create a composite index on (deviceId, timestamp) with included columns for frequently accessed fields; this eliminates key lookups after the index seek.
2. Implement pagination in 5000-record chunks to reduce memory pressure and enable progressive result processing. Use continuation tokens instead of OFFSET-based pagination to avoid performance degradation on later pages.
3. Optimize column selection: if you need 10 columns out of 50, listing them explicitly cuts data transfer by roughly 80%.
4. Enable query result caching in the SDK for repeated queries:

var options = new QueryOptions {
    EnableCache = true,
    CacheDuration = TimeSpan.FromMinutes(5)
};
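The EnableCache/CacheDuration options above are specific to the aziot-25 SDK. As a language-agnostic sketch of what such an option does under the hood, here is a minimal TTL cache keyed by query text and parameters; the class name and shape are hypothetical, not part of any SDK.

```python
import time

class QueryCache:
    """Hypothetical stand-in for the SDK's result cache: stores query
    results keyed by (sql, params) and expires them after a TTL."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        hit = self._store.get(key)
        if hit is None:
            return None
        expires_at, value = hit
        if time.monotonic() > expires_at:  # stale entry: evict and miss
            del self._store[key]
            return None
        return value

    def put(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

cache = QueryCache(ttl_seconds=300)  # 5-minute TTL, as in the options above
key = ("SELECT ... WHERE deviceId = ?", ("sensor_1234",))
if cache.get(key) is None:
    rows = [("sensor_1234", "2025-07-01", 21.0)]  # pretend: result of running the query
    cache.put(key, rows)
print(cache.get(key))
```

Caching only pays off for repeated identical queries; dashboards that always ask for “the last N minutes” produce a new key each time unless the time range is rounded to a cache-friendly boundary.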

For 50M-record tables, consider time-based partitioning (monthly or weekly). Partition pruning eliminates 90%+ of the data from scans when querying recent time ranges.

Also implement query timeout handling and retry logic for long-running queries, and monitor query execution metrics via the SDK’s telemetry: execution time, rows scanned vs. returned, and cache hit rates.

For your specific query pattern, properly indexed and paginated queries should complete in under 5 seconds for the first page and 2-3 seconds for subsequent pages. If performance doesn’t improve after indexing, check that statistics are up to date (run ANALYZE on the table) and verify the query planner is choosing an optimal execution plan. Consider read replicas if high concurrent query load is impacting write performance.
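The partition-pruning idea can be sketched independently of any engine: with one table (or partition) per month, a time-range query only needs to touch the partitions that overlap the requested range. The naming scheme below is illustrative, not from the SDK.

```python
from datetime import date

def partitions_for_range(start: date, end: date) -> list[str]:
    """Return the monthly partition names (telemetry_YYYY_MM) that a
    [start, end] time-range query must scan; all others are pruned."""
    names = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        names.append(f"telemetry_{y}_{m:02d}")
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return names

# The question's two-week July range touches exactly one monthly partition,
# so the other partitions of the 50M-row table are never scanned.
print(partitions_for_range(date(2025, 7, 1), date(2025, 7, 14)))
# ['telemetry_2025_07']
```

Databases with native partitioning (declarative partitioning, partitioned tables, etc.) perform this pruning automatically in the planner; the sketch just makes the mechanism visible.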

We do need most columns for the analytics processing. Are you suggesting we should paginate even for backend processing, not just UI display? Also, how do we verify if proper indexes exist on the telemetry table?

Check your query execution plan. If the database is doing full table scans instead of index seeks, that’s your problem. Create a composite index on (deviceId, timestamp) for optimal query performance. Also consider partitioning your telemetry table by date if you’re not already - this dramatically improves time-range query performance by eliminating irrelevant partitions from the scan.