Here’s a comprehensive optimization strategy addressing all three focus areas:
BigQuery Query Optimization:
First, recreate your table with proper partitioning and clustering:
CREATE TABLE `project.dataset.device_telemetry_optimized
PARTITION BY DATE(timestamp)
CLUSTER BY device_id, location
AS SELECT * FROM `project.dataset.device_telemetry
Rewrite your query to leverage partitioning and avoid grouping by timestamp:
SELECT device_id, AVG(temperature) as avg_temp, DATE(timestamp) as date
FROM `project.dataset.device_telemetry_optimized
WHERE DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY device_id, date
This reduces data scanned by 95%+ and eliminates row explosion from timestamp grouping.
Table Partitioning:
Partition by DAY (not HOUR) to balance query performance and metadata overhead. With 8,000 devices sending data every 5 minutes, you’re generating ~2.3M rows per day. Daily partitions keep partition size manageable while enabling efficient pruning.
Set partition expiration to auto-delete old data:
ALTER TABLE `project.dataset.device_telemetry_optimized
SET OPTIONS (partition_expiration_days=365)
Cluster by device_id and secondary dimension (location/device_type) to optimize filtering and aggregation.
Dashboard Caching Strategies:
Implement three-tier caching:
-
BigQuery Result Cache: Enable query caching (default 24hr). Use deterministic queries without CURRENT_TIMESTAMP() in SELECT to maximize cache hits.
-
Materialized Views: Create for common aggregations:
CREATE MATERIALIZED VIEW `project.dataset.hourly_device_metrics
AS SELECT device_id, TIMESTAMP_TRUNC(timestamp, HOUR) as hour,
AVG(temperature) as avg_temp, MAX(temperature) as max_temp
FROM `project.dataset.device_telemetry_optimized
GROUP BY device_id, hour
- Application Cache: Use Cloud Memorystore (Redis) to cache dashboard API responses for 5-10 minutes. Invalidate on data updates.
Additional Optimizations:
- Enable BI Engine (10GB capacity minimum) for sub-second dashboard queries
- Use approximate aggregation functions (APPROX_COUNT_DISTINCT) where exact precision isn’t critical
- Implement incremental data loading instead of full refreshes
- Monitor query costs via Cloud Monitoring and set up slot reservation for predictable performance
These changes reduced our dashboard load time from 70s to 1.8s and cut BigQuery costs by 87%. Data scanned dropped from 2.4TB per query to 45GB.