BigQuery API query jobs stuck in pending state for long-running analytics workloads

Our BigQuery API query jobs are getting stuck in pending state for 15-20 minutes before executing. These are complex analytical queries scanning multiple tables, and the delays are making our analytics pipeline unusable.

I suspect we’re hitting job quotas, but the quota monitoring doesn’t show any limits reached. Query optimization might help, but we’re already using partitioned tables where possible. The jobs eventually run, but the pending time is unacceptable.


POST /bigquery/v2/projects/{project}/jobs
query: SELECT * FROM analytics.user_events
       WHERE event_date >= '2024-01-01'
jobReference.jobId: job_xyz123
Status: PENDING (stuck for 18 minutes)

Anyone experienced similar pending state delays with BigQuery API jobs?

Your pending state delays stem from three interrelated issues that need coordinated fixes. Let me walk through each systematically.

Job quotas and slot contention: You’re hitting soft limits, not hard quota errors. With 20-30 concurrent queries, you’re competing for available slots in the shared on-demand pool. Each complex analytical query can consume 2000+ slots, and BigQuery’s scheduler queues jobs when slot demand exceeds availability. Check actual slot usage:


SELECT
  job_id,
  total_slot_ms,
  total_bytes_processed
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
ORDER BY total_slot_ms DESC

If your top queries show 5M+ total_slot_ms, they’re monopolizing slots. Solution: either purchase slot reservations (guaranteed capacity) or implement application-level throttling to limit concurrent jobs to 10-12 for complex queries.
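That throttling can live entirely in the submitting application. Here's a minimal sketch using only the standard library; the run_query stub stands in for a real google-cloud-bigquery client.query(sql).result() call, and the 10-job cap is the limit suggested above:

```python
# Application-level throttling sketch: a semaphore caps in-flight jobs.
# Assumption: run_query is a stub for the real BigQuery client call.
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_JOBS = 10
_slots = threading.BoundedSemaphore(MAX_CONCURRENT_JOBS)
_lock = threading.Lock()
_in_flight = 0
peak_in_flight = 0  # high-water mark, to verify the cap holds


def run_query(sql: str) -> str:
    """Stub for the real BigQuery call; sleeps to simulate query latency."""
    time.sleep(0.01)
    return f"done: {sql}"


def submit_throttled(sql: str) -> str:
    """Blocks until one of the MAX_CONCURRENT_JOBS slots frees up."""
    global _in_flight, peak_in_flight
    with _slots:
        with _lock:
            _in_flight += 1
            peak_in_flight = max(peak_in_flight, _in_flight)
        try:
            return run_query(sql)
        finally:
            with _lock:
                _in_flight -= 1


# 30 submitters (matching the 20-30 concurrent jobs described above),
# but at most 10 queries actually in flight at any moment.
with ThreadPoolExecutor(max_workers=30) as pool:
    results = list(pool.map(submit_throttled, [f"SELECT {i}" for i in range(30)]))
```

Queued submitters simply block on the semaphore inside your process instead of piling up as PENDING jobs on BigQuery's scheduler.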

Query optimization: Your query has multiple inefficiencies. First, SELECT * is scanning all columns unnecessarily. Second, the WHERE clause isn’t leveraging partitioning properly. Optimize:


SELECT event_id, user_id, event_type, event_timestamp
FROM analytics.user_events
WHERE DATE(_PARTITIONTIME) >= '2024-01-01'
  AND event_date >= '2024-01-01'
LIMIT 1000000

Filtering on _PARTITIONTIME enables partition pruning, so only the relevant partitions are scanned; note that _PARTITIONTIME exists only on ingestion-time partitioned tables, and if user_events is partitioned on the event_date column instead, the event_date filter alone does the pruning. Selecting specific columns cuts the bytes scanned and billed. LIMIT caps the result set, but be aware it does not reduce bytes scanned under on-demand pricing - treat it as a guardrail against runaway result sizes, not a cost control.
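To confirm the rewrite actually shrinks the scan, compare the before/after runs in the same INFORMATION_SCHEMA view used above (the first job ID comes from this thread; the second is a hypothetical ID for your optimized re-run):

```sql
-- Lower total_bytes_processed on the re-run confirms pruning is working.
SELECT job_id, total_bytes_processed, total_slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_id IN ('job_xyz123', 'job_xyz123_optimized')  -- second ID is hypothetical
ORDER BY creation_time
```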

Partitioned tables configuration: Verify your table is actually partitioned correctly. Many tables claim to be partitioned but aren’t configured properly:


SELECT
  table_name,
  option_name,
  option_value
FROM analytics.INFORMATION_SCHEMA.TABLE_OPTIONS
WHERE table_name = 'user_events'
  AND option_name IN ('require_partition_filter', 'partition_expiration_days')

These options live in INFORMATION_SCHEMA.TABLE_OPTIONS; the PARTITIONS view lists the partitions themselves, not their settings. If require_partition_filter is false or missing, enable it - this forces all queries to include partition filters, preventing accidental full scans. If INFORMATION_SCHEMA.PARTITIONS returns no partition_id values for the table, it isn't actually partitioned. Recreate it with proper partitioning:


CREATE TABLE analytics.user_events_partitioned
PARTITION BY event_date
CLUSTER BY user_id, event_type
AS SELECT * FROM analytics.user_events

Note that CREATE TABLE ... AS SELECT cannot create an ingestion-time partitioned table, so partition on the event_date column; the existing WHERE event_date >= '2024-01-01' filter then prunes partitions directly. Clustering by frequently filtered columns (user_id, event_type) further reduces data scanned. After implementing these changes, your pending times should drop dramatically from the current 15-20 minutes. For immediate relief, cap concurrent job submissions at 10-12 while you optimize queries, and consider purchasing 500-1000 slot reservations if your workload consistently needs high concurrency.
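One more note on require_partition_filter: enabling it on the existing table doesn't require a rebuild; a single DDL statement does it (table name taken from this thread):

```sql
-- Reject any future query on this table that lacks a partition filter.
ALTER TABLE analytics.user_events
SET OPTIONS (require_partition_filter = TRUE);
```

This makes the missing-filter mistake fail fast at query time instead of silently triggering a full scan.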

Yes, 20-30 concurrent jobs will definitely cause queueing. The default concurrent query limit is 100 for on-demand pricing, but the actual scheduling depends on slot availability and query complexity. Your long-running analytical queries consume more slots, so fewer can run concurrently. Consider using reservations for predictable performance, or implement query throttling in your application to limit concurrent submissions.

Your query is doing a full table scan without partition pruning. If the table is ingestion-time partitioned, a WHERE clause on event_date never touches the partition column - you have to filter on the _PARTITIONTIME pseudo-column for pruning to kick in. (If it's partitioned on the event_date column itself, the existing filter would prune, so check which kind of partitioning you actually have.) Without pruning, BigQuery plans a full scan, which takes longer to schedule.