Cloud SQL connection pool exhaustion from containerized ERP app leads to intermittent crashes

michaelace · May 22, 2025, 4:19pm

Our containerized ERP application running on GKE keeps crashing with connection pool exhaustion errors when connecting to Cloud SQL for PostgreSQL. We’re using Cloud SQL Proxy as a sidecar container in our pods.

The errors appear during moderate load (50-80 concurrent users). Our Cloud SQL instance is db-custom-4-16384 with max_connections set to 500. Each pod’s connection pool is configured with maxPoolSize=20, and we typically run 8-10 pod replicas.


ERROR: Connection pool exhausted
at HikariPool.getConnection(HikariPool.java:123)
Caused by: PSQLException: FATAL: remaining connection slots are reserved for non-replication superuser connections

The math should work (10 pods × 20 connections = 200, well under the 500 limit), but we’re still hitting the limit. Could this be related to how Cloud SQL Proxy manages connections, or to our app’s connection pool configuration?
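The budget in the question can be written out explicitly. PostgreSQL raises the "remaining connection slots are reserved" error once ordinary clients have used max_connections minus the superuser-reserved slots. This is a minimal Java sketch that assumes PostgreSQL's default of 3 reserved slots (`superuser_reserved_connections`); Cloud SQL may reserve a different number, so that value is an assumption. The replica and pool figures are the ones quoted above.

```java
// Connection budget for the setup described in the question.
class ConnectionBudget {

    // Worst case for the application pools: every replica fills its pool.
    static int expectedConnections(int replicas, int maxPoolSize) {
        return replicas * maxPoolSize;
    }

    // Slots available to ordinary (non-superuser) clients. Once these are used up,
    // PostgreSQL rejects new logins with "remaining connection slots are reserved ...".
    static int usableSlots(int maxConnections, int reservedSlots) {
        return maxConnections - reservedSlots;
    }

    public static void main(String[] args) {
        int maxConnections = 500;  // max_connections flag from the question
        int reservedSlots = 3;     // ASSUMPTION: PostgreSQL default superuser_reserved_connections
        int expected = expectedConnections(10, 20);  // 10 replicas x maxPoolSize=20
        int usable = usableSlots(maxConnections, reservedSlots);
        // prints: expected=200 usable=497 headroom=297
        System.out.println("expected=" + expected + " usable=" + usable
                + " headroom=" + (usable - expected));
    }
}
```

With roughly 300 slots of headroom at steady state, hitting the error implies that connections beyond the ten configured pools are being held on the server.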

barbara_func · June 19, 2025, 10:46pm

Check your pod termination behavior. When pods are killed or restarted during deployments, connections that aren’t closed gracefully can stay open on the Cloud SQL side for up to 10 minutes (default TCP timeout), so stale connections pile up with each rollout. Add a preStop hook to your pod spec and make sure the application explicitly closes its database connections before the container terminates. Also review your connection pool’s idle timeout settings.
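On the application side, the JVM runs shutdown hooks when the container receives SIGTERM, which Kubernetes sends after any preStop hook has finished. Closing the pool there releases the slots immediately instead of leaving them to time out. This is a minimal sketch: `DemoPool` is a hypothetical stand-in for a real pool such as HikariCP's `HikariDataSource`, whose `close()` shuts the pool down.

```java
import java.util.ArrayList;
import java.util.List;

class GracefulShutdown {

    static DemoPool startPool(int size) {
        DemoPool pool = new DemoPool(size);
        // The JVM runs shutdown hooks on normal exit and on SIGTERM, which Kubernetes
        // sends once the preStop hook (if any) has finished.
        Runtime.getRuntime().addShutdownHook(new Thread(pool::close, "db-pool-shutdown"));
        return pool;
    }

    public static void main(String[] args) {
        DemoPool pool = startPool(20);
        System.out.println("open connections: " + pool.openCount());
        // When main returns (or the pod receives SIGTERM), the hook closes the pool.
    }
}

// HYPOTHETICAL stand-in for a real pool such as HikariDataSource.
class DemoPool implements AutoCloseable {
    private final List<String> openConnections = new ArrayList<>();
    private boolean closed = false;

    DemoPool(int size) {
        for (int i = 0; i < size; i++) {
            openConnections.add("conn-" + i);
        }
    }

    synchronized int openCount() { return openConnections.size(); }

    synchronized boolean isClosed() { return closed; }

    // Idempotent, because both the shutdown hook and application code may call it.
    @Override
    public synchronized void close() {
        if (closed) {
            return;
        }
        // A real pool closes each physical connection here, so the server frees the slot
        // right away instead of waiting for the dead connection to time out.
        openConnections.clear();
        closed = true;
    }
}
```

Shutdown hooks do not run if the container is killed with SIGKILL after `terminationGracePeriodSeconds` expires, so the pool has to finish closing within that window.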

megha_king · June 29, 2025, 12:47pm

This is almost certainly a connection lifecycle issue combined with Cloud SQL Proxy configuration. I’ve seen this pattern before with containerized apps. The proxy doesn’t open extra database connections of its own, but it can hide connection-state problems from your application.

leo_api · May 24, 2025, 5:18pm

I suspect your application isn’t closing connections properly, i.e. it has connection leaks. Even with a correct pool configuration, connections that are never returned to the pool after use will exhaust it quickly. Enable leak detection in your HikariCP config with leakDetectionThreshold=60000 (milliseconds) to identify the code paths that hold a connection for more than 60 seconds.
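To show what the threshold does, here is a conceptual model in plain Java. It is not HikariCP's code: the pool records when each connection is borrowed and reports any connection still out after the threshold. In HikariCP itself the setting is `leakDetectionThreshold` (or `HikariConfig.setLeakDetectionThreshold`), and it logs a warning with the stack trace of where the leaked connection was obtained. `TrackingPool` and its methods are hypothetical names, and time is passed in explicitly so the behavior is easy to check.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LeakDetectionDemo {

    // HYPOTHETICAL pool that only tracks borrow times; it hands out integer IDs
    // instead of real JDBC connections.
    static class TrackingPool {
        private final long leakThresholdMillis;
        private final Map<Integer, Long> borrowedAt = new LinkedHashMap<>();
        private int nextId = 0;

        TrackingPool(long leakThresholdMillis) {
            this.leakThresholdMillis = leakThresholdMillis;
        }

        // Hand out a connection ID and remember when it left the pool.
        synchronized int borrow(long nowMillis) {
            int id = nextId++;
            borrowedAt.put(id, nowMillis);
            return id;
        }

        // Returning a connection clears its tracking entry.
        synchronized void giveBack(int id) {
            borrowedAt.remove(id);
        }

        // IDs of connections held longer than the threshold at time nowMillis.
        synchronized List<Integer> suspectedLeaks(long nowMillis) {
            List<Integer> leaks = new ArrayList<>();
            for (Map.Entry<Integer, Long> e : borrowedAt.entrySet()) {
                if (nowMillis - e.getValue() > leakThresholdMillis) {
                    leaks.add(e.getKey());
                }
            }
            return leaks;
        }
    }

    public static void main(String[] args) {
        TrackingPool pool = new TrackingPool(60_000);  // same value as leakDetectionThreshold=60000
        int returned = pool.borrow(0);
        int leaked = pool.borrow(0);                   // simulates a code path that never gives it back
        pool.giveBack(returned);
        // prints: suspected leaks at t=90s: [1]
        System.out.println("suspected leaks at t=90s: " + pool.suspectedLeaks(90_000));
    }
}
```

In application code, acquiring connections with try-with-resources (`try (Connection c = dataSource.getConnection()) { ... }`) returns them to the pool even when an exception is thrown, which closes off the most common source of leaks.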

megha_king · June 6, 2025, 3:39pm

Good point about reserved connections. I checked Cloud SQL monitoring and saw actual connection count spiking to 485-490 during crashes. That’s way more than our expected 200. Could there be zombie connections not being cleaned up properly?
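One way to test the zombie-connection theory is to attribute every server backend to a client and compare that list against the pods that are actually running. This is a minimal sketch that assumes each pod tags its connections with its own pod name as `application_name` (pgjdbc accepts an `ApplicationName` connection property) and that the input rows come from `pg_stat_activity`. The pod names in `main` are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class ZombieConnectionCheck {

    // One row of: SELECT application_name, state FROM pg_stat_activity;
    record Backend(String applicationName, String state) {}

    // Counts backends per application_name that does not belong to a live pod.
    // These are candidates for connections left behind by terminated pods.
    // Sessions from other clients (psql, admin tools, background processes with an
    // empty application_name) also land here; filter them out as needed.
    static Map<String, Integer> orphanedByClient(List<Backend> backends, Set<String> livePods) {
        Map<String, Integer> orphaned = new TreeMap<>();
        for (Backend b : backends) {
            if (!livePods.contains(b.applicationName())) {
                orphaned.merge(b.applicationName(), 1, Integer::sum);
            }
        }
        return orphaned;
    }

    public static void main(String[] args) {
        // HYPOTHETICAL snapshot: two live pods plus one pod that was already deleted.
        List<Backend> snapshot = List.of(
                new Backend("erp-7d9f-abcde", "idle"),
                new Backend("erp-7d9f-abcde", "active"),
                new Backend("erp-7d9f-fghij", "idle"),
                new Backend("erp-5c21-old01", "idle"),
                new Backend("erp-5c21-old01", "idle"));
        Set<String> livePods = Set.of("erp-7d9f-abcde", "erp-7d9f-fghij");
        // prints: {erp-5c21-old01=2}
        System.out.println(orphanedByClient(snapshot, livePods));
    }
}
```

If most of the excess connections map to pods that no longer exist, the termination path is the likely culprit. If they map to live pods, look at pool sizing or leaks instead.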

Related topics

Topic		Replies	Views
Cloud Run database connection pool exhausted under high concurrent load Google Cloud Platform (GCP) question , compute , database , sql , gcp-2021 , connection-pooling , cloud-sql , cloud-run , pool-exhausted	6	1	October 12, 2025
Cloud SQL failover lag triggers high connection errors in application tier Google Cloud Platform (GCP) question , compute , database , java , high-availability , gcp-2020 , retry-logic , connection-pooling , cloud-sql	3	4	January 27, 2025
RDS Proxy connection leak from ECS containers causes database exhaustion Amazon Web Services (AWS) question , database , connection-pool , aws-2019 , ecs , mysql , cloudwatch , container-service , rds-proxy	4	4	May 4, 2025
Autonomous Database connection pool leak from OCI Compute instances causing resource exhaustion Oracle Cloud question , compute , performance , database , java , connection-pool , oci-2020 , autonomous-database , resource-leak	3	1	January 24, 2025
Cloud SQL failover triggers ERP downtime due to DNS propagation delays Google Cloud Platform (GCP) question , compute , database , high-availability , gcp-2021 , connection-timeout , cloud-sql , failover , dns-propagation	3	2	March 2, 2025
Cloud SQL automated backups fail during peak hours due to database locks and transaction contention Google Cloud Platform (GCP) question , database , sql , gcp-2019 , lock-contention , pitr , cloud-sql , backup-disaster , backup-scheduling	3	2	January 16, 2025
Aurora Serverless fails to scale during month-end close, causing timeout errors Amazon Web Services (AWS) question , database , timeout-errors , devops , aws-2021 , financial-reporting , connection-pooling , aurora-serverless , autoscaling	4	2	October 5, 2025
Aurora Serverless connection timeouts from ECS containers during scaling events Amazon Web Services (AWS) question , compute , database , scaling , aws-2020 , connection-pooling , ecs , cloudwatch , aurora-serverless	7	2	October 3, 2025
Cloud SQL query latency spikes but Stackdriver logs missing slow queries Google Cloud Platform (GCP) question , database , sql , observability , gcp-2021 , missing-logs , cloud-logging , cloud-sql , performance-troubleshooting	6	4	July 3, 2025