Aurora Serverless fails to scale during month-end close, causing timeout errors

jacob_arch · August 20, 2025, 7:07am

We’re running Aurora Serverless v1 (PostgreSQL 11.9) for our ERP financial reporting database. During month-end close processes, we’re seeing timeout errors when 40+ concurrent users run financial reports simultaneously. The cluster is configured with min capacity 2 ACUs and max 64 ACUs, but monitoring shows it takes 8-12 minutes to scale from 2 to 16 ACUs during peak load.

Connection errors spike during this scaling lag:


ERROR: connection timeout after 30000ms
CONNECTION_POOL: waiting for available connection
active_connections: 87/90 max_connections

Our financial close window is critical: these delays impact our month-end reporting SLAs. We’ve considered improving our connection pooling and evaluating provisioned Aurora as an alternative. What’s the best approach to handle these predictable monthly spikes without over-provisioning for the entire month?

georgewizard · August 25, 2025, 6:14am

Thanks for the pre-warming suggestion. We could schedule that via Lambda before the close window. But I’m concerned about the connection pool exhaustion we’re seeing: even if scaling were faster, we’re hitting 87/90 connections. Should we increase the max_connections parameter or implement application-side connection pooling like PgBouncer?
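
For anyone weighing the application-side option, here is a minimal sketch of the idea: cap how many connections the application holds and make extra callers wait, rather than letting them push toward the 90-connection limit. `BoundedPool` and the `connect` factory are hypothetical names for illustration, it uses only the standard library, and it is not a substitute for PgBouncer or RDS Proxy.

```python
import queue
from contextlib import contextmanager
from typing import Any, Callable, Iterator


class BoundedPool:
    """Holds a fixed number of connections; callers wait instead of opening more."""

    def __init__(self, connect: Callable[[], Any], size: int) -> None:
        # Open all connections up front so the total never exceeds `size`.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout: float = 30.0) -> Iterator[Any]:
        """Borrow a connection, waiting up to `timeout` seconds for a free one."""
        try:
            conn = self._idle.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("no pooled connection became available") from None
        try:
            yield conn
        finally:
            # Always hand the connection back, even if the report query failed.
            self._idle.put(conn)
```

Usage would look like `with pool.connection() as conn: ...`, where `connect` wraps whatever driver call the application already uses. The pool size is an assumption to tune: it has to stay below max_connections across all application instances combined.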

lauradev · August 20, 2025, 7:54am

The 8-12 minute scaling lag is expected behavior for Aurora Serverless v1 during cold scaling events. When scaling from 2 to 16 ACUs, the cluster provisions new compute capacity, which involves connection draining and capacity allocation. For predictable monthly spikes, consider pre-warming the cluster 30 minutes before your close process starts, using a simple query loop to trigger gradual scaling. This avoids the cold start penalty during critical operations.
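
As an alternative to a query loop, pre-warming can also be sketched by setting the capacity explicitly through the RDS `ModifyCurrentDBClusterCapacity` API, which applies to Aurora Serverless v1. The rough example below assumes it runs from a scheduled Lambda; the function names and the event keys (`cluster_id`, `target_acus`) are hypothetical, and the rollback timeout action is one possible choice, not a recommendation from this thread.

```python
from typing import Any, Dict

# Capacity values (ACUs) accepted for Aurora Serverless v1 PostgreSQL.
VALID_POSTGRES_ACUS = (2, 4, 8, 16, 32, 64, 192, 384)


def build_prewarm_request(cluster_id: str, target_acus: int,
                          timeout_seconds: int = 300) -> Dict[str, Any]:
    """Build the parameters for a ModifyCurrentDBClusterCapacity call."""
    if target_acus not in VALID_POSTGRES_ACUS:
        raise ValueError(f"unsupported capacity: {target_acus} ACUs")
    return {
        "DBClusterIdentifier": cluster_id,
        "Capacity": target_acus,
        "SecondsBeforeTimeout": timeout_seconds,
        # Give up rather than drop connections if no scaling point is found.
        "TimeoutAction": "RollbackCapacityChange",
    }


def prewarm(rds_client: Any, cluster_id: str, target_acus: int) -> Dict[str, Any]:
    """Ask the cluster to move to `target_acus` ahead of the close window."""
    return rds_client.modify_current_db_cluster_capacity(
        **build_prewarm_request(cluster_id, target_acus))


def lambda_handler(event, context):
    """Hypothetical Lambda entry point, triggered by a schedule before close."""
    import boto3  # imported lazily so the helpers above work without AWS access
    response = prewarm(boto3.client("rds"), event["cluster_id"],
                       int(event["target_acus"]))
    return {"pending_capacity": response.get("PendingCapacity")}
```

Note that the cluster can still scale back down on its own if it sits idle after the change, so the schedule should fire reasonably close to the start of the close window.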

ryanadmin · September 22, 2025, 5:55am

We faced identical issues last year with our financial close processes. Our interim solution was switching to provisioned Aurora with scheduled scaling - we use AWS CLI scripts to scale up from db.r5.large to db.r5.2xlarge two hours before month-end close, then scale back down after. This gives us predictable performance during critical windows without paying for large instances all month. Combined with RDS Proxy for connection management, our timeout errors dropped to zero.
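
The scheduled scaling described above can be sketched with boto3 instead of CLI scripts. This is an illustration under assumptions, not the poster's actual script: the helper names are hypothetical, the instance classes and two-hour lead time come from the post, and the close window timestamps would be supplied by whatever scheduler runs it.

```python
from datetime import datetime, timedelta
from typing import Any, Dict

BASELINE_CLASS = "db.r5.large"    # size used for most of the month
PEAK_CLASS = "db.r5.2xlarge"      # size used around month-end close
LEAD_TIME = timedelta(hours=2)    # scale up this long before close starts


def desired_instance_class(now: datetime, close_start: datetime,
                           close_end: datetime) -> str:
    """Return the instance class that should be active at `now`."""
    if close_start - LEAD_TIME <= now <= close_end:
        return PEAK_CLASS
    return BASELINE_CLASS


def apply_instance_class(rds_client: Any, instance_id: str,
                         instance_class: str) -> Dict[str, Any]:
    """Request the resize via ModifyDBInstance instead of waiting for maintenance."""
    return rds_client.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        DBInstanceClass=instance_class,
        ApplyImmediately=True,
    )
```

Changing the instance class restarts the instance, so the lead time matters: the resize should finish before users start running reports.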

sandraguru · October 5, 2025, 3:24am

The provisioned Aurora with scheduled scaling approach sounds pragmatic for our needs. Can someone clarify the connection pooling piece though - would RDS Proxy work better than PgBouncer for our reporting workload, and does it integrate seamlessly with scheduled instance scaling?
