Aurora Serverless fails to scale during month-end close, causing timeout errors

We’re running Aurora Serverless v1 (PostgreSQL 11.9) for our ERP financial reporting database. During month-end close processes, we’re seeing timeout errors when 40+ concurrent users run financial reports simultaneously. The cluster is configured with min capacity 2 ACUs and max 64 ACUs, but monitoring shows it takes 8-12 minutes to scale from 2 to 16 ACUs during peak load.

Connection errors spike during this scaling lag:


ERROR: connection timeout after 30000ms
CONNECTION_POOL: waiting for available connection
active_connections: 87/90 max_connections

Our financial close window is critical, and these delays put our month-end reporting SLAs at risk. We’ve considered improving connection pooling and moving to provisioned Aurora as an alternative. What’s the best approach to handle these predictable monthly spikes without over-provisioning for the entire month?

Thanks for the pre-warming suggestion. We could schedule that via Lambda before the close window. But I’m concerned about the connection pool exhaustion we’re seeing: even if scaling were faster, we’re hitting 87 of 90 connections. Should we increase the max_connections parameter, or put a connection pooler such as PgBouncer in front of the database?
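
To make the pooling question concrete, here is a minimal sketch of what a client-side cap looks like: callers wait for one of a fixed number of connections instead of opening new ones until the server refuses. `BoundedPool` and the `connect` factory are hypothetical names for illustration, not part of PgBouncer, RDS Proxy, or any database driver, and a real pooler also handles health checks and reconnects.

```python
import queue
from contextlib import contextmanager


class BoundedPool:
    """Hands out at most `size` connections; extra callers wait their turn."""

    def __init__(self, connect, size):
        # `connect` is any zero-argument factory that returns a connection.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=30.0):
        try:
            conn = self._idle.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("no pooled connection available") from None
        try:
            yield conn
        finally:
            # Return the connection to the pool even if the caller raised.
            self._idle.put(conn)
```

With a pool like this sized well below the server limit, a burst of report requests queues inside the application rather than pushing the database toward 90 of 90 connections.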

The 8-12 minute scaling lag is expected behavior for Aurora Serverless v1 during cold scaling events. When scaling from 2 to 16 ACUs, the cluster provisions new compute capacity, which involves connection draining and capacity allocation. Serverless v1 also has to find a scaling point before it can change capacity, so long-running report queries or open transactions can delay the change further. For predictable monthly spikes, consider pre-warming the cluster 30 minutes before your close process starts, using a simple query loop to trigger gradual scaling. This avoids the cold start penalty during critical operations.
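
One way to script the pre-warm step is sketched below. Note that it swaps the query loop for an explicit capacity request through the RDS `ModifyCurrentDBClusterCapacity` API (boto3 `modify_current_db_cluster_capacity`), which Aurora Serverless v1 supports. The cluster name `erp-reporting`, the event keys, and the 16 ACU target are assumptions for illustration only.

```python
# Capacity values accepted by Aurora Serverless v1 for PostgreSQL-compatible clusters.
VALID_ACUS = (2, 4, 8, 16, 32, 64, 192, 384)


def build_prewarm_request(cluster_id, target_acus, timeout_s=300):
    """Build keyword arguments for modify_current_db_cluster_capacity."""
    if target_acus not in VALID_ACUS:
        raise ValueError(f"unsupported capacity: {target_acus}")
    return {
        "DBClusterIdentifier": cluster_id,
        "Capacity": target_acus,
        # Wait up to timeout_s for a scaling point, then cancel the change
        # rather than force it and drop connections.
        "SecondsBeforeTimeout": timeout_s,
        "TimeoutAction": "RollbackCapacityChange",
    }


def lambda_handler(event, context):
    """Entry point for a scheduled invocation (for example, an EventBridge rule)."""
    import boto3  # imported lazily so the helper above can be tested offline

    request = build_prewarm_request(
        event.get("cluster_id", "erp-reporting"),  # hypothetical cluster name
        int(event.get("target_acus", 16)),
    )
    response = boto3.client("rds").modify_current_db_cluster_capacity(**request)
    return {"pending_capacity": response.get("PendingCapacity")}
```

After an explicit capacity change the cluster can still scale down again once its cooldown period passes, so the schedule probably needs to run close to the start of the close window, or the minimum capacity needs raising for the duration.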

We faced identical issues last year with our financial close processes. Our interim solution was switching to provisioned Aurora with scheduled scaling: we use AWS CLI scripts to scale up from db.r5.large to db.r5.2xlarge two hours before month-end close, then scale back down afterwards. This gives us predictable performance during critical windows without paying for large instances all month. Combined with RDS Proxy for connection management, our timeout errors dropped to zero.
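
The poster used AWS CLI scripts; the sketch below shows the same idea in Python with boto3, assuming the job runs periodically and the close window is known in advance. The function names and the way the window is passed in are illustrative assumptions, while `describe_db_instances` and `modify_db_instance` are the standard RDS calls.

```python
from datetime import datetime, timedelta

BASELINE_CLASS = "db.r5.large"
CLOSE_CLASS = "db.r5.2xlarge"


def instance_class_for(now, close_start, close_end, lead=timedelta(hours=2)):
    """Return the larger class from `lead` before the close window until it ends."""
    if close_start - lead <= now <= close_end:
        return CLOSE_CLASS
    return BASELINE_CLASS


def build_resize_request(instance_id, instance_class):
    """Build keyword arguments for modify_db_instance."""
    return {
        "DBInstanceIdentifier": instance_id,
        "DBInstanceClass": instance_class,
        "ApplyImmediately": True,  # do not wait for the maintenance window
    }


def apply_schedule(instance_id, now, close_start, close_end):
    """Resize the instance only if it is not already on the wanted class."""
    import boto3  # imported lazily so the helpers above can be tested offline

    rds = boto3.client("rds")
    wanted = instance_class_for(now, close_start, close_end)
    described = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
    current = described["DBInstances"][0]["DBInstanceClass"]
    if current != wanted:
        rds.modify_db_instance(**build_resize_request(instance_id, wanted))
    return wanted
```

Changing the instance class restarts the instance, so the two-hour lead also serves to keep that interruption outside the close window itself.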

The provisioned Aurora with scheduled scaling approach sounds pragmatic for our needs. Can someone clarify the connection pooling piece, though? Would RDS Proxy work better than PgBouncer for our reporting workload, and does it integrate seamlessly with scheduled instance scaling?