Complete Solution for Aurora Serverless Connection Timeouts
Your issue is a combination of connection pool misconfiguration and Aurora Serverless v1’s connection limits during scaling. Here’s how to fix it:
Aurora Scaling Configuration:
Aurora Serverless v1 has dynamic max_connections based on current ACU capacity:
- 2 ACUs: ~90 connections
- 4 ACUs: ~180 connections
- 8 ACUs: ~360 connections
During scaling from 2 to 8 ACUs, there’s a 30-60 second transition where connections may be briefly unavailable. Your 20-30 containers with 10 connections each (200-300 total) immediately exceed the 90 connection limit at 2 ACUs, causing timeouts before scaling even begins.
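The arithmetic can be sketched directly (the per-ACU connection limits are the approximations listed above):

```javascript
// Approximate Aurora Serverless v1 connection limits per ACU tier (table above).
const maxConnectionsByAcu = { 2: 90, 4: 180, 8: 360 };

// Total connections the container fleet will try to open.
const connectionDemand = (containers, poolSize) => containers * poolSize;

const demand = connectionDemand(30, 10);       // 300 connections
const capacityAt2Acu = maxConnectionsByAcu[2]; // ~90 connections

console.log(demand > capacityAt2Acu); // true: demand exceeds capacity before scaling starts
```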
Connection Pool Tuning:
Reduce your connection pool dramatically:
```javascript
// mysql2 pool settings, reduced per container
const pool = require('mysql2/promise').createPool({
  host: process.env.DB_HOST, // plus user/password/database as before
  connectionLimit: 2,
  connectTimeout: 30000,
  queueLimit: 0,
  waitForConnections: true,
  enableKeepAlive: true,
  keepAliveInitialDelay: 10000,
});
```
With 30 containers at 2 connections each, you'll use 60 connections at 2 ACUs (~67% utilization), leaving headroom. The longer connectTimeout (30s) gives Aurora time to scale before the client gives up. Keep-alive probes detect and remove stale connections.
Implement application-level connection retry logic:
```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function queryWithRetry(pool, sql, params = []) {
  const maxRetries = 3;
  const baseDelay = 1000; // 1s base for exponential backoff
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await pool.query(sql, params);
    } catch (err) {
      if (attempt === maxRetries) throw err;
      await sleep(baseDelay * Math.pow(2, attempt)); // waits 2s, then 4s
    }
  }
}
```
CloudWatch Monitoring:
Create a dashboard tracking these metrics:
- DatabaseConnections - current active connections
- ServerlessDatabaseCapacity - current ACU level
- ACUUtilization - percentage of current capacity in use
- CommitThroughput and SelectThroughput - overall database activity
Set CloudWatch alarms:
- DatabaseConnections > 75% of calculated max_connections for current ACU
- ACUUtilization > 70% for 5 minutes (triggers scaling)
- High connection errors from your application logs
Calculate connection threshold: At 2 ACUs with 90 max connections, alarm at 68 connections (75%). This gives early warning before hitting limits.
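That calculation generalizes across capacity tiers; a small helper (the ACU-to-connection mapping is the approximation from earlier):

```javascript
// Approximate Aurora Serverless v1 max_connections per ACU tier (table above).
const MAX_CONNECTIONS = { 2: 90, 4: 180, 8: 360 };

// Alarm threshold at 75% of the current tier's limit.
function connectionAlarmThreshold(acus, pct = 0.75) {
  return Math.round(MAX_CONNECTIONS[acus] * pct);
}

console.log(connectionAlarmThreshold(2)); // 68
console.log(connectionAlarmThreshold(4)); // 135
```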
Additional Recommendations:
- Consider RDS Proxy: Deploy RDS Proxy in front of Aurora Serverless. It pools connections and handles scaling transitions transparently. Your containers connect to the proxy (which maintains connections to Aurora), eliminating timeouts during scaling:
  - MaxConnectionsPercent: the proxy setting that caps what fraction of the database's max_connections the proxy may open; size it so the proxy covers your container count × connection pool size
  - Connection multiplexing: let the proxy manage Aurora connections efficiently on your behalf
  - Adds ~1-2ms of latency but eliminates the 15% failure rate
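On the application side, switching to the proxy is only a connection-string change; a sketch (the proxy endpoint below is a placeholder, not a real hostname):

```javascript
// Same mysql2 pool as before, pointed at the RDS Proxy endpoint instead of
// the Aurora cluster endpoint. The hostname is illustrative.
const proxyPoolConfig = {
  host: 'my-app-proxy.proxy-example.us-east-1.rds.amazonaws.com', // placeholder
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_NAME,
  connectionLimit: 2,      // keep the reduced per-container pool
  waitForConnections: true,
};
// const pool = require('mysql2/promise').createPool(proxyPoolConfig);
console.log(proxyPoolConfig.connectionLimit); // 2
```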
- Increase Minimum ACUs: Set min ACUs to 4 instead of 2. This doubles your connection capacity to ~180 and reduces scaling frequency. The cost increase is minimal compared to lost transactions.
- Pre-warming Strategy: If you can predict traffic spikes, trigger scaling proactively by running a query that pushes ACU utilization above 70%, forcing Aurora to scale before the actual load hits.
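For a spike at a known time of day, the trigger can be as simple as a timer; a sketch (the warm-up query and the 14:00 UTC spike time are illustrative, not from your setup):

```javascript
// Compute milliseconds until the next occurrence of hour:minute (UTC),
// so a warm-up query can be scheduled shortly before an expected spike.
function msUntil(hour, minute = 0) {
  const now = new Date();
  const next = new Date(Date.UTC(
    now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate(), hour, minute));
  if (next <= now) next.setUTCDate(next.getUTCDate() + 1); // roll to tomorrow
  return next - now;
}

// Example: warm up 5 minutes before a 14:00 UTC spike with a CPU-heavy query.
// setTimeout(() => pool.query('SELECT BENCHMARK(5000000, MD5(RAND()))'),
//            msUntil(13, 55));
console.log(msUntil(13, 55) > 0); // true
```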
- Evaluate Aurora Serverless v2: Consider migrating to v2, which scales in finer increments (0.5 ACU) and scales much faster (typically under 15 seconds). v2 also maintains connections during scaling, eliminating this entire class of issues.
Implementing connection pool reduction and CloudWatch monitoring will immediately improve your situation. Adding RDS Proxy will eliminate timeouts entirely during scaling events.