Aurora Serverless connection timeouts from ECS containers during scaling events

We’re experiencing intermittent connection timeouts when our ECS Fargate tasks try to connect to Aurora Serverless v1 during scaling events. The application is a Node.js API using the mysql2 library.

Connection configuration:


connectionLimit: 10
connectTimeout: 10000
queueLimit: 0

During traffic spikes, Aurora scales up from 2 ACUs to 8 ACUs, but our application logs show connection timeouts during this 30-60 second scaling window. Failed transactions spike to about 15% during these periods. We’re monitoring with CloudWatch but not sure which metrics would help identify the root cause. Is this expected behavior with Aurora Serverless, or is there a connection pool tuning issue we need to address?

RDS Proxy would definitely help here. It maintains a connection pool and handles the scaling transitions gracefully by queuing requests during the brief scaling window. For Aurora Serverless v1, this is almost essential for production workloads. The proxy adds about 1-2ms latency, which is negligible compared to your timeout issues. Your connection pool settings also seem aggressive - 10 connections per container could overwhelm Aurora during scaling.

Aurora Serverless v1 does have a brief pause during scaling, but 15% failure rate seems high. Are you using the Data API or direct MySQL connections? The Data API handles scaling transitions better. Also, what’s your current min and max ACU configuration?

Complete Solution for Aurora Serverless Connection Timeouts

Your issue is a combination of connection pool misconfiguration and Aurora Serverless v1’s connection limits during scaling. Here’s how to fix it:

Aurora Scaling Configuration:

Aurora Serverless v1 has dynamic max_connections based on current ACU capacity:

  • 2 ACUs: ~90 connections
  • 4 ACUs: ~180 connections
  • 8 ACUs: ~360 connections

During scaling from 2 to 8 ACUs, there’s a 30-60 second transition where connections may be briefly unavailable. Your 20-30 containers with 10 connections each (200-300 total) immediately exceed the 90 connection limit at 2 ACUs, causing timeouts before scaling even begins.

Connection Pool Tuning:

Reduce your connection pool dramatically:

connectionLimit: 2,
connectTimeout: 30000,
queueLimit: 0,
waitForConnections: true,
enableKeepAlive: true,
keepAliveInitialDelay: 10000

With 30 containers at 2 connections each, you’ll use 60 connections at 2 ACUs (~67% utilization), leaving headroom. The longer connectTimeout (30s) gives Aurora time to finish scaling before the client gives up. Keep-alive helps detect connections that were silently dropped during the scaling transition.
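The capacity arithmetic above can be sketched as a quick sanity check. The ~45 connections per ACU figure below is an approximation inferred from the ~90/~180/~360 limits listed earlier, not an official AWS formula:

```javascript
// Approximate Aurora Serverless v1 connection capacity (~45 per ACU,
// inferred from the ~90 at 2 ACUs / ~360 at 8 ACUs figures above).
function maxConnections(acus) {
  return 45 * acus;
}

// Fraction of connection capacity an ECS fleet would consume.
function fleetUtilization(containers, poolSize, acus) {
  return (containers * poolSize) / maxConnections(acus);
}

console.log(fleetUtilization(30, 2, 2));  // tuned pool: 60 of 90 connections
console.log(fleetUtilization(30, 10, 2)); // original pool: 300 of 90, far over capacity
```

Plugging in your own peak task count and ACU floor shows whether the fleet fits under the connection ceiling before scaling even starts.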

Implement application-level connection retry logic:

const maxRetries = 3;
const baseDelay = 1000;
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function queryWithRetry(pool, sql) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await pool.query(sql);
    } catch (err) {
      if (attempt === maxRetries) throw err;
      // Exponential backoff: waits 2s, then 4s, before retrying
      await sleep(baseDelay * Math.pow(2, attempt));
    }
  }
}

CloudWatch Monitoring:

Create a dashboard tracking these metrics:

  • DatabaseConnections - current active connections
  • ServerlessDatabaseCapacity - current ACU level
  • ACUUtilization - percentage of current capacity used
  • CommitThroughput and SelectThroughput - database activity

Set CloudWatch alarms:

  1. DatabaseConnections > 75% of calculated max_connections for current ACU
  2. ACUUtilization > 70% for 5 minutes (triggers scaling)
  3. High connection errors from your application logs

Calculate connection threshold: At 2 ACUs with 90 max connections, alarm at 68 connections (75%). This gives early warning before hitting limits.
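As one concrete example of the first alarm, here is a hedged AWS CLI sketch for the 2-ACU case; the cluster identifier and SNS topic ARN are placeholders for your own resources, and the threshold of 68 comes from the 75% calculation above:

```shell
# Alarm when DatabaseConnections exceeds 68 (75% of the ~90-connection
# ceiling at 2 ACUs). Replace my-aurora-cluster and the SNS ARN with
# your own resources; re-derive the threshold if your minimum ACU changes.
aws cloudwatch put-metric-alarm \
  --alarm-name aurora-connections-near-limit \
  --namespace AWS/RDS \
  --metric-name DatabaseConnections \
  --dimensions Name=DBClusterIdentifier,Value=my-aurora-cluster \
  --statistic Maximum \
  --period 60 \
  --evaluation-periods 3 \
  --threshold 68 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:db-alerts
```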

Additional Recommendations:

  1. Consider RDS Proxy: Deploy RDS Proxy in front of Aurora Serverless. It pools connections and handles scaling transitions transparently. Your containers connect to the proxy (which maintains connections to Aurora), eliminating timeouts during scaling:

    • Proxy connection ceiling: RDS Proxy expresses this as MaxConnectionsPercent, a share of the database’s max_connections; size it so the proxy can absorb your container count × pool size
    • Proxy connection pool: the proxy multiplexes many client connections over a smaller set of Aurora connections
    • Adds ~1-2ms latency, a fair trade against a 15% failure rate
    • Caveat: verify engine-mode support first; AWS documents RDS Proxy as incompatible with Aurora Serverless v1, so this option may require the v2 migration discussed below
  2. Increase Minimum ACUs: Set min ACUs to 4 instead of 2. This doubles your connection capacity to 180 and reduces scaling frequency. The cost increase is minimal compared to lost transactions.

  3. Pre-warming Strategy: If you can predict traffic spikes, trigger scaling proactively by running CPU-bound queries that push ACU utilization above 70%, prompting Aurora to scale before the actual load hits. A genuinely lightweight query won’t move utilization enough to trigger scaling.

  4. Evaluate Aurora Serverless v2: Consider migrating to v2, which scales in finer increments (0.5 ACU) and scales much faster (typically under 15 seconds). v2 also maintains connections during scaling, eliminating this entire class of issues.
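The pre-warming idea in item 3 could look something like the sketch below. The preWarmAurora helper is hypothetical, and the BENCHMARK() query and pacing are illustrative assumptions, not tuned recommendations:

```javascript
// Hypothetical pre-warming helper: issue a burst of CPU-bound queries so
// ACU utilization rises and Aurora scales before real traffic arrives.
async function preWarmAurora(pool, { bursts = 5, pauseMs = 2000 } = {}) {
  // BENCHMARK() burns CPU on the writer without touching application data.
  const warmupSql = 'SELECT BENCHMARK(5000000, MD5(RAND()))';
  for (let i = 0; i < bursts; i++) {
    await pool.query(warmupSql);
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
  return bursts; // number of warm-up queries issued
}
```

You would schedule this a minute or two before a known spike, for example from an ECS scheduled task or a cron-triggered Lambda, and watch ServerlessDatabaseCapacity to confirm the scale-up happened.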

Implementing connection pool reduction and CloudWatch monitoring will immediately improve your situation. Adding RDS Proxy will eliminate timeouts entirely during scaling events.

We typically run 20-30 ECS tasks during peak hours, so that would be 200-300 connections total with our current pool size. Is that too many for Aurora Serverless? What would be a recommended connection limit per container?

For CloudWatch monitoring, you should track DatabaseConnections, ServerlessDatabaseCapacity, and ACUUtilization metrics. Set up alarms when DatabaseConnections approaches max_connections for your current ACU level. This will help you see when you’re hitting connection limits during scaling events.

Yes, 200-300 connections is excessive. Aurora Serverless v1 has a max_connections limit based on ACUs - at 2 ACUs you only get about 90 connections. When all your containers try to establish 10 connections each, you’re hitting the connection limit which causes timeouts. Reduce your pool to 2-3 connections per container and implement connection retry logic with exponential backoff.

We’re using direct MySQL connections to the cluster endpoint. Current configuration is min 2 ACU, max 16 ACU, auto-pause disabled. We chose direct connections because we need sub-100ms response times and heard Data API adds latency. Should we be using RDS Proxy instead? I’ve seen it mentioned but don’t fully understand how it would help with scaling events.