Our EC2 Auto Scaling group is configured with ELB health checks, but it’s not replacing unhealthy instances as expected. We’ve had 3 instances marked unhealthy by the load balancer for the past 6 hours, and they’re still running in the ASG.
The Auto Scaling group configuration shows:
Health Check Type: ELB
Health Check Grace Period: 300 seconds
Default Cooldown: 300 seconds
The ELB shows these instances as “OutOfService” with failing health checks, but the ASG still considers them healthy. This is causing reduced capacity in our application cluster. The Auto Scaling policies seem correct, and instance registration with the load balancer completed successfully during launch.
I’ve verified the health check endpoint is responding correctly on healthy instances. Why isn’t the ASG terminating and replacing these failed instances based on ELB health status?
You need to set the ASG’s health check type to ELB. You can change it in the console or with the AWS CLI (aws autoscaling update-auto-scaling-group with the --health-check-type option). Be aware that switching from EC2 to ELB health checks can immediately mark instances as unhealthy if they’re already failing ELB checks, so the ASG might start terminating them right away. Plan for that capacity impact.
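For the CLI route, a minimal sketch might look like the following. The function name set_elb_health_check is made up for illustration, and the group name you pass in is whatever your ASG is actually called; the aws subcommand and option are the standard ones.

```shell
# Hypothetical helper: switch an Auto Scaling group to ELB health checks.
# $1 = Auto Scaling group name (placeholder, supply your own)
set_elb_health_check() {
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$1" \
    --health-check-type ELB
}
```

You would call it as set_elb_health_check YOUR_ASG_NAME. Since instances already failing ELB checks may be replaced soon after, it is probably safest to run this when you can tolerate the churn.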
Also verify your health check grace period is long enough for instances to fully initialize and pass health checks after launch. 300 seconds might be too short if your application takes time to warm up.
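If you do decide to change the grace period, something along these lines should work. This is only a sketch: set_grace_period is an invented helper name, and the right number of seconds depends on how long your own instances take to boot and warm up.

```shell
# Hypothetical helper: change only the health check grace period.
# $1 = Auto Scaling group name, $2 = grace period in seconds
set_grace_period() {
  aws autoscaling update-auto-scaling-group \
    --auto-scaling-group-name "$1" \
    --health-check-grace-period "$2"
}
```

For example, set_grace_period YOUR_ASG_NAME 600 would set it to ten minutes; the 600 here is an arbitrary illustration, not a recommendation.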
I updated the health check type to ELB, and the ASG immediately started terminating the unhealthy instances and launching replacements. That fixed the immediate issue. Should I increase the grace period? Our app typically takes 2-3 minutes to be fully ready.
Check whether the ASG is actually configured to use ELB health checks. Run aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names YOUR_ASG_NAME and verify that the HealthCheckType field shows ‘ELB’, not ‘EC2’. What you remember setting in the console and what the API reports can differ, for example if an edit was never saved, so trust the API output.
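To avoid scanning the full JSON, you could narrow the output with --query. A small sketch, assuming a wrapper function named show_asg_health_config (the name is invented here; the fields and options are standard AWS CLI):

```shell
# Hypothetical helper: print the health check type and grace period
# that the API reports for one Auto Scaling group.
# $1 = Auto Scaling group name (placeholder, supply your own)
show_asg_health_config() {
  aws autoscaling describe-auto-scaling-groups \
    --auto-scaling-group-names "$1" \
    --query 'AutoScalingGroups[0].[HealthCheckType,HealthCheckGracePeriod]' \
    --output text
}
```

Running show_asg_health_config YOUR_ASG_NAME should print the two values on one tab-separated line. If the first value is EC2, the group is ignoring load balancer health status.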
Good catch - I checked and HealthCheckType is indeed set to ‘EC2’ in the actual configuration, even though I thought I had changed it to ELB in the console. That explains why it’s ignoring the load balancer health status. How do I properly update this?