EC2 Auto Scaling group not replacing unhealthy instances after ELB health check failures

donna_lead · November 27, 2024, 2:23pm

Our EC2 Auto Scaling group is configured with ELB health checks, but it’s not replacing unhealthy instances as expected. We’ve had 3 instances marked unhealthy by the load balancer for the past 6 hours, and they’re still running in the ASG.

The Auto Scaling group configuration shows:


Health Check Type: ELB
Health Check Grace Period: 300 seconds
Default Cooldown: 300 seconds

The ELB shows these instances as “OutOfService” with failing health checks, but the ASG still considers them healthy. This is causing reduced capacity in our application cluster. The Auto Scaling policies seem correct, and instance registration with the load balancer completed successfully during launch.

I’ve verified the health check endpoint is responding correctly on healthy instances. Why isn’t the ASG terminating and replacing these failed instances based on ELB health status?

john_engineer · December 13, 2024, 5:27pm

You need to update the ASG with the correct health check type. Use the AWS CLI or console to change it. Be aware that changing from EC2 to ELB health checks can immediately mark instances as unhealthy if they’re already failing ELB checks, so the ASG might start terminating them right away. Plan for that capacity impact.

Also verify your health check grace period is long enough for instances to fully initialize and pass health checks after launch. 300 seconds might be too short if your application takes time to warm up.

raymond_master · December 23, 2024, 3:24am

I updated the health check type to ELB, and the ASG immediately started terminating the unhealthy instances and launching replacements. That fixed the immediate issue. Should I increase the grace period? Our app typically takes 2-3 minutes to be fully ready.

lauraadmin · November 29, 2024, 7:27am

Check if the ASG is actually configured to use ELB health checks. Run aws autoscaling describe-auto-scaling-groups --auto-scaling-group-names YOUR_ASG_NAME and verify the HealthCheckType field shows ‘ELB’ not ‘EC2’. Sometimes the console displays one thing but the actual configuration is different.

ryanadmin · December 1, 2024, 12:35pm

Good catch - I checked and HealthCheckType is indeed set to ‘EC2’ in the actual configuration, even though I thought I had changed it to ELB in the console. That explains why it’s ignoring the load balancer health status. How do I properly update this?

Topic		Views
ECS instance auto-scaling fails after upgrading to ac-2020 compute module Alibaba Cloud question , compute , ac-2020 , auto-scaling , ecs , launch-template , availability , cloudmonitor	5	May 24, 2025
Load balancer health checks failing for backend servers in VPC networking setup IBM Cloud question , networking , load-balancer , ic-2019 , security-groups , http , health-checks , backend-servers , traffic-disruption	5	December 11, 2024
API Gateway VPC Link returns 502 Bad Gateway when routing to private NLB in production Amazon Web Services (AWS) question , api-gateway , networking , security , rest-api , aws-2019 , apis , http , vpc-link	7	June 19, 2025
EKS Cluster Autoscaler vs ECS Capacity Providers for dynamic scaling Amazon Web Services (AWS) discussion , compute , performance , cost-optimization , aws-2021 , ecs , eks , autoscaling , cluster-autoscaler	4	August 30, 2025
CloudWatch dashboard metrics not updating after ECS service deployment with custom metrics Amazon Web Services (AWS) question , devops-auto , iam , metrics , observability , aws-2020 , json , ecs , monitoring-gap	3	December 6, 2024
ECS instance experiencing high network latency after upgrading to ac-2019 networking module Alibaba Cloud question , compute , networking , ac-2019 , bandwidth , security-groups , ecs , high-latency , vpc-routing	4	April 19, 2025
Cloud Logging custom metrics missing for autoscaled Compute Engine instances after instance group update Google Cloud Platform (GCP) question , monitoring , compute , observability , gcp-2021 , yaml , custom-metrics , autoscaling , cloud-logging	3	August 29, 2025
ECS disk auto snapshot fails to create backups, no snapshot generated Alibaba Cloud question , compute , disaster-recovery , ac-2021 , ecs , cloudmonitor , auto-snapshot , backup-failure , quota-limit	4	June 14, 2025
Security group egress rules block ECS task containers from accessing external APIs Amazon Web Services (AWS) question , compute , networking , security , aws-2019 , security-groups , ecs , nat-gateway , egress-rules	3	January 1, 2025

EC2 Auto Scaling group not replacing unhealthy instances after ELB health check failures

Related topics