Our IBM Cloud Load Balancer is marking all backend servers as unhealthy even though the application is running fine. When I curl the health check endpoint directly from within the VPC, it responds correctly with HTTP 200. The health check configuration in the load balancer shows:
Protocol: HTTP
Port: 8080
Path: /health
Interval: 10s
Timeout: 5s
All three backend servers are showing as “failing” in the load balancer dashboard, causing traffic disruption. We’ve verified the application logs and the /health endpoint is being hit and returning 200 OK responses. I’m wondering if this is related to our security group configuration or if there’s something wrong with how the load balancer is configured to reach the backend servers.
The most common cause of this issue is security group rules that do not allow traffic from the load balancer to reach your backend servers. IBM Cloud Load Balancers use specific source IP ranges for health checks, and these need to be explicitly allowed in your backend server security groups. Check whether your security group has an inbound rule allowing TCP traffic on port 8080 from the load balancer subnet CIDR or from the IBM Cloud load balancer service IP ranges. Security groups match on protocol and port, so the rule is written for TCP even though the health check itself speaks HTTP.
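You need to allow traffic from the load balancer’s subnet CIDR to your backend servers. Since your load balancer is in 10.10.1.0/24, make sure the backend server security group has an inbound rule allowing TCP port 8080 from that CIDR. Note that 10.10.1.0/24 already falls inside your VPC CIDR (10.10.0.0/16), so first confirm that the existing VPC-wide rule actually covers TCP port 8080 and is attached to the security group your backend servers use. Also, make sure your backend servers can respond. Security groups are stateful, so replies to an allowed inbound connection are permitted automatically, but anything stateless on the path (such as a network ACL) needs the return traffic to the load balancer subnet allowed explicitly. Health checks are bidirectional communication, so both directions need to work.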
If your application takes 3-4 seconds to respond and your health check timeout is set to 5 seconds, you’re cutting it very close. However, that wouldn’t cause all checks to fail consistently. Let me ask: did you verify that the security group rule was actually applied and is in effect? Sometimes there’s a delay, or the rule gets added to the wrong security group. Also, check whether there’s a network ACL on your backend subnet that might be blocking traffic. Network ACLs are stateless, so they need both inbound and outbound rules configured correctly.
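To see how close the endpoint really runs to the 5 second timeout, a small timing probe can help. This is only a sketch: `probe` and `headroom` are names made up for illustration, and the address in the usage comment is a hypothetical backend IP, not one from this thread. Note that `urlopen`’s timeout applies to individual socket operations, so it approximates the load balancer’s deadline rather than reproducing it exactly.

```python
import time
import urllib.error
import urllib.request


def probe(url, timeout_s=5.0):
    """Issue one GET and return (status, elapsed_seconds).

    status is None when the request timed out or the connection failed.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with a 4xx/5xx status code.
        status = err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, unreachable host, or timeout.
        status = None
    return status, time.monotonic() - start


def headroom(elapsed_s, timeout_s=5.0):
    """Fraction of the timeout left unused; negative means over budget."""
    return (timeout_s - elapsed_s) / timeout_s


# Usage (hypothetical backend address; run from another host in the VPC):
# status, elapsed = probe("http://10.10.2.10:8080/health")
# print(status, round(elapsed, 2), round(headroom(elapsed), 2))
```

Running it a few dozen times from a host in the load balancer subnet gives a better picture than a single curl, since occasional slow responses are what push a borderline endpoint past the timeout.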
I checked our security groups and found that we only have rules allowing traffic from our office IP range and from within the VPC CIDR (10.10.0.0/16). The load balancer is in a different subnet (10.10.1.0/24) and the backend servers are in 10.10.2.0/24. Do I need to add a specific rule for the load balancer subnet, or is there a service IP range I should be using instead?
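Whether the existing VPC-wide rule already covers the load balancer subnet can be checked with Python’s standard `ipaddress` module. The CIDRs below are the ones quoted in this thread; `covered_by` is just an illustrative helper name.

```python
import ipaddress

# CIDRs as quoted in the thread.
vpc_rule_cidr = ipaddress.ip_network("10.10.0.0/16")
lb_subnet = ipaddress.ip_network("10.10.1.0/24")
backend_subnet = ipaddress.ip_network("10.10.2.0/24")


def covered_by(subnet, allowed):
    """Return True if every address in subnet falls inside allowed."""
    return subnet.subnet_of(allowed)


print(covered_by(lb_subnet, vpc_rule_cidr))       # True
print(covered_by(backend_subnet, vpc_rule_cidr))  # True
```

Both subnets sit inside 10.10.0.0/16, so a rule with that source already matches traffic from the load balancer subnet. What remains to verify is that the rule permits TCP port 8080 and is attached to the security group the backend servers actually use.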
Another thing to verify is whether your application is actually listening on all network interfaces. Sometimes applications bind to localhost (127.0.0.1) instead of 0.0.0.0, which would explain the symptoms if your curl test was run on the backend server itself: a local request succeeds while external health checks fail. SSH into one of your backend servers and run netstat -tlnp | grep 8080 to see which address your application is bound to. If it shows 127.0.0.1:8080, you need to configure your application to listen on 0.0.0.0:8080 instead.
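The bind address can also be tested from the outside: if a plain TCP connection to the backend’s private IP on port 8080 fails from another host in the VPC while a local curl succeeds, the application is probably bound to loopback or blocked by a rule. A minimal sketch follows; `can_connect` is an illustrative name and the address in the usage comment is hypothetical.

```python
import socket


def can_connect(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Usage (hypothetical backend private IP; run from a host in the LB subnet):
# print(can_connect("10.10.2.10", 8080))
```

A refused connection returns quickly and usually points at the bind address, while a timeout is more typical of a security group or network ACL silently dropping the packets.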