We’re running both EKS and ECS workloads and trying to optimize our scaling strategies. EKS uses Cluster Autoscaler to add/remove nodes based on pod resource requests, while ECS has Capacity Providers that manage scaling differently.
I’m curious about the practical differences in how these approaches handle scaling decisions, cost efficiency, and performance. Cluster Autoscaler seems more reactive to pending pods, while Capacity Providers appear to offer more proactive, target-tracking-based scaling. Has anyone run both and can share insights on which works better for different workload patterns? I’m also interested in the scaling speed and cost implications of each approach.
From a cost perspective, both have pros and cons. EKS Cluster Autoscaler can be wasteful if not tuned properly - those lingering nodes add up. But it works well with Spot instances if you configure multiple node groups. ECS Capacity Providers have more cost optimization built in, especially the Fargate options, where you pay per second of vCPU and memory only while tasks run. The ECS approach also handles mixed Spot/On-Demand strategies more elegantly, through capacity provider weights.
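To make the weights point concrete, here's a sketch of a capacity provider strategy as it might appear inside a CloudFormation `AWS::ECS::Service` definition. `FARGATE` and `FARGATE_SPOT` are the built-in Fargate providers; the base and weight values are illustrative, not a recommendation:

```yaml
# Fragment of an AWS::ECS::Service definition (illustrative values).
# Keeps a baseline of tasks on On-Demand Fargate, then splits overflow 3:1 toward Spot.
CapacityProviderStrategy:
  - CapacityProvider: FARGATE
    Base: 2        # always keep at least 2 tasks on On-Demand for availability
    Weight: 1      # beyond the base, 1 of every 4 tasks lands here
  - CapacityProvider: FARGATE_SPOT
    Weight: 3      # 3 of every 4 overflow tasks go to cheaper Spot capacity
```

The `Base` gives you a guaranteed On-Demand floor for availability, while the weights control how additional tasks are distributed - that's the part with no direct single-knob equivalent in a stock Cluster Autoscaler setup.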
The Fargate instant scaling point is interesting. We have some batch processing workloads that could benefit from that. Currently using EKS with spot instances which works but has the lag time you mentioned.
One thing to consider is scaling speed. Cluster Autoscaler can take 3-5 minutes to provision new nodes in EKS, then additional time for pods to schedule and start. ECS with Fargate scales in seconds to around a minute, since there’s no node provisioning - only task startup (image pull included). If you’re using EC2-backed capacity providers in ECS, scaling speed is similar to EKS because instances still have to boot. For bursty workloads, Fargate’s fast scaling is hard to beat, but you pay a premium for that convenience.
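For the EC2-backed case, one way to soften that boot-time lag is managed scaling with a target capacity below 100%, which keeps some warm headroom in the ASG. A sketch of the CloudFormation shape, with a placeholder ASG ARN and illustrative values:

```yaml
# Hedged sketch: EC2-backed capacity provider with managed (target-tracking) scaling.
# The ASG ARN is a placeholder; TargetCapacity below 100 keeps spare capacity warm.
EcsEc2CapacityProvider:
  Type: AWS::ECS::CapacityProvider
  Properties:
    AutoScalingGroupProvider:
      AutoScalingGroupArn: <your-asg-arn>
      ManagedScaling:
        Status: ENABLED
        TargetCapacity: 90          # aim for ~90% utilization; the 10% headroom absorbs small bursts
        MinimumScalingStepSize: 1
        MaximumScalingStepSize: 10
      ManagedTerminationProtection: ENABLED  # don't terminate instances that are still running tasks
```

The trade-off is that the headroom is capacity you pay for while idle, so it's a knob between burst latency and cost.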
Cluster Autoscaler on EKS works well but has some quirks. It scales up quickly when pods are pending, but scale-down can be slow because it has to ensure node draining won’t disrupt workloads. We’ve had nodes stick around longer than needed, which increases costs. The key is tuning the scale-down delay (`--scale-down-delay-after-add`) and unneeded-time (`--scale-down-unneeded-time`) parameters. Also, CA doesn’t look at actual resource utilization, only pod requests, so if your pods over-request resources you’ll scale out more than necessary.
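The flags above live on the Cluster Autoscaler container itself. A sketch of the relevant args in its Deployment spec - the flag names are real CA flags, but the values here are illustrative and should be tuned per cluster:

```yaml
# Fragment of the cluster-autoscaler Deployment's container spec (illustrative values).
containers:
  - name: cluster-autoscaler
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      - --scale-down-delay-after-add=10m        # cool-down after a scale-up before considering scale-down
      - --scale-down-unneeded-time=10m          # a node must be unneeded this long before removal
      - --scale-down-utilization-threshold=0.5  # "unneeded" = below 50% of *requested* capacity
```

Note that `--scale-down-utilization-threshold` is computed from pod requests, not actual usage, which is exactly why over-requesting pods keeps nodes alive and drives the over-scaling mentioned above.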