* Load balancing is handled pretty well by ALBs, and there are integrations with ECS autoscaling for health checks and similar
* Log aggregation happens out of the box with CloudWatch Logs and CloudWatch Log Insights. It's configurable if you want different behavior
* On ECS, you configure a "service" which describes how many instances of a "task" you want to keep running at a given time. It's the abstraction that handles spinning up new tasks when one fails
* ECS supports ready checks, and (as noted above) integrates with ALB so that requests don't get sent to containers until they pass a readiness check
* Machine maintenance schedules are non-existent if you use ECS / Fargate, or at least they're abstracted from you. As long as your application is built such that it can spin up a new task to replace your old one, it's something that will happen automatically when AWS decommissions the hardware it's running on. If you're using ECS without Fargate, it's as simple as changing the autoscaling group to use a newer AMI. By default, this won't replace all of the old instances, but will use the new AMI when spinning up new instances
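To make the "service keeps N tasks running behind the ALB" abstraction concrete, here's a rough sketch of the payload you'd hand to ECS's `create_service` call (e.g. via boto3). The cluster, family, container names, and port are illustrative assumptions, not anything specific from this thread:

```python
def make_service_definition(cluster: str, family: str, desired_count: int,
                            target_group_arn: str) -> dict:
    """Build a create_service payload: 'keep desired_count copies of this
    task running, registered behind this ALB target group'."""
    return {
        "cluster": cluster,
        "serviceName": f"{family}-svc",
        "taskDefinition": family,        # latest ACTIVE revision of the family
        "desiredCount": desired_count,   # ECS replaces tasks that die
        "launchType": "FARGATE",
        "loadBalancers": [{
            "targetGroupArn": target_group_arn,
            "containerName": "app",      # must match a container in the task def
            "containerPort": 8080,
        }],
        # give fresh tasks time to boot before ALB health checks count
        "healthCheckGracePeriodSeconds": 30,
    }

payload = make_service_definition(
    "prod", "web", 3,
    "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/abc",
)
```

ECS compares `desiredCount` against running tasks and launches replacements on failure; the ALB only routes to tasks once they pass the target group's health check.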
But again: the biggest selling point is the lack of maintenance / babysitting. If you set up your stack using ECS / Fargate and an ALB five years ago, it's still working, and you've probably done almost nothing to keep it that way.
You might be able to do the same with Kubernetes, but your control plane will be out of date, your OSes will have many missed security updates. Might even need a major version update to the next LTS. Prometheus, Loki, Tempo, Promtail will be behind. Your helm charts will be revisions behind. Newer ones might depend on newer apiVersions that your control plane won't support until you update it. And don't forget to update your CNI plugin across your cluster, too.
It's at least one full time job just keeping all that stuff working and up-to-date. And it takes a lot more know-how than just ECS and ALB.
It seems like you are comparing ECS to a self-managed Kubernetes cluster. Wouldn't it make more sense to compare to EKS or another managed Kubernetes offering? Many of your points don't apply in that case, especially around updates.
A managed Kubernetes offering removes only some of the pain, and adds more in other areas. You're still on the hook for updating whatever add-ons you're using, though yes, it'll depend on how many you're using, and how painful it will be varies depending on how well your cloud provider handles it.
Most of my managed Kubernetes experience is through Amazon's EKS, and the pain I remember included frustration from the supported Kubernetes versions lagging behind upstream, lack of visibility for troubleshooting control nodes, and having to explain / understand delays in NIC and EBS provisioning / attachment for pods. Also, the ALB ingress controller was something I needed to install and maintain independently (though that may be different now).
Though that was also without us going neck-deep into being vendor agnostic. Using EKS just for the Kubernetes abstractions without trying hard to be vendor agnostic is valid--it's just not what I was comparing above because it was usually that specific business requirement that steered us toward Kubernetes in the first place.
If you ARE using EKS with the intention of keeping as much as possible vendor agnostic, that's also valid, but then now you're including a lot of the stuff I complained about in my other comment: your own metrics stack, your own logging stack, your own alarm stack, your own CNI configuration, etc.
(Apologies for the snark, someone else made a short snarky comment that I felt was also wrong and I thought this thread was in reply to them before I typed it out -- thank you for the reply)
- ALBs -- yeah, this is correct. However, ALBs have much longer startup and health-check times than Envoy or Traefik
- CloudWatch -- this is true, however the "configurable" behavior makes CloudWatch trash out of the box: e.g. you get multi-line exceptions split across multiple log entries with the default configuration
- ECS tasks -- yep, but the failure behavior of tasks is horrible because there are no notifications out of the box (you can configure them)
- Fargate does allow you to avoid maintenance, however it has some very hairy edges: e.g. you can't use any container that expects to know its own IP address on a private VPC without writing a custom script. Networking in general is pretty arcane on Fargate, and you're going to have to manually write and maintain workarounds for all of these breakages
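The "custom script" for the IP-address problem usually means querying the ECS task metadata endpoint at startup. A minimal sketch, assuming the v4 endpoint (Fargate sets `ECS_CONTAINER_METADATA_URI_V4`) and its documented `/task` response shape; verify the field names against current AWS docs before relying on them:

```python
import json
import os
import urllib.request


def private_ip_from_task_metadata(task_doc: dict) -> str:
    """Pull the first IPv4 address out of an ECS task metadata document."""
    for container in task_doc.get("Containers", []):
        for network in container.get("Networks", []):
            addrs = network.get("IPv4Addresses", [])
            if addrs:
                return addrs[0]
    raise RuntimeError("no IPv4 address found in task metadata")


def discover_private_ip() -> str:
    # Fargate injects this env var into each container (platform 1.4+)
    base = os.environ["ECS_CONTAINER_METADATA_URI_V4"]
    with urllib.request.urlopen(f"{base}/task") as resp:
        return private_ip_from_task_metadata(json.load(resp))
```

You'd call `discover_private_ip()` in an entrypoint script and export the result into whatever config the container expects, which is exactly the kind of glue the comment above is complaining about having to maintain.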
> You might be able to do the same with Kubernetes, but your control plane will be out of date, your OSes will have many missed security updates. Might even need a major version update to the next LTS. Prometheus, Loki, Tempo, Promtail will be behind. Your helm charts will be revisions behind. Newer ones might depend on newer apiVersions that your control plane won't support until you update it. And don't forget to update your CNI plugin across your cluster, too.
I think maybe you haven't used K8S in years. Karpenter, EKS, and a GitOps tool (Flux or Argo) give you the same machine-maintenance feeling as ECS, but on K8S, without any of the annoyances of dealing with ECS. All your app versions can be pinned or set to follow latest as you prefer. You get rolling updates each time you switch machines (same as ECS, and if you really want to, you can run on top of Fargate).
By contrast, if your ECS/Fargate task fails, you haven't mentioned any notifications in your list -- so if you forgot to configure and test that correctly, your ECS could legitimately be stuck on a version of your app code that is 3 years old, and you might not know unless you've inspected the correct part of Amazon's arcane interface.
By the way, you're paying per use for all of this.
At the end of the day, I think modern Kubernetes is strictly simpler, cheaper, and better than ECS/Fargate out of the box. It also has the benefit of not relying on 20 other AWS-specific services, each with its own unique ways of failing and of running up a bill if you forget to do "that one simple thing everyone who uses this niche service should know."
ECS+Fargate does give you zero maintenance, both in theory and in practice. As someone who runs k8s at home and manages two clusters at work, I still recommend that our teams use ECS+Fargate+ALB when it satisfies their requirements for stateless apps, and they all love it because it is literally zero maintenance, unlike the upkeep you just described for k8s.
Sure, there are a lot of great features in k8s that ECS cannot match, but when ECS does satisfy the requirements, it will require less maintenance, no matter what kind of k8s you compare it against.