Hello!
TL;DR: we are looking for beta testers for a new feature that reduces the cost of your non-production clusters (Karpenter + spot instances).
Over the past quarter we have been working on improving how the cluster node autoscaler behaves. In particular, we plan to move away from the AWS cluster autoscaler and migrate to Karpenter (a new autoscaler developed by AWS).
Here is a summary of the advantages of using Karpenter as the autoscaler:
- Allocation optimizations: Karpenter optimizes how workloads are allocated on your cluster nodes. In more detail, it looks at the amount of CPU/RAM needed by all your applications and finds the most cost-effective combination of instance types to run them. Moreover, you can run your cluster with only 1 or 2 instances (today the minimum is 3).
- Spot instances: if the feature is enabled, Karpenter will provision EC2 spot instances to run your applications, significantly reducing your EC2 cost (more info on EC2 spot instances here). If no spot instances are available, it will fall back to the standard on-demand EC2 instances (the same instances we provision today).
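To make the two points above more concrete, here is a minimal Python sketch of the idea. It is not Karpenter's actual algorithm: the instance names, sizes and hourly prices are made up for illustration, and for simplicity it only considers fleets made of a single instance type. It picks the cheapest option that covers the requested CPU/RAM, prefers spot when it is enabled and available, and falls back to on-demand otherwise.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class InstanceType:
    name: str                     # hypothetical name, not a real EC2 type
    cpu: float                    # vCPUs per instance
    ram_gib: float                # GiB of RAM per instance
    on_demand_price: float        # hypothetical hourly price in USD
    spot_price: Optional[float]   # None means no spot capacity right now


def cheapest_fleet(cpu_needed: float, ram_needed: float,
                   catalog: List[InstanceType],
                   use_spot: bool) -> Tuple[str, int, str, float]:
    """Return (instance name, count, pricing kind, hourly cost) of the cheapest fit."""
    best = None
    for it in catalog:
        # Enough instances to cover both the CPU and the RAM request.
        count = max(math.ceil(cpu_needed / it.cpu),
                    math.ceil(ram_needed / it.ram_gib), 1)
        if use_spot and it.spot_price is not None:
            price, kind = it.spot_price, "spot"
        else:
            # Fallback: spot disabled or not available for this type.
            price, kind = it.on_demand_price, "on-demand"
        cost = count * price
        if best is None or cost < best[3]:
            best = (it.name, count, kind, cost)
    if best is None:
        raise ValueError("empty instance catalog")
    return best


# Made-up catalog, for illustration only.
CATALOG = [
    InstanceType("small", cpu=2, ram_gib=4, on_demand_price=0.04, spot_price=0.015),
    InstanceType("medium", cpu=4, ram_gib=8, on_demand_price=0.08, spot_price=None),
    InstanceType("large", cpu=8, ram_gib=16, on_demand_price=0.15, spot_price=0.05),
]

if __name__ == "__main__":
    print(cheapest_fleet(6, 10, CATALOG, use_spot=True))
    print(cheapest_fleet(6, 10, CATALOG, use_spot=False))
```

The real autoscaler can mix instance types and takes many more constraints into account, so treat this only as an intuition for "cheapest fit, spot first, on-demand as fallback".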
Looking for beta testers!
We are looking for beta testers to validate our feature before making it available to everyone.
We have only a few prerequisites to participate:
- the cluster should NOT run critical workloads (no production, we are still in a beta phase). Ideally it is a cluster running ephemeral environments or staging/dev environments.
- it’s a one-way feature activation: once we switch your cluster to use Karpenter, there’s no turning back (if you want to go back to the previous setup, you’ll have to create a new cluster from scratch and clone the environments there)
- you proactively report any problems you find while using the new autoscaler, as well as any ideas you have for improving these two features (Karpenter and spot instances).
If you want to participate, feel free to contact us on Intercom or send me a direct message here on the forum.
Things you should know before the migration
- This change is non-reversible. If you want to roll back, you will have to 1) create a new cluster and 2) clone the environments into the new cluster
- You should expect a short downtime during the migration, since the cluster nodes will be re-created
- This change will create a NAT gateway, adding a small monthly fee (around $30/month), which should be offset by the savings from using spot instances.
- You can’t activate the feature on a production cluster (flag "production" activated)
- If you activate the spot instance feature, at least one node will stay on demand (required for some of the Qovery applications). Remember that spot instances might be killed at any time, creating potential instability on your applications
- The UI won’t be aligned with your setup and you’ll still see the node configuration, which won’t have any effect on the setup, except for the disk size. We are working on improving the UI and aligning it with the Karpenter setup.
- Today the number of instances is bounded by the min-max node setup. Karpenter instead provisions the right instance types and number of instances based on the applications you ask to deploy, with no boundaries (no min-max nodes). Check your spending in the AWS console regularly.
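If you want a rough idea of whether the NAT gateway fee pays for itself on your cluster, the arithmetic can be sketched as below. Only the $30/month figure comes from this post; the share of compute that ends up on spot and the spot discount are assumptions you need to fill in with your own numbers, and the example values are purely illustrative.

```python
NAT_GATEWAY_FEE = 30.0  # USD per month, the approximate figure quoted in this post


def monthly_net_saving(on_demand_spend: float, spot_share: float,
                       spot_discount: float,
                       nat_fee: float = NAT_GATEWAY_FEE) -> float:
    """Estimated monthly saving after paying the NAT gateway fee.

    on_demand_spend: current monthly EC2 spend in USD (your own number)
    spot_share:      fraction of that spend that moves to spot (assumption, 0..1)
    spot_discount:   spot discount versus on-demand (assumption, 0..1)
    """
    return on_demand_spend * spot_share * spot_discount - nat_fee


def break_even_spend(spot_share: float, spot_discount: float,
                     nat_fee: float = NAT_GATEWAY_FEE) -> float:
    """Monthly EC2 spend above which the spot savings cover the NAT fee."""
    if spot_share <= 0 or spot_discount <= 0:
        raise ValueError("spot_share and spot_discount must be positive")
    return nat_fee / (spot_share * spot_discount)


if __name__ == "__main__":
    # Illustrative values only: $200/month spend, half of it on spot, 60% discount.
    print(monthly_net_saving(200.0, spot_share=0.5, spot_discount=0.6))
    print(break_even_spend(spot_share=0.5, spot_discount=0.6))
```

Keep in mind that at least one node stays on demand, so the spot share will never be 100%.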
Alessandro