[UPCOMING FEATURE] Reduce your cloud costs with spot instances + the Karpenter autoscaler - Looking for beta testers!

Hello!

TL;DR: we are looking for beta testers for a new feature (Karpenter + spot instances) that reduces the cost of your non-production clusters.

Over the past quarter we have been working to improve how the cluster node autoscaler behaves. In particular, we plan to move away from the AWS cluster autoscaler and migrate to Karpenter, a new open-source autoscaler developed by AWS.

Here is a summary of the advantages of using Karpenter as the autoscaler:

  • Allocation optimization: Karpenter optimizes how workloads are allocated across your cluster nodes. In more detail, it looks at the amount of CPU/RAM needed by all your applications and finds the most cost-effective combination of instance types to run them. You can also run your cluster with only 1 or 2 instances (today the minimum is 3).
  • Spot instances: if the feature is enabled, Karpenter will provision EC2 spot instances to run your applications, significantly reducing your EC2 costs (more info on EC2 spot instances here). If no spot instances are available, it falls back to standard on-demand EC2 instances (the same instances we provision today).
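To give an intuition for the two points above, here is a deliberately simplified sketch, not Karpenter's actual algorithm: for each candidate instance type it packs the pods' CPU/RAM requests onto nodes with first-fit-decreasing, prices the result using spot when spot capacity exists (falling back to on-demand otherwise), and keeps the cheapest option. Unlike Karpenter, it only considers one instance type per plan. The instance names, sizes and prices in `CATALOG` are hypothetical, not real AWS figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpu: float                    # vCPUs per node
    mem_gib: float                # memory per node, in GiB
    on_demand_hourly: float       # hypothetical on-demand price ($/hour)
    spot_hourly: Optional[float]  # hypothetical spot price; None = no spot capacity


@dataclass(frozen=True)
class Pod:
    name: str
    cpu: float      # requested vCPUs
    mem_gib: float  # requested memory in GiB


# Hypothetical catalog, for illustration only.
CATALOG = [
    InstanceType("small", 2, 4, 0.10, 0.03),
    InstanceType("large", 8, 16, 0.40, 0.10),
    InstanceType("xlarge-nospot", 16, 32, 0.70, None),
]


def pack(pods: list[Pod], itype: InstanceType) -> Optional[int]:
    """First-fit-decreasing: number of `itype` nodes needed, or None if a pod can't fit."""
    nodes: list[list[float]] = []  # each entry is [free_cpu, free_mem]
    for pod in sorted(pods, key=lambda p: (p.cpu, p.mem_gib), reverse=True):
        if pod.cpu > itype.cpu or pod.mem_gib > itype.mem_gib:
            return None  # this instance type is too small for the pod
        for node in nodes:
            if node[0] >= pod.cpu and node[1] >= pod.mem_gib:
                node[0] -= pod.cpu
                node[1] -= pod.mem_gib
                break
        else:
            # No existing node has room: open a new one.
            nodes.append([itype.cpu - pod.cpu, itype.mem_gib - pod.mem_gib])
    return len(nodes)


def cheapest_plan(pods: list[Pod], catalog: list[InstanceType], use_spot: bool = True):
    """Return (instance type, node count, capacity type, hourly cost) of the cheapest plan."""
    best = None
    for itype in catalog:
        count = pack(pods, itype)
        if count is None:
            continue
        # Prefer spot; fall back to on-demand when no spot capacity is available.
        if use_spot and itype.spot_hourly is not None:
            capacity, price = "spot", itype.spot_hourly
        else:
            capacity, price = "on-demand", itype.on_demand_hourly
        cost = count * price
        if best is None or cost < best[3]:
            best = (itype.name, count, capacity, cost)
    return best


if __name__ == "__main__":
    pods = [Pod(f"app-{i}", 1, 2) for i in range(6)]
    print(cheapest_plan(pods, CATALOG))
```

With six pods requesting 1 vCPU / 2 GiB each, the sketch picks three "small" spot nodes; restricted to the type without spot capacity, it falls back to a single on-demand node.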

Looking for beta testers!

We are looking for beta testers to validate our feature before making it available to everyone.

There are only a few prerequisites to participate:

  • The cluster should NOT run critical workloads (no production: we are still in a beta phase 🙂). Ideally, use a cluster running ephemeral environments or staging/dev environments.
  • Activation is one-way: once we switch your cluster to Karpenter, there’s no turning back (to return to the previous setup, you’ll have to create a new cluster from scratch and clone your environments into it).
  • You proactively report any problems you find while using the new autoscaler, as well as any ideas you have to improve these two features.

If you want to participate, feel free to contact us on Intercom or send me a direct message here on the forum.

Things you should know before the migration

  • This change is irreversible. To roll back, you will have to 1) create a new cluster and 2) clone your environments into it.
  • Expect a short downtime during the migration, since the cluster nodes will be re-created.
  • This change will create a NAT gateway, adding a small monthly fee (around $30/month), which should be offset by the savings from spot instances.
  • You can’t activate the feature on a production cluster (i.e. a cluster with the production flag enabled).
  • If you activate the spot instance feature, at least one node will remain on-demand (required for some Qovery applications). Remember that spot instances can be reclaimed by AWS at any time, which can cause instability in your applications.
  • The UI won’t reflect your setup yet: you’ll still see the node configuration, but it won’t have any effect except for the disk size. We are working on updating the UI to match the Karpenter setup.
  • Today the number of instances is bounded by the min/max node settings. Karpenter instead provisions the instance types and number of instances required by the applications you deploy, with no bounds (no min/max nodes). Check your spending in the AWS console regularly.
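A back-of-the-envelope way to check whether the NAT gateway fee is offset by spot savings is sketched below. It assumes one node stays on-demand and the rest run on spot, uses the ~$30/month NAT figure from the post, and converts hourly prices with the common 730 hours/month approximation. The node prices are hypothetical, and the NAT gateway's per-GB data processing charges are not modeled. Because Karpenter has no min/max node bounds, the node count (and so the bill) follows your workloads, which is why it is worth re-running numbers like these against your actual AWS costs.

```python
HOURS_PER_MONTH = 730  # common approximation for converting hourly prices to monthly


def monthly_cost_today(nodes: int, on_demand_hourly: float) -> float:
    """Current setup: every node runs on-demand."""
    return nodes * on_demand_hourly * HOURS_PER_MONTH


def monthly_cost_with_spot(nodes: int, on_demand_hourly: float, spot_hourly: float,
                           nat_monthly: float = 30.0) -> float:
    """Karpenter + spot: one node stays on-demand, the rest run on spot, plus the NAT fee."""
    if nodes < 1:
        raise ValueError("at least one node is required")
    hourly = on_demand_hourly + (nodes - 1) * spot_hourly
    return hourly * HOURS_PER_MONTH + nat_monthly


def break_even_nodes(on_demand_hourly: float, spot_hourly: float,
                     nat_monthly: float = 30.0, max_nodes: int = 100):
    """Smallest node count at which spot + NAT is cheaper than all on-demand, or None."""
    for n in range(1, max_nodes + 1):
        with_spot = monthly_cost_with_spot(n, on_demand_hourly, spot_hourly, nat_monthly)
        if with_spot < monthly_cost_today(n, on_demand_hourly):
            return n
    return None


if __name__ == "__main__":
    # Hypothetical prices: $0.10/hour on-demand, $0.03/hour spot.
    print(monthly_cost_today(3, 0.10), monthly_cost_with_spot(3, 0.10, 0.03))
    print(break_even_nodes(0.10, 0.03))
```

With these hypothetical prices, a single-node cluster never benefits (its only node stays on-demand, so the NAT fee is pure overhead), while from two nodes onward the spot savings outweigh the fee.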

Alessandro


Just for the record:

I stress-tested Karpenter and EKS with Qovery (I’m a co-founder). I’m impressed by Karpenter; to me, it’s a game-changer for everyone using EKS / Kubernetes. It should definitely replace the default node autoscaler, which is very basic and makes people using EKS waste money!


Hello, I saw that we can enable Karpenter from our cluster’s advanced settings.

Is it possible to enable it on an existing cluster? If so, is there any downtime during the switch from the previous autoscaler to the new one? Once it’s enabled, is there a new interface to choose the instance types used by the autoscaler, or is it the same as the existing one?

Thanks

Hi @Orkin,

I’ve added a few points to the initial post that I had already shared with the beta testers; they should answer all of your questions.

Alessandro
