I configured deployment rules for my development environment to reduce my infrastructure costs. The rules work fine and all applications are shut down and started at the right time, but my EC2 instances are not stopped by the cluster. I figured out that I have some custom Helm charts installed on my cluster, like fluent-bit to send logs to CloudWatch, by following this tutorial: Integrate your application logs to Cloudwatch | Qovery. The main problem is that there is one pod on each node: I can see pods in the kube-system namespace, and the number of pods is the same as the number of running EC2 instances. Is it possible that this is the main reason why my EC2 instances are not stopped? (I didn't find where the node downscaling delay is configured, but after hours nothing changed.)
If it's the main reason, how can I use the deployment rules with this kind of pod, which is really helpful? Is there another way to install it from the Qovery Console? I think every new pod installed this way will produce the same side effect. I'm thinking about Prometheus or Kubecost. Does this example (How to deploy Helm charts | Qovery) work with Qovery deployment rules without preventing the EC2 instances from being shut down?
If not, why are my EC2 instances not shut down?
Thanks a lot
Hi @Orkin ,
a few points to keep in mind:
- the main use case for environment auto start/stop is multi-node clusters: by stopping the environments, you ensure that the autoscaler scales the number of nodes (EC2 instances) of the cluster down to the minimum (usually 3 nodes, but it depends on your cluster config). This means that you will never be able to completely shut down the cluster with this feature; you will always have at least 3 EC2 instances running.
- even if no workload is running on your EC2 instances, you still pay for them (that's how AWS works). You have to explicitly request that they be shut down.
So the only option you have right now is to manually stop/start your cluster from the console. In this case, all the EC2 instances of the cluster will be shut down. Of course, this could be done programmatically, but the script would have to run on another cluster/infrastructure.
We are working on integrating Karpenter into our product; it should automatically shut down EC2 instances when they are no longer needed. But you will never be able to reach a cost of zero: on EKS clusters you always have a ~$73/month base cost ($0.10/hour) that can't be avoided, and the only way not to pay it is to completely delete the cluster and re-create it.
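To make that base cost concrete, here is a quick back-of-the-envelope calculation using the $0.10/hour rate mentioned above (control-plane pricing can vary over time, so treat the rate as an example):

```python
# Back-of-the-envelope EKS control-plane cost, assuming the
# $0.10/hour rate mentioned above (pricing may vary over time).
HOURLY_RATE = 0.10     # USD per hour for the EKS control plane
HOURS_PER_MONTH = 730  # ~365 days * 24 hours / 12 months

monthly_base_cost = HOURLY_RATE * HOURS_PER_MONTH
print(f"EKS base cost: ${monthly_base_cost:.0f}/month")  # → EKS base cost: $73/month
```

This cost applies per cluster, regardless of how many (or few) worker nodes are running.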
Hi @a_carrano, thanks for your reply.
Yes, I understand that. My cluster configuration sets a minimum of 3 nodes and a maximum of 20, but after the environment was stopped, the number of nodes stayed the same. There are only 2 Docker containers running (tooling only), yet the cluster doesn't scale down to the minimum of 3 EC2 instances.
Yes, that's fine for me, but the instances are running without any workload and are not scaled down. That is my point.
I don't want to shut down my cluster just because 2 containers are not stopped. I simply want to avoid paying for EC2 instances without workload when an environment is stopped. I know that I will have a fixed minimum price per EKS cluster on AWS.
So if I understand your point correctly, right now nodes are not scaled down properly (and EC2 instances are not terminated)?
We use the AWS cluster autoscaler to increase or decrease the number of EC2 instances. There are a few limitations (for example, it does not optimize pod allocation to reduce wasted EC2 resources), but it should not leave EC2 instances running with no workload on them.
Do you still have this issue on your cluster, and if yes, which one?
We are working on replacing the AWS autoscaler with Karpenter. It will allow us to optimize the requested EC2 instances and reduce spend, but it won't be out soon (probably end of the year).
@a_carrano Yes, I still have the issue. This is the cluster:
This morning, before the environment started, I had 9 EC2 instances running just 2 containers (tools that are not shut down at night). I don't understand why. In the worst case it should be a minimum of 2 instances for those 2 containers, but in practice 3 because of the minimum cluster node requirement.
on the cluster, there are not only your apps running but also other components, such as:
- cert manager
- external DNS
- qovery agent
and the autoscaler probably can't optimize that with small instances (t4g.medium).
Maybe the best option in your case is to increase the instance type size and reduce the maximum instance number.
Thanks @a_carrano, I will try, but in terms of CPU I can't see what kind of instance I could use, because all the bigger instances just have more RAM without a significant CPU upgrade. Do you have any advice?
The C family on AWS is designed for more compute-intensive workloads; have you looked at it?
Yes, I can try c6.xlarge, which could be a good type for us. But even if I switch, since Qovery doesn't allow fewer than 3 nodes, the minimum node requirement will cost more than 10 t4g.medium instances running without workload ($0.1459 per instance vs $0.0368 per instance).
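Quick math to back that up, using the on-demand hourly prices quoted above (actual prices vary by region, so these are just the figures from this thread):

```python
# Compare the 3-node minimum on the bigger instance type against
# 10 idle t4g.medium instances, using the hourly prices quoted above
# (on-demand prices vary by region; treat these as examples).
C6_XLARGE_HOURLY = 0.1459   # USD/hour per c6.xlarge
T4G_MEDIUM_HOURLY = 0.0368  # USD/hour per t4g.medium

min_nodes_cost = 3 * C6_XLARGE_HOURLY      # 3-node cluster minimum
idle_fleet_cost = 10 * T4G_MEDIUM_HOURLY   # 10 idle t4g.medium nodes

print(f"3 x c6.xlarge:   ${min_nodes_cost:.4f}/hour")   # $0.4377/hour
print(f"10 x t4g.medium: ${idle_fleet_cost:.4f}/hour")  # $0.3680/hour
```

So the 3-node minimum on c6.xlarge ($0.4377/hour) is indeed more expensive than the 10 idle t4g.medium instances ($0.3680/hour).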
For now, I don't have a solution other than finding the instance type that fits your needs without increasing your cost.
This is why we are integrating the new autoscaler (Karpenter): to avoid these situations and let you go down to even a single node.
Thanks @a_carrano, do you have an ETA for the Karpenter release?
Sorry for the late reply. We should release this in open beta at the beginning of next year, but only for new clusters.