On my project, for the prod environment, I have several apps that I need to deploy.
The problem is that these apps are spread across many different instances, which means that I have many more EC2 instances running than I need.
At the time of writing, we have 14 EC2 instances running, all of them using less than 15% of CPU. But because the apps are spread over multiple instances, no scale-down can happen, which means that we are running at least 3 or 4 times more EC2 instances than needed at the moment. And that costs us a lot of money.
So, is there some way/configuration to make sure that all our apps are deployed on the same instances, so that we can have just 4 or 5 running instead of 14?
I haven’t found any setting for that so far.
This kind of configuration is not available because we don’t recommend this setup: nodes can crash, and if all your pods run on the node that crashes, your applications will not be available.
Could you please provide the URL of your cluster so that we can check whether the optimization of the auto-scaler configuration recently implemented has been deployed successfully? (see the change log)
I’m not sure I understand the problem of this configuration. There are always multiple nodes running (at least 3 in the configuration I’m thinking of), so if one crashes with all the pods, the other ones will continue working until a new node is created, no?
The identifier of my cluster is: aa618a4b-f934-425a-99df-aac82c8cac32 (organization is 3842bf65-225d-43f5-8cb2-6807a0f1262f)…
When checking your nodes’ resource consumption, I can see that you have requested much more CPU than you actually use. None of your nodes has less than 60% of its CPU requested, so the cluster autoscaler can’t drop any node.
But indeed the utilization is very low. I encourage you to review the resources assigned to the services in the resource settings to ensure they are aligned with actual usage.
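To illustrate the point above, here is a minimal sketch of why low actual usage does not trigger a scale-down on its own. It assumes that a node only becomes a removal candidate when its requested CPU falls below 60% of its allocatable CPU (the figure mentioned above). The node names and numbers are hypothetical, and the real cluster autoscaler checks more than this (memory, whether the pods can be rescheduled elsewhere, and so on).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    allocatable_mcpu: int  # CPU the node can hand out to pods, in millicores
    requested_mcpu: int    # sum of the CPU requests of the pods on the node
    used_mcpu: int         # measured CPU usage at sampling time


def requested_ratio(node: Node) -> float:
    """Share of the node's allocatable CPU that is reserved by pod requests."""
    return node.requested_mcpu / node.allocatable_mcpu


def scale_down_candidates(nodes, threshold=0.60):
    """Names of nodes whose *requested* CPU is below the threshold.

    Actual usage (used_mcpu) is deliberately ignored: in this simplified
    model, only requests decide whether a node can be dropped.
    """
    return [n.name for n in nodes if requested_ratio(n) < threshold]


# Hypothetical nodes: usage is low everywhere, but requests are high.
nodes = [
    Node("node-a", allocatable_mcpu=2000, requested_mcpu=1500, used_mcpu=200),
    Node("node-b", allocatable_mcpu=2000, requested_mcpu=1300, used_mcpu=150),
    Node("node-c", allocatable_mcpu=2000, requested_mcpu=900, used_mcpu=100),
]

print(scale_down_candidates(nodes))  # only node-c is below 60% requested
```

In this example all three nodes use 10% of their CPU or less, yet only node-c qualifies, because the other two still have 65% or more of their CPU reserved by requests.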
I have some questions though.
Just to be sure, can you confirm that:
- Allocatable is the maximum amount of CPU that can be used on the node
- Requested is the sum of all the CPU requested by all the apps that are installed on the node
- Utilization is the actual usage (at the time of your screenshot)
And are you saying that the scaling (down) is based on the CPU Requested, not the CPU Used?
And also, how are the cron jobs handled? Let’s say a cron job runs every day at 22:00 and needs 0.5CPU. Will the 0.5CPU be included in the “Requested” all day, or only when the cron is about to start?
Sorry to meddle in the conversation, but @Pierre_Gerbelot, that resource utilization/requested graph is very useful. What command did you use to get that info for our nodes?
The data was dumped using the following tool; you can find a description of each column in the README.
When Kubernetes starts a pod on a node, it allocates the requested resources for that pod. Regarding the cron job, pods are started only during the execution time, so the resources are only used during this period.
Regards
Would you know if there is a tool that tracks CPU usage of a container over time?
My goal is to try to see and understand what’s the real CPU consumption of a container so that I can set its resources to a better value (rather than just doing it by trial and error).
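As a rough illustration of that goal, here is a minimal sketch of how a series of CPU readings could be turned into a suggested request instead of guessing. It assumes you already have usage samples in millicores for one container (for example, readings noted down periodically from `kubectl top pod`, which needs metrics-server). The sample values, the 90th percentile, and the 20% headroom are all arbitrary assumptions, not a Qovery recommendation.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of integer samples (pct in 1..100)."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, to avoid floating-point surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def suggest_cpu_request(samples_mcpu, pct=90, headroom_pct=20):
    """Suggested CPU request in millicores: a percentile plus some headroom."""
    base = percentile(samples_mcpu, pct)
    return (base * (100 + headroom_pct) + 99) // 100  # integer ceiling


# Hypothetical readings in millicores, including one short spike.
samples = [40, 55, 60, 45, 300, 50, 65, 70, 48, 52]

print(percentile(samples, 90))       # 70
print(suggest_cpu_request(samples))  # 84
```

Using a percentile rather than the maximum keeps a single spike (300m here) from dictating the request; whether that is acceptable depends on how the app behaves when it is throttled.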
There are 4 nodes with a Requested CPU below 60% (and a very low usage), but no down scaling is happening. Would you know why?
Also, I have another question. When I allocate some resources to an app, let’s say 500mCPU. Does that mean that the app will use at the maximum 500mCPU, or can it use more if there is some available CPU in the node? If it’s the latter, that would allow me to put lower resources on each app.
The thread is useful indeed. But it doesn’t answer this question:
“When I allocate some resources to an app, let’s say 500mCPU. Does that mean that the app will use at the maximum 500mCPU, or can it use more if there is some available CPU in the node?”
On the same question of auto-scaling and usage, does Qovery plan to allow setting different request and limit values for resource usage? (I’m talking about the settings spec.containers[].resources.limits.cpu and spec.containers[].resources.requests.cpu, for example.)
That would allow us more flexibility and handle our load in a better way.
It’s not planned since it’s not a good practice. Why? Because you can easily overcommit and impact other applications running on the same nodes as yours.
It can be too problematic for applications/containers deployed with Qovery (you can find several articles on the topic). However, you can do what you want with Helm deployments, as we assume that if you can deploy with charts, you know what you’re doing.
We plan in the future to restrict Helm access to power users only and let Qovery admins decide who can deploy with Helm.
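To make the overcommit concern concrete, here is a minimal sketch. In plain Kubernetes, the CPU request is what the scheduler reserves on the node, and the CPU limit is the cap a container cannot exceed. The container names and values below are hypothetical; they describe a node where every request fits, but the limits add up to more CPU than the node has.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ('500m', '0.5', '2') to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def cpu_totals(containers):
    """Return (total requested, total limit) in millicores."""
    requested = sum(parse_cpu(c["requests"]) for c in containers)
    limits = sum(parse_cpu(c["limits"]) for c in containers)
    return requested, limits


# Hypothetical containers scheduled on one node with 2 allocatable CPUs.
containers = [
    {"name": "api", "requests": "500m", "limits": "1500m"},
    {"name": "worker", "requests": "500m", "limits": "1000m"},
    {"name": "web", "requests": "250m", "limits": "1"},
]
allocatable = parse_cpu("2")

requested, limits = cpu_totals(containers)
print(requested, limits)        # 1250 3500
print(requested <= allocatable)  # True: the scheduler accepts all three
print(limits > allocatable)      # True: they can contend if all burst at once
```

When the request and the limit are the same value, the two totals are equal and this situation cannot occur, which is the trade-off described above: less flexibility, but no container can eat into CPU that another one is counting on.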