Cluster insufficient resources - VcpuLimitExceeded

Tactless7 · May 20, 2022, 3:05pm

I get these errors when I try to deploy. My app starts, I get some logs, and then this:

2022-05-20T14:30:00Z Warning FailedScheduling: 0/4 nodes are available: 2 Insufficient memory, 4 Insufficient cpu.
2022-05-20T14:31:06Z Warning FailedScheduling: skip schedule deleting pod: z59b9994d-z6311ae3b/app-z403746c7-6bfdd877d6-54szs
2022-05-20T14:48:00Z Warning FailedScheduling: 0/4 nodes are available: 2 Insufficient memory, 4 Insufficient cpu.
2022-05-20T14:48:29Z Warning FailedScheduling: skip schedule deleting pod: z59b9994d-z6311ae3b/app-z403746c7-94f68c48f-grrpl
2022-05-20T14:39:22Z Warning Unhealthy: Readiness probe failed: dial tcp 10.0.39.106:8080: i/o timeout
2022-05-20T14:39:21Z Warning Unhealthy: Liveness probe failed: dial tcp 10.0.39.106:8080: i/o timeout

Programming language and version: Node 16 with NestJs

I think there is not enough CPU, right? But when I try to set 2 vCPU, I get this message (I have a paid plan):

An error has occured

Invalid application config: Cannot ask more than 1560m cpu for an application, due to your nodes CPU available resources

Here is what I am trying to do:

bchastanier · May 20, 2022, 3:23pm

Hello @Tactless7!

First off, thanks for using Qovery.

I think there is some confusion here. Your cluster does seem to lack resources, but that is because of the nodes.

What you are trying to do is increase the application's resources, whereas what you want is to add more nodes.
From your settings, I see your scaler is set to 3-4 nodes, and the message says that all your nodes are full.

I don’t know if your app requires that much CPU, but I advise you to scale it back down to the default CPU (unless it really needs more, of course).

Can you try to add one more node to your cluster (Clusters => Resources => Nodes)?
If not possible on your end, you can also delete some env / apps on your cluster to free up some resources.
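
For reference, the FailedScheduling message can be read mechanically: "0/4 nodes are available" means none of the 4 nodes can accept the pod, and each count after the colon says how many nodes were rejected for that reason. Here is a minimal Python sketch of that reading. The helper name `parse_failed_scheduling` is hypothetical (it is not part of Qovery or Kubernetes tooling), and it assumes the message keeps the simple format shown in the logs above.

```python
import re


def parse_failed_scheduling(message: str) -> dict:
    """Parse a scheduler message such as
    '0/4 nodes are available: 2 Insufficient memory, 4 Insufficient cpu.'
    into the available/total node counts and a per-reason node count.
    """
    match = re.match(r"(\d+)/(\d+) nodes are available: (.*)", message.strip())
    if match is None:
        raise ValueError(f"unrecognized message: {message!r}")
    available, total, reasons_text = match.groups()

    reasons = {}
    # Each comma-separated part looks like '<count> <reason>'.
    for part in reasons_text.rstrip(".").split(","):
        count, reason = part.strip().split(" ", 1)
        reasons[reason] = int(count)

    return {"available": int(available), "total": int(total), "reasons": reasons}


msg = "0/4 nodes are available: 2 Insufficient memory, 4 Insufficient cpu."
info = parse_failed_scheduling(msg)
print(info)
```

Here all 4 nodes are short on CPU and 2 of them are also short on memory, which is why adding nodes (or freeing resources) is the fix rather than raising the app's own request.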

Please let me know if I can help further.

Cheers

Tactless7 · May 21, 2022, 1:39pm

Hello @bchastanier ,

Thanks for the answer, it made me look a bit further into the configuration.
I chose a range of 5-8 nodes and removed an environment recently created in my cluster. Even with these changes, nothing changed. I cannot deploy the new version of my API, and I get the same errors:

Condition not met to start the container: Ready -> Unknown(Some("ContainersNotReady")): containers with unready status: [app-z403746c7]
Condition not met to start the container: ContainersReady -> Unknown(Some("ContainersNotReady")): containers with unready status: [app-z403746c7]
terminated state exit code: 1
2022-05-21T12:51:45Z Warning Unhealthy: Liveness probe failed: dial tcp 10.0.50.143:3000: connect: connection refused
2022-05-21T12:49:39Z Warning Unhealthy: Readiness probe failed: dial tcp 10.0.50.143:3000: connect: connection refused
2022-05-21T13:29:56Z Warning Unhealthy: Readiness probe failed: dial tcp 10.0.56.251:3000: connect: connection refused
2022-05-21T13:27:36Z Warning Unhealthy: Liveness probe failed: dial tcp 10.0.56.251:3000: connect: connection refused
2022-05-21T12:36:01Z Warning FailedScheduling: 0/4 nodes are available: 2 Insufficient memory, 4 Insufficient cpu.
2022-05-21T12:37:13Z Warning FailedScheduling: skip schedule deleting pod: z59b9994d-z6311ae3b/app-z403746c7-94f68c48f-44jpg

Also, when I look into the cluster on AWS (with the root user), I can’t see any nodes. But the previous versions of my front and API apps are running, and they are rolled back to when a deployment fails.

bchastanier · May 21, 2022, 1:53pm

Hey @Tactless7,

Checking, will let you know ASAP.

bchastanier · May 21, 2022, 2:08pm

So, indeed, your cluster was set to have from 5 to 8 nodes, but it doesn’t seem to have been updated with those changes. Did you update your cluster after this change? If not, it’s normal that you still had the error, since the changes hadn’t been applied to your cluster.

I changed your cluster setup to have from 3 to 8 nodes (you set it to a minimum of 5, but 3 is OK, the autoscaler will do its magic).
I redeployed your cluster and your apps, and everything looks good now.

So bottom line, I guess there were two issues:

your cluster was full
increasing the resources required by the app (instead of the cluster) meant you could not deploy it, since there were not enough resources left on the cluster.

Your cluster seems to be OK with 3 nodes for now, but further deployments might require it to scale up. Anyway, you should be good now.
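
The second issue comes down to simple arithmetic. The error reported earlier in this thread capped a single application at 1560m CPU (1.56 cores), while a 2 vCPU request equals 2000m. A minimal Python sketch, assuming the usual Kubernetes convention that an `m` suffix means millicores; the helper names `to_millicores` and `fits` are hypothetical, and the 500m value is only an illustrative smaller request:

```python
def to_millicores(quantity: str) -> int:
    """Convert a Kubernetes-style CPU quantity ('1560m', '2', '0.5') to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def fits(requested: str, per_app_limit: str) -> bool:
    """True if the requested CPU does not exceed the per-application limit."""
    return to_millicores(requested) <= to_millicores(per_app_limit)


limit = "1560m"  # taken from the error message quoted in this thread
print(fits("2", limit))     # False: 2000m > 1560m
print(fits("500m", limit))  # True: 500m <= 1560m
```

Raising the app's request only makes the pod harder to place, so when the nodes are already full, the capacity has to come from the cluster side.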

Please let us know if you need further help

Have a good weekend !

Tactless7 · May 21, 2022, 2:21pm

Wow, thanks a lot !

Actually I thought the update action was to update the version of Kube. My bad
Thanks for the explanation

Have a good weekend too

bchastanier · May 21, 2022, 2:33pm

"Actually I thought the update action was to update the version of Kube. My bad"

No worries, I will share this point with the product team (cc @Alessandro_Carrano @Florian_Lepont). I do agree it’s a bit confusing.

Cheers !

polive106 · June 23, 2022, 2:54pm

Hey guys !

All my deployments fail to complete due to a lack of resources. I followed the guidelines discussed here, but deployments still fail with the same error message as in @Tactless7's message.
Any idea what could cause this?

bchastanier · June 23, 2022, 2:59pm

Hello @polive106 !

Fortunately, we are working on infra logs, and they are on the way, which should make things clearer.

In your case, there is a vCPU capacity quota preventing you from moving further:

AsgInstanceLaunchFailures: Could not launch On-Demand Instances. VcpuLimitExceeded - You have requested more vCPU capacity than your current vCPU limit of 32 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.

You should open a ticket with AWS asking them to increase this quota.
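
To estimate how far a quota goes, divide the vCPU limit by the vCPUs each node consumes. A rough Python sketch: the 32 vCPU limit comes from the error above, but the 4 vCPUs per node is purely an assumption for illustration (the instance type is not stated in this thread), and `max_instances` is a hypothetical helper. Keep in mind that the limit covers all running On-Demand instances in the same instance bucket, not only the cluster's nodes.

```python
def max_instances(vcpu_limit: int, vcpus_per_instance: int, vcpus_in_use: int = 0) -> int:
    """Number of additional instances that still fit under the vCPU quota."""
    if vcpus_per_instance <= 0:
        raise ValueError("vcpus_per_instance must be positive")
    return max(0, (vcpu_limit - vcpus_in_use) // vcpus_per_instance)


VCPU_LIMIT = 32      # from the VcpuLimitExceeded message
VCPUS_PER_NODE = 4   # assumption: depends on the instance type actually used

print(max_instances(VCPU_LIMIT, VCPUS_PER_NODE))                   # 8
print(max_instances(VCPU_LIMIT, VCPUS_PER_NODE, vcpus_in_use=24))  # 2
```

If the autoscaler's maximum node count multiplied by the per-node vCPUs exceeds the quota, new nodes cannot launch even though the cluster settings allow them.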

Please let us know if that unblocks you.

Cheers

polive106 · June 24, 2022, 2:52pm

Hi,

Thanks @bchastanier for your quick answer !

I opened a ticket on the AWS side. Any suggestion for a quick fix until the AWS limits are updated?

Cheers

rophilogene · June 24, 2022, 3:27pm

Hi Pierre,

Unfortunately, it’s hard to work around those limits. They should respond quite quickly (hopefully).

polive106 · June 27, 2022, 1:59pm

Hi @rophilogene,

I finally got my AWS vCPU quota increased. But I still cannot deploy an app in my production cluster because of this error in my deployment logs: 0/6 nodes are available: 6 Insufficient cpu.

What am I missing here? Is there a configuration error on my end?

Thanks again for your help !

rophilogene · June 27, 2022, 3:11pm

Did you give more resources to your cluster?

polive106 · June 28, 2022, 1:15pm

Yes, I did, but I get an error when I try to update the cluster (see screenshot below)

rophilogene · June 28, 2022, 1:20pm

I’ve seen this error

AsgInstanceLaunchFailures: Could not launch On-Demand Instances. VcpuLimitExceeded - You have requested more vCPU capacity than your current vCPU limit of 136 allows for the instance bucket that the specified instance type belongs to.

Can you ask AWS to increase your quota? (follow this link)

polive106 · June 28, 2022, 1:23pm

This is the max I can get from AWS. I requested 200, but they only increased it to 136.
