Cluster insufficient resources - VcpuLimitExceeded

I get these errors when I try to deploy. My app starts, I get some logs, and then this:

2022-05-20T14:30:00Z Warning FailedScheduling: 0/4 nodes are available: 2 Insufficient memory, 4 Insufficient cpu.
2022-05-20T14:31:06Z Warning FailedScheduling: skip schedule deleting pod: z59b9994d-z6311ae3b/app-z403746c7-6bfdd877d6-54szs
2022-05-20T14:48:00Z Warning FailedScheduling: 0/4 nodes are available: 2 Insufficient memory, 4 Insufficient cpu.
2022-05-20T14:48:29Z Warning FailedScheduling: skip schedule deleting pod: z59b9994d-z6311ae3b/app-z403746c7-94f68c48f-grrpl
2022-05-20T14:39:22Z Warning Unhealthy: Readiness probe failed: dial tcp 10.0.39.106:8080: i/o timeout
2022-05-20T14:39:21Z Warning Unhealthy: Liveness probe failed: dial tcp 10.0.39.106:8080: i/o timeout
  • Programming language and version: Node 16 with NestJs

I think there is not enough CPU, right? But when I try to set 2 vCPU, I get this message (I am on a paid plan):

An error has occured

Invalid application config: Cannot ask more than 1560m cpu for an application, due to your nodes CPU available resources

Here is what I am trying to do:

Hello @Tactless7 !

First off, thanks for using Qovery :slight_smile:

I think there is some confusion here. Your cluster does seem to lack resources, but that is due to the nodes.

What you are trying to do is increase the application’s resources, whereas what you need is more nodes.
From your settings, I see the node autoscaler is set to 3-4 nodes, and the message says that all of your nodes are full.
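
For anyone reading the scheduler message for the first time, here is a minimal Python sketch of how to interpret it. It only parses the text of the `FailedScheduling` event quoted above; the function name `parse_failed_scheduling` is made up for illustration and is not part of Qovery or Kubernetes tooling.

```python
import re


def parse_failed_scheduling(message: str) -> dict:
    """Split a 'FailedScheduling' message into node counts and reasons."""
    head = re.search(r"(\d+)/(\d+) nodes are available", message)
    if head is None:
        raise ValueError("not a node-availability message")
    # Each reason looks like "<count> Insufficient <resource>".
    reasons = {
        reason.lower(): int(count)
        for count, reason in re.findall(r"(\d+) (Insufficient [a-z-]+)", message)
    }
    return {
        "schedulable": int(head.group(1)),
        "total": int(head.group(2)),
        "reasons": reasons,
    }


# Message copied from the logs in this thread.
message = "0/4 nodes are available: 2 Insufficient memory, 4 Insufficient cpu."
summary = parse_failed_scheduling(message)
cpu_starved = summary["reasons"].get("insufficient cpu", 0)

print(summary)
print("every node is short on CPU:", cpu_starved == summary["total"])
```

Here all 4 nodes report insufficient CPU, so no node can take the new pod until nodes are added or resources are freed.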

I don’t know if your app requires that much CPU, but I advise you to scale it back down to the default CPU (unless it really needs it, of course).

Can you try to add one more node to your cluster (Clusters => Resources => Nodes)?
If that is not possible on your end, you can also delete some environments or apps on your cluster to free up resources.

Please let me know if I can help further.

Cheers

Hello @bchastanier ,

Thanks for the answer, it made me look a bit further into the configuration.
I chose a range of 5-8 nodes and removed an environment recently created in my cluster. Even with these changes, nothing changed: I still cannot deploy the new version of my API, and I get the same errors:

Condition not met to start the container: Ready -> Unknown(Some("ContainersNotReady")): containers with unready status: [app-z403746c7]
Condition not met to start the container: ContainersReady -> Unknown(Some("ContainersNotReady")): containers with unready status: [app-z403746c7]
terminated state exit code: 1
2022-05-21T12:51:45Z Warning Unhealthy: Liveness probe failed: dial tcp 10.0.50.143:3000: connect: connection refused
2022-05-21T12:49:39Z Warning Unhealthy: Readiness probe failed: dial tcp 10.0.50.143:3000: connect: connection refused
2022-05-21T13:29:56Z Warning Unhealthy: Readiness probe failed: dial tcp 10.0.56.251:3000: connect: connection refused
2022-05-21T13:27:36Z Warning Unhealthy: Liveness probe failed: dial tcp 10.0.56.251:3000: connect: connection refused
2022-05-21T12:36:01Z Warning FailedScheduling: 0/4 nodes are available: 2 Insufficient memory, 4 Insufficient cpu.
2022-05-21T12:37:13Z Warning FailedScheduling: skip schedule deleting pod: z59b9994d-z6311ae3b/app-z403746c7-94f68c48f-44jpg

Also, when I look at the cluster on AWS (with the root user), I can’t see any nodes. But the previous versions of my front and API apps are still running, and the deployment rolls back to them when it fails.

Hey @Tactless7,

Checking, will let you know ASAP.

So, indeed, your cluster was set to have from 5 to 8 nodes, but it doesn’t seem to have been updated with those changes. Did you update your cluster after this change? If not, it’s normal that you still had the error, since the changes hadn’t been applied to your cluster.

I changed your cluster setup to range from 3 to 8 nodes (you set a minimum of 5, but 3 is OK; the autoscaler will do its magic).
I redeployed your cluster and your apps, and everything looks good now.

So bottom line, I guess there were two issues:

  • your cluster was full
  • increasing the resources requested by the app (instead of the cluster’s) left you unable to deploy it, since there were not enough resources left on the cluster.
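
To make the second point concrete, here is a small Python sketch of the arithmetic behind the "Cannot ask more than 1560m cpu" message. Kubernetes expresses CPU in millicores (1000m = 1 vCPU), so a 2 vCPU request is 2000m, which is above the 1560m limit reported for the nodes. The helper names and the `500m` and `1` sample requests are illustrative assumptions; only the `1560m` limit and the 2 vCPU request come from this thread.

```python
def to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ('1560m', '2', '0.5') to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def fits_on_node(requested: str, node_limit: str) -> bool:
    """True if the requested CPU does not exceed what one node can offer."""
    return to_millicores(requested) <= to_millicores(node_limit)


NODE_CPU_LIMIT = "1560m"  # limit quoted in the error message above

# "2" is the request from this thread; "500m" and "1" are made-up examples.
for requested in ("500m", "1", "2"):
    verdict = "ok" if fits_on_node(requested, NODE_CPU_LIMIT) else "rejected"
    print(f"{requested:>5} -> {to_millicores(requested)}m: {verdict}")
```

A single pod cannot be split across nodes, so adding nodes raises the total capacity of the cluster but not the ceiling for one application instance.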

Your cluster seems to be OK with 3 nodes for now, but multiple further deployments might require it to scale up. Anyway, you should be good now.

Please let us know if you need further help :slight_smile:

Have a good weekend !

Wow, thanks a lot ! :grinning:

Actually I thought the update action was to update the version of Kube. My bad :confused:
Thanks for the explanation :slight_smile:

Have a good weekend too :slight_smile:

No worries, I will share this point with the product team (cc @Alessandro_Carrano @Florian_Lepont), I do agree it’s a bit confusing :slight_smile:

Cheers !

Hey guys !

All my deployments fail to complete due to a lack of resources. I followed the guidelines discussed here, but deployments still fail with the same error message as in @Tactless7’s message.
Any idea what could cause this?

Hello @polive106 !

Fortunately, we are working on infra logs, and they are on the way, which should make this kind of issue clearer :slight_smile:

In your case, there is a vCPU capacity quota preventing you from moving further:

AsgInstanceLaunchFailures: Could not launch On-Demand Instances. VcpuLimitExceeded - You have requested more vCPU capacity than your current vCPU limit of 32 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.

You should open a ticket with AWS asking them to increase this quota.
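
To get a rough idea of how many nodes fit under such a quota, here is a Python sketch of the arithmetic. The limit of 32 vCPU comes from the AWS message above; the 4 vCPU per node figure is purely an assumption for illustration (check the vCPU count of your actual instance type), and the function name is made up. Keep in mind that the quota covers all running On-Demand instances in the same instance bucket for the region, not only this cluster's nodes.

```python
def max_nodes_under_quota(
    vcpu_quota: int, vcpus_per_node: int, vcpus_used_elsewhere: int = 0
) -> int:
    """How many nodes of a given size fit under an On-Demand vCPU quota."""
    available = max(vcpu_quota - vcpus_used_elsewhere, 0)
    return available // vcpus_per_node


VCPU_QUOTA = 32      # from the VcpuLimitExceeded message above
VCPUS_PER_NODE = 4   # assumption: depends on your instance type

ceiling = max_nodes_under_quota(VCPU_QUOTA, VCPUS_PER_NODE)
print("max nodes under the quota:", ceiling)

# Other instances in the same bucket eat into the same quota
# (8 vCPU used elsewhere is a made-up figure).
print("with 8 vCPU used elsewhere:",
      max_nodes_under_quota(VCPU_QUOTA, VCPUS_PER_NODE, vcpus_used_elsewhere=8))
```

If the autoscaler maximum is set above that ceiling, scale-ups beyond it will fail with `VcpuLimitExceeded` until AWS raises the quota.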

Please let us know if that unblocks you.

Cheers

Hi,

Thanks @bchastanier for your quick answer !

I opened a ticket on the AWS side. Any suggestions for a quick fix until the AWS limits are updated?

Cheers

Hi Pierre,

Unfortunately, it’s hard to work around those limits. They should respond quite quickly (hopefully) :crossed_fingers: