I think there is some confusion here. Your cluster does lack resources, but the shortage is at the node level.
You are trying to increase the application’s resources, whereas what you need is more nodes.
From your settings, I see the autoscaler is set to 3-4 nodes, and the message says that all of your nodes are full.
I don’t know if your app requires that much CPU, but I advise you to scale it back down to the default CPU (unless it really needs it, of course).
Can you try adding one more node to your cluster (Clusters => Resources => Nodes)?
If that is not possible on your end, you can also delete some environments or apps on your cluster to free up resources.
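To illustrate why raising the app's CPU makes things worse on a full cluster: a pod is only scheduled if a single node has enough free CPU and memory for its requests, so free capacity spread across nodes does not help. Here is a minimal sketch of that check. The `Node` class, the function name, and all numbers are made up for illustration; they are not taken from your cluster.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Free capacity on one node (allocatable minus what pods already requested).
    free_cpu_millicores: int
    free_memory_mib: int


def schedulable_nodes(nodes, cpu_request_millicores, memory_request_mib):
    """Return the nodes that could host a pod with the given requests."""
    return [
        n for n in nodes
        if n.free_cpu_millicores >= cpu_request_millicores
        and n.free_memory_mib >= memory_request_mib
    ]


# Hypothetical numbers: three nearly full nodes.
nodes = [Node(300, 900), Node(250, 200), Node(400, 1500)]

# An app asking for 2 vCPU (2000m) fits nowhere, even though 950m is free in total.
print(len(schedulable_nodes(nodes, 2000, 512)))  # 0
# The same app with a smaller request (250m CPU, 512 MiB) fits on two nodes.
print(len(schedulable_nodes(nodes, 250, 512)))   # 2
```

In other words, either the request has to shrink or a node with enough free room has to exist.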
Thanks for the answer, it made me look a bit further into the configuration.
I set the range to 5-8 nodes and removed an environment I had recently created in my cluster. Even with these changes, nothing changed. I still cannot deploy the new version of my API, and I get the same errors:
Condition not met to start the container: Ready -> Unknown(Some("ContainersNotReady")): containers with unready status: [app-z403746c7]
Condition not met to start the container: ContainersReady -> Unknown(Some("ContainersNotReady")): containers with unready status: [app-z403746c7]
terminated state exit code: 1
2022-05-21T12:51:45Z Warning Unhealthy: Liveness probe failed: dial tcp 10.0.50.143:3000: connect: connection refused
2022-05-21T12:49:39Z Warning Unhealthy: Readiness probe failed: dial tcp 10.0.50.143:3000: connect: connection refused
2022-05-21T13:29:56Z Warning Unhealthy: Readiness probe failed: dial tcp 10.0.56.251:3000: connect: connection refused
2022-05-21T13:27:36Z Warning Unhealthy: Liveness probe failed: dial tcp 10.0.56.251:3000: connect: connection refused
2022-05-21T12:36:01Z Warning FailedScheduling: 0/4 nodes are available: 2 Insufficient memory, 4 Insufficient cpu.
2022-05-21T12:37:13Z Warning FailedScheduling: skip schedule deleting pod: z59b9994d-z6311ae3b/app-z403746c7-94f68c48f-44jpg
Also, when I look at the cluster on AWS (with the root user), I can’t see any nodes. But the previous versions of my front and API apps are still running, and the apps roll back to them when a deployment fails.
So, your cluster was indeed set to have from 5 to 8 nodes, but it doesn’t seem to have been updated with those changes. Did you update your cluster after this change? If not, it’s normal that you still had the error, since the changes hadn’t been applied to your cluster.
I changed your cluster setup to have from 3 to 8 nodes (you set the minimum to 5, but 3 is OK; the autoscaler will do its magic).
I redeployed your cluster and your apps, and everything looks good now.
So, bottom line, I guess there were two issues:
Your cluster was full.
Increasing the resources requested by the apps (instead of the cluster’s) meant you could not deploy this app, since there were not enough resources left on the cluster.
Your cluster seems to be OK with 3 nodes for now, but further deployments might require it to scale up. Anyway, you should be good now.
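For anyone reading these scheduler events later, a small helper like the one below can make a `FailedScheduling` message easier to read. It is only a sketch: the function name is made up, and it assumes the message keeps the `N/M nodes are available: X Insufficient <resource>` shape seen in this thread.

```python
import re


def parse_failed_scheduling(message):
    """Split a FailedScheduling message into (available, total, reasons)."""
    head = re.search(r"(\d+)/(\d+) nodes are available", message)
    if head is None:
        return None  # not the message shape this sketch expects
    available, total = int(head.group(1)), int(head.group(2))
    # Map each lacking resource to the number of nodes that lack it.
    reasons = {
        resource: int(count)
        for count, resource in re.findall(r"(\d+) Insufficient ([\w-]+)", message)
    }
    return available, total, reasons


msg = "0/4 nodes are available: 2 Insufficient memory, 4 Insufficient cpu."
print(parse_failed_scheduling(msg))  # (0, 4, {'memory': 2, 'cpu': 4})
```

When the count next to a resource equals the total number of nodes, as with CPU here, no node can take the pod, so the fix is either more nodes or a smaller request.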
All my deployments fail to complete due to a lack of resources. I followed the guidelines discussed here, but deployments still fail with the same error message as in @Tactless7’s message.
Any idea what could cause this?
Fortunately, we are working on infrastructure logs; they are coming soon and should make things clearer.
In your case, there is a vCPU capacity quota preventing you from moving further:
AsgInstanceLaunchFailures: Could not launch On-Demand Instances. VcpuLimitExceeded - You have requested more vCPU capacity than your current vCPU limit of 32 allows for the instance bucket that the specified instance type belongs to. Please visit http://aws.amazon.com/contact-us/ec2-request to request an adjustment to this limit.
You should open a ticket with AWS asking them to increase this quota.
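To estimate how large a quota to ask for, you can compare the vCPUs your maximum node count would need against the limit quoted in the error. This is a rough sketch only: the function names are made up, and the 4 vCPUs per node and the 8 vCPUs used elsewhere are assumptions, not values from your account. Use your real instance type, and remember that other On-Demand instances in the same region and instance bucket count against the same limit.

```python
import re


def parse_vcpu_limit(message):
    """Extract the vCPU limit from a VcpuLimitExceeded message, or None."""
    match = re.search(r"current vCPU limit of (\d+)", message)
    return int(match.group(1)) if match else None


def fits_quota(node_count, vcpus_per_node, limit, vcpus_used_elsewhere=0):
    """True if the cluster at node_count nodes stays within the vCPU limit."""
    return node_count * vcpus_per_node + vcpus_used_elsewhere <= limit


error = (
    "VcpuLimitExceeded - You have requested more vCPU capacity than your "
    "current vCPU limit of 32 allows"
)
limit = parse_vcpu_limit(error)
print(limit)  # 32

# Assumption: 4 vCPUs per node. 8 nodes need exactly 32 vCPUs.
print(fits_quota(8, 4, limit))  # True
# With 8 vCPUs already used by other instances, the same cluster no longer fits.
print(fits_quota(8, 4, limit, vcpus_used_elsewhere=8))  # False
```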
I finally got my AWS vCPU quota increased. But I still cannot deploy an app in my production cluster because of this error from my deployment logs: 0/6 nodes are available: 6 Insufficient cpu.
What am I missing here? Any configuration error on my end?
AsgInstanceLaunchFailures: Could not launch On-Demand Instances. VcpuLimitExceeded - You have requested more vCPU capacity than your current vCPU limit of 136 allows for the instance bucket that the specified instance type belongs to.
Can you ask AWS to increase your quota? (follow this link)