Failed Job deployment (no node available)

Hi, from time to time my Job deployments fail with the error “Either it couldn’t be executed correctly after 0 retries or its execution didn’t finish after 5.00 minutes.”

It seems the related application code is never executed (never fully deployed, no logs in Live logs), and the Pod stays in STARTING state.

From the deployment logs, it seems there is an issue with the number of nodes available. I get this message: “0/5 nodes are available: 2 Insufficient memory, 5 Insufficient cpu. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod…”.

How can I act on this? Should I do/change something?

Hi @clemg ,

Do you know how often it happens? Does it happen on a new application update or preview environment (I don’t know if you use this feature)? Can you please share your cluster configuration?

Hello,

This is most likely because your cluster has reached its maximum number of nodes: none of the existing nodes has enough free CPU/memory for your job, and the cluster can’t add another node to make room for it.

Take a look at your cluster resources configuration, and increase the maximum number of nodes.

Once you have made the change, don’t forget to re-deploy your cluster so that it takes effect.
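To see at a glance which resource is short, the scheduler message can be broken down into counts. This is only a minimal sketch, assuming the message keeps the format quoted in this thread; `parse_scheduler_message` is a hypothetical helper written for illustration, not part of any platform or Kubernetes tooling.

```python
import re


def parse_scheduler_message(message: str) -> dict:
    """Extract node totals and 'Insufficient <resource>' counts from the message."""
    # Keep only the part before "preemption:", which repeats the node totals.
    head = message.split("preemption:")[0]
    totals = re.search(r"(\d+)/(\d+) nodes are available", head)
    if totals is None:
        raise ValueError("not a scheduler availability message")
    # Each match is (number of nodes, resource those nodes are short of).
    insufficient = {
        resource.lower(): int(count)
        for count, resource in re.findall(r"(\d+) Insufficient (\w+)", head)
    }
    return {
        "schedulable": int(totals.group(1)),
        "total": int(totals.group(2)),
        "insufficient": insufficient,
    }


msg = (
    "0/5 nodes are available: 2 Insufficient memory, 5 Insufficient cpu. "
    "preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod"
)
print(parse_scheduler_message(msg))
```

In the message from the first post, all 5 nodes are short on CPU and 2 of them are also short on memory, so CPU is the main constraint.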

Hello everyone! I have the same “no node available” issue with a service:
Pod app-z639f1313-lb-e-marketing-6f57547869-9vg9w is STARTING

┃0/1 nodes are available: 1 Insufficient cpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod..

After reading this post, I wanted to increase the maximum number of nodes in the cluster resources, but I can’t change the number of nodes available.

I have the issue in my staging environment. Could this be the problem? Should I get a more powerful server?

Hello @Geoffrey_S ,

You can add nodes on an EKS cluster, but not on an EC2 (K3S) one.
Nonetheless, you can switch to another instance type with more CPU / RAM, according to your needs.

Thank you @Melvin_Zottola for your answer. I have a t3a.large (2 CPU - 8GB RAM - AMD64). What should I choose to improve performance without it costing too much?

I would say you can take the t3a.xlarge, which would double the CPU / RAM resources (it would cost twice as much too, but I don’t see any good alternative).
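As a rough way to reason about the sizing, here is a small sketch that checks whether a pod’s CPU request fits on each instance. The t3a.large figures come from this thread and the t3a.xlarge is simply modeled as double; the request values (`ALREADY_REQUESTED`, `NEW_POD`) are made up for illustration, and in practice the CPU a node can actually allocate is somewhat lower than its raw vCPU count because part of it is reserved for the system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    cpu_millicores: int  # 1 vCPU = 1000 millicores
    ram_mib: int


# Figures from the thread: 2 CPU, 8GB RAM.
T3A_LARGE = Node("t3a.large", 2000, 8 * 1024)
# The xlarge doubles both CPU and RAM.
T3A_XLARGE = Node("t3a.xlarge", 2 * T3A_LARGE.cpu_millicores, 2 * T3A_LARGE.ram_mib)


def fits(node: Node, already_requested_cpu: int, pod_cpu: int) -> bool:
    """True if the new pod's CPU request still fits on the node."""
    return already_requested_cpu + pod_cpu <= node.cpu_millicores


# Hypothetical values: CPU already requested by running pods, and the new pod's request.
ALREADY_REQUESTED = 1800
NEW_POD = 500

for node in (T3A_LARGE, T3A_XLARGE):
    print(node.name, "fits" if fits(node, ALREADY_REQUESTED, NEW_POD) else "Insufficient cpu")
```

With these assumed numbers the pod does not fit on the t3a.large but does on the t3a.xlarge. Lowering the CPU requests of the services on the node would be the other lever, if they are set higher than needed.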

OK, I’ll look into that.
Thank you for your help.