Hi, from time to time a Job deployment fails with the error “Either it couldn’t be executed correctly after 0 retries or its execution didn’t finish after 5.00 minutes.”
It seems the related application code is never executed (never fully deployed, no logs in Live logs), and the Pod stays in STARTING state.
From the deployment logs, it seems the issue is with the resources available on the nodes. I get this message: “0/5 nodes are available: 2 Insufficient memory, 5 Insufficient cpu. preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod…”.
How can I act on this? Should I do/change something?
Do you know how often it happens? Does it happen on a new application update or preview environment (I don’t know if you use this feature)? Can you please share your cluster configuration?
You can add nodes on an EKS cluster, but not on an EC2 (K3S) one.
Nonetheless, you can switch to another instance type with more CPU / RAM according to your needs.
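If it helps to see which resource is short, here is a minimal Python sketch that breaks the scheduler message quoted above into per-reason node counts. The function name `parse_scheduler_message` is made up for illustration, and the regular expressions only assume the wording of the message in this thread, not every format Kubernetes can emit.

```python
import re


def parse_scheduler_message(message: str) -> dict:
    """Extract node totals and 'Insufficient <resource>' counts from a scheduling message."""
    # Keep only the part before "preemption:", which repeats the node totals.
    head = message.split("preemption:")[0]
    totals = re.search(r"(\d+)/(\d+) nodes are available", head)
    # Each reason looks like "5 Insufficient cpu": a node count followed by the reason.
    reasons = {
        reason.lower(): int(count)
        for count, reason in re.findall(r"(\d+) (Insufficient [a-z\-]+)", head)
    }
    return {
        "schedulable": int(totals.group(1)) if totals else None,
        "total": int(totals.group(2)) if totals else None,
        "reasons": reasons,
    }


msg = (
    "0/5 nodes are available: 2 Insufficient memory, 5 Insufficient cpu. "
    "preemption: 0/5 nodes are available: 5 No preemption victims found for incoming pod"
)
info = parse_scheduler_message(msg)
print(info)
```

With the message from this thread, all 5 nodes report insufficient CPU and 2 of them also report insufficient memory, so CPU is the first thing to look at.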
Thank you @Melvin_Zottola for your answer. I have a t3a.large (2 vCPU, 8 GB RAM, AMD64). What should I choose to improve performance without it costing too much?
I would say you can take the t3a.xlarge, which would double the CPU and RAM (it would cost twice as much too, but I don’t see any good alternative).
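As a rough way to reason about the sizing, here is a small Python sketch comparing the two instance types. The t3a.large and t3a.xlarge figures (2 vCPU / 8 GiB and 4 vCPU / 16 GiB) are the published specs; the job requests and the "already used" numbers are purely hypothetical, and the check ignores the share of each node that Kubernetes reserves for system components, so real headroom is somewhat lower.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpu: int
    memory_gib: int


T3A_LARGE = InstanceType("t3a.large", 2, 8)
T3A_XLARGE = InstanceType("t3a.xlarge", 4, 16)


def fits(instance: InstanceType, cpu_request: float, memory_request_gib: float,
         used_cpu: float = 0.0, used_memory_gib: float = 0.0) -> bool:
    """Return True if the request fits in what is left of the instance's raw capacity."""
    return (used_cpu + cpu_request <= instance.vcpu
            and used_memory_gib + memory_request_gib <= instance.memory_gib)


# Hypothetical workload: 1.5 vCPU and 5 GiB already requested by other pods,
# and a job asking for 1 vCPU and 2 GiB.
fits_large = fits(T3A_LARGE, 1.0, 2.0, used_cpu=1.5, used_memory_gib=5.0)
fits_xlarge = fits(T3A_XLARGE, 1.0, 2.0, used_cpu=1.5, used_memory_gib=5.0)
print(fits_large, fits_xlarge)
```

With these made-up numbers the job does not fit on the t3a.large because CPU runs out first, while it fits on the t3a.xlarge. Plugging in the actual requests of your job and services should give a better idea of whether the larger instance is enough.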