Pod stuck in pending status

Ethan_Huang · October 1, 2024, 5:05pm

Hi

We just found that many of our services are down, and in Qovery, they are stuck in the starting status. When I checked the Kubernetes status, it seems that those pods are in a pending state. Below is the status of one pending pod. It looks like there might be an issue with Karpenter. Could you help take a look?
https://console.qovery.com/organization/b4271b12-477a-4b41-a274-9c9cba8043e8/project/42fc429e-4f98-40fe-b1c5-e6e0a348a172/environment/cde278a3-a413-40f7-b203-ad070309f579/application/d96dc289-1c98-4012-a347-b8f914876979/general

Name:                 app-zd96dc289-portal-web-7484947765-pbqxn
Namespace:            zcde278a3-prod
Priority:             1000

    Liveness:   tcp-socket :8000 delay=30s timeout=5s period=10s #success=1 #failure=3
    Readiness:  tcp-socket :8000 delay=15s timeout=1s period=10s #success=1 #failure=5
    Environment:
                Optional: false
Conditions:
  Type           Status
  PodScheduled   False
Volumes:         <none>
QoS Class:       Guaranteed
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason            Age                From               Message
  ----     ------            ----               ----               -------
  Warning  FailedScheduling  53s (x3 over 63s)  default-scheduler  0/7 nodes are available: 1 Insufficient cpu, 2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}, 4 node(s) had untolerated taint {nodepool/stable: }. preemption: 0/7 nodes are available: 1 No preemption victims found for incoming pod, 6 Preemption is not helpful for scheduling..
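
For readers who find the FailedScheduling message above hard to scan, here is a minimal Python sketch that splits it into per-reason node counts. The function name `parse_failed_scheduling` and the regular expressions are illustrative assumptions, not part of Kubernetes or Qovery tooling, and the message format is only assumed to match the event shown above.

```python
import re

# The scheduler message copied from the event above.
MESSAGE = (
    "0/7 nodes are available: 1 Insufficient cpu, "
    "2 node(s) had untolerated taint {eks.amazonaws.com/compute-type: fargate}, "
    "4 node(s) had untolerated taint {nodepool/stable: }. "
    "preemption: 0/7 nodes are available: "
    "1 No preemption victims found for incoming pod, "
    "6 Preemption is not helpful for scheduling.."
)


def parse_failed_scheduling(message: str) -> dict:
    """Return {reason: node_count} for the filtering part of the message.

    Hypothetical helper: it assumes the "<n>/<total> nodes are available:"
    layout seen in the event above and ignores the preemption summary.
    """
    # Keep only the text before the "preemption:" summary.
    head = message.split(" preemption:")[0]
    match = re.match(r"\s*(\d+)/(\d+) nodes are available:\s*(.*)", head)
    if not match:
        return {}
    body = match.group(3).rstrip(". ")
    reasons = {}
    # Split on commas that are followed by a new "<count> <reason>" item.
    for part in re.split(r",\s*(?=\d+ )", body):
        item = re.match(r"(\d+)\s+(.*)", part.strip())
        if item:
            reasons[item.group(2).strip()] = int(item.group(1))
    return reasons


reasons = parse_failed_scheduling(MESSAGE)
for reason, count in reasons.items():
    print(f"{count} node(s): {reason}")
```

Read this way, six of the seven nodes were excluded by taints the pod does not tolerate, and the one remaining node lacked CPU, so the pod could only be scheduled once a new node was added.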

Pierre_Mavro · October 1, 2024, 5:12pm

Hi,

I’m looking into it.

Pierre

Pierre_Mavro · October 1, 2024, 5:16pm

Karpenter was in bad shape, which is why new nodes couldn’t be provisioned. I’m fixing it manually, and we’ll investigate the root cause tomorrow. After a quick look, it appears to be a lack of resources on the Karpenter pods.

Pierre_Mavro · October 1, 2024, 5:22pm

You should see your apps coming back now. Sorry for the inconvenience; the team will dig into it tomorrow. In the meantime, please do not redeploy your cluster. Thanks.

Ethan_Huang · October 1, 2024, 5:38pm

Thanks for the quick response. Everything works properly now.

Pierre_Mavro · October 1, 2024, 7:45pm

Great. The issue has been identified, and a fix will be released within the next 2 days.

Pierre_Mavro · October 2, 2024, 10:20am

The fix has been deployed. You can make changes on your cluster now if you want.

Pierre

