Relevant information to this issue:
- Programming language and version: Node.js 20
- Cluster Type: EC2 (K3s)
- Link to your application: https://console.qovery.com/organization/f620920f-b94b-477d-9d76-b6c4ede7cafb/project/8399b76f-cbca-420f-91e6-33dd5dc2cc87/environments/general
ISSUE
My original issue was covered in the troubleshooting documentation, but I missed a step, and everything I have done since then has made things worse.
Here is a summary of the steps I took and the result of each.
- I noticed this morning that the auto deployment for one of my applications had been failing with the error “0/1 nodes are available: 1 Insufficient cpu”.
- I tried Redeploying the application, with no success.
- After checking the documentation, I updated the assigned resources but didn’t notice that I needed to Stop the service first. The result was the same failure: “0/1 nodes are available: 1 Insufficient cpu”.
- I then tried to Stop the service, but that also failed, with the error “Pause of Application failed but we rollbacked it to previous safe/running version !”
- I then tried stopping the cluster, which just gave me a UI error along the lines of “STOP STOP was not supported for AWS”.
- So I tried running Update on the cluster, which completed successfully, but now all the applications fail to start with a “Cannot pull the image for your container” error, preceded by an earlier message indicating a 401 error when accessing ECR.
I have had the ECR error before, and resolving it required the Qovery team to restart some application that was in a bad state.
Can someone from the Qovery team either restart that application or, preferably, give me the steps to resolve the ECR error so that I can fix it myself if this happens again?
Also, how can I get out of the mess I created by not stopping the service before adjusting the resources?