ECR 401 Error and resources on EC2 (K3S) cluster

Relevant information to this issue:

ISSUE

My original issue was covered in the troubleshooting documentation, but I missed a step, and everything I have done since then has only made things worse.

Here is a summary of the steps I took and the result of each.

  1. I noticed this morning that the auto deployment for one of my applications had been failing with the error “0/1 nodes are available: 1 Insufficient cpu”.
  2. I tried Redeploying the application, with no success.
  3. After checking the documentation, I updated the resources assigned to the application but didn’t notice that I needed to Stop the service first. The result was the same failure: “0/1 nodes are available: 1 Insufficient cpu”.
  4. I then tried to Stop the service, but that also failed, with the error “Pause of Application failed but we rollbacked it to previous safe/running version !”
  5. I then tried stopping the cluster, which just gave me a UI error along the lines of “STOP STOP was not supported for AWS”.
  6. So I tried running Update on the cluster, which completed successfully, but now all the applications fail to start with a “Cannot pull the image for your container” error, preceded by an earlier message indicating a 401 ERROR when accessing ECR.

I have had the ECR error before, and resolving it required the Qovery team to restart an application that was in a bad state.

Can someone from the Qovery team either restart that application or, preferably, give me the steps to resolve the ECR error so that I can fix it myself if this happens again?

Also, how can I get out of the mess I created by not stopping the service before adjusting the resources?

Hello @bsimakis,
I’m taking a look. The ECR issue is going to be resolved soon.

I just relaunched the deployment of your application that was failing to pull its image, and it succeeded: Qovery

Concerning your application that results in the Stop error: no service is currently running on the cluster, and the failure appears to come from an error on our side. You can redeploy the application to remove the inconsistency.
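
For anyone else who lands here with the “0/1 nodes are available: 1 Insufficient cpu” message: the Kubernetes scheduler only places a pod on a node when the pod's CPU request fits into the node's allocatable CPU minus the requests of the pods already scheduled there. The sketch below is only an illustration of that arithmetic, not anything Qovery runs. The function names and all the numbers are hypothetical, and it handles only plain CPU quantities such as "500m" or "2".

```python
from typing import List


def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "500m" or "2" to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Whole or fractional cores, e.g. "2" or "0.5"
    return int(round(float(quantity) * 1000))


def fits_on_node(allocatable: str, existing_requests: List[str], new_request: str) -> bool:
    """Return True if new_request fits in the CPU not yet requested on the node."""
    used = sum(cpu_to_millicores(r) for r in existing_requests)
    free = cpu_to_millicores(allocatable) - used
    return cpu_to_millicores(new_request) <= free


# Hypothetical single-node cluster: 2 allocatable cores, three pods already scheduled.
node_allocatable = "2"
running_requests = ["500m", "750m", "500m"]

print(fits_on_node(node_allocatable, running_requests, "500m"))
print(fits_on_node(node_allocatable, running_requests, "250m"))
```

With these made-up numbers only 250m of CPU is still unrequested, so a pod asking for 500m cannot be scheduled while one asking for 250m can. On a single-node cluster this means the request has to be lowered, or other workloads have to release CPU, before the deployment can go through.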