Canceled deployment every time I try to redeploy them

Hello @juehai,

Thanks for your input here. Let me explain what happened on our end.

The AWSWesleyClusterManagerLambda lambda comes from the AWS control plane and seems to be used by AWS to monitor EKS deployments.

Last week we released a change allowing us to fully move the Qovery stack to role-based instead of user-based authentication, see this thread.
To roll it out, a cluster update was needed in order to:

  1. Remove the iam-eks-user-mapper user from IAM
  2. Create a new role for iam-eks-user-mapper in IAM
  3. Deploy the new app granting IAM users from the Admins group access to the EKS cluster
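For context, EKS grants IAM principals cluster access through the aws-auth configmap in kube-system. After the role-based migration, an entry would typically look like the sketch below (the role ARN and account ID here are illustrative placeholders, not your actual values):

```yaml
# Illustrative sketch of kube-system/aws-auth after the migration -- not your actual configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/iam-eks-user-mapper  # hypothetical role ARN
      username: iam-eks-user-mapper
      groups:
        - system:masters
```

If this configmap is empty or missing its entries, every mapped IAM identity loses access at once, which matches the symptom described below.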

As to what’s going on with your cluster, I guess it boils down to the EKS aws-auth configmap being empty, or at least no longer containing stephane’s account, hence we cannot access the cluster via that account’s credentials.
The only root cause I can see for this is that something went wrong during your cluster update between steps 1 and 2: the tool’s AWS user account was removed from IAM but the new app wasn’t deployed, so the iam-eks-user-mapper app is still the old one, trying (and failing) to use the old user to update the aws-auth configmap.
If so, and if I had access to the cluster via the cluster creator (which I usually do), I would manually add stephane’s user to aws-auth, then trigger a cluster update again, which should fix the issue.

The issue here is that I don’t understand why stephane’s account access is completely lost: usually when such an issue happens, I can still connect to the cluster using the cluster creator’s credentials (the ones you provided to Qovery), which makes me think something else happened.

So now, there are 3 solutions I can think of:

  1. (if the issue comes from the case described above) If you can use a master user from your AWS account (one having all rights), you can try to run this command, which is supposed to add stephane’s account to the cluster’s aws-auth configmap.
    Let me know once this is done so I can try to connect to the cluster and check its status.
  eksctl create iamidentitymapping --region eu-west-1 --cluster qovery-z16cd1bde --arn arn:aws:iam::594084547872:user/ --group system:masters
  2. Open a case with AWS support with your cluster info, asking why we cannot connect to the cluster anymore, and whether they can add this user back so you can access it. There might also be an error on their side triggered by the cluster update. In any case this can help.

  3. Create a new cluster via Qovery and move your workload to it using environment clone, targeting the new cluster.
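If you do run the eksctl command from solution 1, you can verify the mapping was applied before pinging me. Something like the following should work, assuming you have eksctl configured for that account and, once access is restored, kubectl pointed at the cluster:

```shell
# List the IAM identity mappings on the cluster; the newly added user should appear here
eksctl get iamidentitymapping --region eu-west-1 --cluster qovery-z16cd1bde

# Once kubectl access works again, inspect the aws-auth configmap directly
kubectl -n kube-system get configmap aws-auth -o yaml
```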

I really think there is something fishy going on here, so I would try solution 1 first, then fall back to AWS support (solution 2) to get more insights. Solution 3 should be the last resort.

Please keep me posted so I can help you further.
