Canceled deployment every time I try to redeploy them

Hello @juehai,

Thanks for your input here. Let me explain what happened on our end.

The AWSWesleyClusterManagerLambda lambda comes from the AWS control plane and seems to be used by AWS to monitor EKS deployments.

Last week we released a change allowing us to fully move the Qovery stack to role-based instead of user-based authentication, see this thread.
To roll it out, a cluster update was needed in order to:

  1. Remove the iam-eks-user-mapper user from IAM
  2. Create a new role for iam-eks-user-mapper in IAM
  3. Deploy the new app granting IAM users from the Admins group access to the EKS cluster
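For context, EKS grants IAM principals cluster access through the aws-auth configmap in kube-system. After the role-based migration, an entry would typically look like the sketch below (the role ARN and account ID here are illustrative placeholders, not your actual values):

```yaml
# Illustrative sketch of kube-system/aws-auth after the migration -- not your actual configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/iam-eks-user-mapper  # hypothetical role ARN
      username: iam-eks-user-mapper
      groups:
        - system:masters
```

If this configmap is empty or missing its entries, every mapped IAM identity loses access at once, which matches the symptom described below.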

As to what’s going on with your cluster, I guess it boils down to the EKS aws-auth configmap being empty, or at least no longer containing stephane’s account, hence we cannot access the cluster via that account’s credentials.
The only root cause I can see for this is that something went wrong during your cluster update between steps 1 and 2: the tool’s AWS user account was removed from IAM but the new app wasn’t deployed, so the iam-eks-user-mapper app is still the old one, trying (and failing) to use the old user to update the aws-auth configmap.
If so, and if I had access to the cluster via the cluster creator (which I usually do), I would manually add stephane’s user to aws-auth, then trigger a cluster update again, which should fix the issue.

The issue here is that I don’t understand why stephane’s account access is completely lost: usually when such an issue happens, I can still connect to the cluster using the cluster creator’s credentials (the ones you provided to Qovery), which makes me think something else happened.

So now, there are 3 solutions I can think of:

  1. (if the issue comes from the case described above) If you can use a master user from your AWS account (one having all rights), you can try to run this command, which is supposed to add stephane’s account to the cluster’s aws-auth configmap.
    Let me know once this is done so I can try to connect to the cluster and check its status.
  eksctl create iamidentitymapping --region eu-west-1 --cluster qovery-z16cd1bde --arn arn:aws:iam::594084547872:user/ --group system:masters
  2. Open a case with AWS support with your cluster info, asking why we cannot connect to the cluster anymore, and whether they can add this user back so you can access it. There might also be an error on their side triggered by the cluster update. In any case this can help.

  3. Create a new cluster via Qovery and move your workload to it using environment clone, targeting the new cluster.
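If you do run the eksctl command from solution 1, you can verify the mapping was applied before pinging me. Something like the following should work, assuming you have eksctl configured for that account and, once access is restored, kubectl pointed at the cluster:

```shell
# List the IAM identity mappings on the cluster; the newly added user should appear here
eksctl get iamidentitymapping --region eu-west-1 --cluster qovery-z16cd1bde

# Once kubectl access works again, inspect the aws-auth configmap directly
kubectl -n kube-system get configmap aws-auth -o yaml
```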

I really think there is something fishy going on here, so I would try solution 1 first, then fall back to AWS support (solution 2) to get more insights. Solution 3 should be the last resort.

Please keep me posted so I can help you further.
