Canceled deployment every time I try to redeploy them

Hello @juehai,

Thanks for your input here. Let me explain what happened on our end.

The AWSWesleyClusterManagerLambda lambda comes from the AWS control plane and seems to be used by AWS to monitor EKS deployments.

Last week we released a change allowing us to fully move the Qovery stack to role-based instead of user-based authentication, see this thread.
To roll it out, a cluster update was needed in order to:

  1. Remove the iam-eks-user-mapper user from IAM
  2. Create a new role for iam-eks-user-mapper in IAM
  3. Deploy the new app granting IAM users from the Admins group access to the EKS cluster
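For context, EKS grants IAM principals cluster access through the aws-auth configmap in kube-system. After the role-based migration, an entry would typically look like the sketch below (the role ARN and account ID here are illustrative placeholders, not your actual values):

```yaml
# Illustrative sketch of kube-system/aws-auth after the migration -- not your actual configmap
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-auth
  namespace: kube-system
data:
  mapRoles: |
    - rolearn: arn:aws:iam::111122223333:role/iam-eks-user-mapper  # hypothetical role ARN
      username: iam-eks-user-mapper
      groups:
        - system:masters
```

If this configmap is empty or missing its entries, every mapped IAM identity loses access at once, which matches the symptom described below.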

As to what’s going on with your cluster, I guess it boils down to the EKS aws-auth configmap being empty, or at least no longer containing stephane’s account, hence we cannot access the cluster via that account’s credentials.
The only root cause I can see for this is that something went wrong during your cluster update between steps 1 and 2: the tool’s AWS user account was removed from IAM but the new app wasn’t deployed, so the iam-eks-user-mapper app is still the old one, trying (and failing) to use the old user to update the aws-auth configmap.
If so, and if I had access to the cluster via the cluster creator (which I usually do), I would manually add stephane’s user to aws-auth, then trigger a cluster update again, which should fix the issue.

The issue here is that I don’t understand why stephane’s account access is completely lost: usually when such an issue happens, I can still connect to the cluster using the cluster creator’s credentials (the ones you provided to Qovery), which makes me think something else happened.

So now, there are 3 solutions I can think of:

  1. (if the issue comes from the case described above) If you can use a master user from your AWS account (one having all rights), you can try to run this command, which is supposed to add stephane’s account to the cluster’s aws-auth configmap.
    Let me know once this is done so I can try to connect to the cluster and check its status.
  eksctl create iamidentitymapping --region eu-west-1 --cluster qovery-z16cd1bde --arn arn:aws:iam::594084547872:user/ --group system:masters
  2. Open a case with AWS support with your cluster info, asking why we cannot connect to the cluster anymore, and whether they can add this user back so you can access it. There might also be an error on their side triggered by the cluster update. In any case this can help.

  3. Create a new cluster via Qovery and move your workload to it using environment clone, targeting the new cluster.
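If you do run the eksctl command from solution 1, you can verify the mapping was applied before pinging me. Something like the following should work, assuming you have eksctl configured for that account and, once access is restored, kubectl pointed at the cluster:

```shell
# List the IAM identity mappings on the cluster; the newly added user should appear here
eksctl get iamidentitymapping --region eu-west-1 --cluster qovery-z16cd1bde

# Once kubectl access works again, inspect the aws-auth configmap directly
kubectl -n kube-system get configmap aws-auth -o yaml
```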

I really think there is something fishy going on here, so I would try solution 1 first, then fall back to AWS support (solution 2) to get more insights. Solution 3 should be the last resort.

Please keep me posted so I can help you further.
