Hello, as part of our SOC2 compliance DR tests, I had to delete all worker nodes from our Kubernetes cluster, but when I did so it seems that the control plane nodes were also deleted and are not coming back. Is it possible that you're scheduling pods on the same nodes you're using as control plane nodes? Also, can you please fix our Playground cluster, because I fear it's now broken for good? I'm including the set of steps I took, for your reference. Let me know if you need any further explanation.
- I just tried changing the max number of nodes to force a redeployment, to see if it would fix the problem. The cluster says it's "Updating" and has the spinner going.
- Looking at the logs, it seems the Helm deployment is failing while trying to UPGRADE the vertical-pod-autoscaler, among other things.
- I put max nodes back to 10 to force another deployment… but I don't have high hopes given what I see in the logs.
Hi, your questions are completely legitimate. If you followed the Kubernetes documentation or a guide for some classic Kubernetes operations and did not get the expected result, it's because of some EKS specifics you may not be familiar with. Let me give you more input to quickly fix your issue and help you move on:
- "I had to delete all worker nodes": this is not how things are done on EKS. Maybe you worked with unmanaged Kubernetes in the past, but on EKS an EC2 instance is linked to the cluster, and the correlation between a Kubernetes node object and its EC2 instance is not that strong. What happened here is that you deleted the node FROM Kubernetes, but the EC2 instance is still alive, while I guess you expected it to be terminated as well. So from an AWS point of view, everything is "normal"; you just can't do it that way. To recover from this situation, go to the AWS console and delete your cluster's node group. That will terminate all the EC2 instances you expected to be removed from the Kubernetes cluster. Then re-run a Qovery cluster deployment: new nodes will be deployed and your services will come back. I advise you to look at the EKS documentation and contact AWS support for more info; you'll get more internal feedback from them. Qovery can't do much more to help you here, as this is how EKS works, and your SOC2 validation tests target the AWS stack, not the Qovery one.
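To make the distinction concrete, here is a rough sketch of the two operations. The cluster name `playground`, node group name `workers`, and node name are all placeholders for illustration; check the real names in your AWS console or with `aws eks list-nodegroups` before running anything.

```shell
# Deleting a node FROM Kubernetes only removes the Node API object;
# the backing EC2 instance keeps running:
kubectl delete node ip-10-0-1-23.ec2.internal

# To actually terminate the instances, delete the managed node group
# itself (names below are placeholders):
aws eks list-nodegroups --cluster-name playground
aws eks delete-nodegroup --cluster-name playground --nodegroup-name workers
aws eks wait nodegroup-deleted --cluster-name playground --nodegroup-name workers

# Then trigger a Qovery cluster deployment so a fresh node group
# and new worker nodes are created.
```

Note these commands require AWS credentials with EKS permissions and will delete real infrastructure; treat them as a sketch of the recovery path, not a script to paste blindly.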
- "but when I did so it seems that the control plane nodes were also deleted and are not coming back": from an AWS point of view, the control plane nodes are the Kubernetes master nodes, and on EKS they are managed by AWS. Worker nodes != master nodes/control plane. If you had actually lost the control plane, you would not have been able to connect to the Kubernetes API at all.
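As a quick sanity check (assuming your kubeconfig still points at this cluster), you can verify that the control plane is alive even with zero worker nodes:

```shell
# On EKS, `kubectl get nodes` lists only worker nodes; the AWS-managed
# control plane never appears here. With all workers gone, this list
# is simply empty:
kubectl get nodes

# The API server itself is still reachable, which shows the control
# plane was never deleted:
kubectl get --raw='/readyz'
```

If the second command answers, the control plane is fine and only the worker fleet needs to be rebuilt.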
Hope this is clear. Don't expect EKS to behave like a classic/vanilla Kubernetes cluster; the managed Kubernetes services of the various cloud providers are not equivalent. Learning those specifics helps you use Kubernetes correctly, but I know it's not obvious, unfortunately. Don't hesitate to contact AWS support for further help with your SOC2 preparation; they will be able to give you more insight and context.