Why provision NLBs for container databases?

Hi @prki ,

AWS support validated the issue.

TL;DR: Fixing this requires a significant migration, which will cause brief downtime on all deployed Kubernetes clusters. It won’t be a trivial fix and will require extensive testing, so don’t expect it to land in the coming days. However, it’s an important one, and we’ve already started working on it. Fix ETA: July.

Here is the original answer:

I dove deeper into the issue and can confirm that it still exists even on Amazon EKS 1.22, which caught my attention. I then looked into CloudWatch Logs Insights and, by executing the following query, could see the same behavior described in issue [1].

------------------------------------------------------------------------------------------------------------------------------
fields @timestamp, verb, requestURI, responseObject.spec.type, responseObject.status.loadBalancer.ingress.0.hostname, @message
| filter @message like /K8S-SERVICE-NAME/ or @message like /NLB-RESOURCE-NAME/
| filter userAgent not like /kubectl/ # filter out log lines generated by "kubectl"
------------------------------------------------------------------------------------------------------------------------------

From the log lines, you will find that Kubernetes removed the NLB annotation during the “LoadBalancer” → “ClusterIP” transition, so it hit issue [1]. You can find the relevant source code at link [2] for Kubernetes 1.20 and link [3] for Kubernetes 1.22.

------------------------------------------------------------------------------------------------------------------------------
// EnsureLoadBalancerDeleted implements LoadBalancer.EnsureLoadBalancerDeleted.
func (c *Cloud) EnsureLoadBalancerDeleted(ctx context.Context, clusterName string, service *v1.Service) error {
    if isLBExternal(service.Annotations) {
        return nil
    }
    loadBalancerName := c.GetLoadBalancerName(ctx, clusterName, service)

    if isNLB(service.Annotations) { // <-------- the annotation has been removed, so execution never enters this block
        ...
    }
    ...
------------------------------------------------------------------------------------------------------------------------------

However, Amazon EKS inherits this behavior from upstream Kubernetes, and the in-tree source code is not something we can change. I recommend posting a “+1” or a comment on issue [1] to see how the upstream Kubernetes developers respond. One thing you might notice, though, is that the in-tree controller lives under the “legacy-cloud-providers/…” folder, so I’m not sure whether upstream Kubernetes will ever refine this function: it was marked as “legacy” back in 2019 [4][5], and that was years ago.

As an alternative solution, I switched from the Kubernetes “in-tree” load balancer controller to the AWS Load Balancer Controller [6][7], ran the same test again, and can confirm there is no issue when switching the service type between “ClusterIP” and “LoadBalancer”. Here are the log lines from switching “LoadBalancer” back to “ClusterIP” while using the AWS Load Balancer Controller:

------------------------------------------------------------------------------------------------------------------------------
{"level":"info","ts":XXX,"logger":"controllers.service","msg":"deleting loadBalancer","arn":"arn:aws:elasticloadbalancing:us-east-1:AWS_ACCOUNT_ID:loadbalancer/net/k8s-..."}

{"level":"info","ts":XXX,"logger":"controllers.service","msg":"deleted loadBalancer","arn":"arn:aws:elasticloadbalancing:us-east-1:AWS_ACCOUNT_ID:loadbalancer/net/k8s-..."}

{"level":"info","ts":XXX,"logger":"controllers.service","msg":"deleting targetGroupBinding","targetGroupBinding":{"namespace":"default","name":"k8s-..."}}

{"level":"info","ts":XXX,"logger":"controllers.service","msg":"deleted targetGroupBinding","targetGroupBinding":{"namespace":"default","name":"k8s-..."}}

{"level":"info","ts":XXX,"logger":"controllers.service","msg":"deleting targetGroup","arn":"arn:aws:elasticloadbalancing:..."}

{"level":"info","ts":XXX,"logger":"controllers.service","msg":"deleted targetGroup","arn":"arn:aws:elasticloadbalancing:..."}

{"level":"info","ts":XXX,"logger":"controllers.service","msg":"successfully deployed model","service":{"namespace":"default","name":"..."}}
------------------------------------------------------------------------------------------------------------------------------

In summary, what I suggest you do next is either report the issue directly to upstream Kubernetes [1], or switch your load balancer controller from the “in-tree” one to the AWS Load Balancer Controller [6], following the NLB setup guide [7] to finish the setup. I believe switching to the AWS Load Balancer Controller should resolve the issue you observed.

I hope the provided information helps you understand the state of “in-tree” controller support. Thanks again for reporting the issue.