Why provision NLBs for container databases?

I noticed NLBs were provisioned for container databases. At first, I thought they were used to expose databases to the public, but even after I changed our databases to private the NLBs are still there. NLBs are relatively expensive; could they be avoided?

Hi @prki, that makes sense to me. I am going to check with @Pierre_Mavro and @a_carrano whether the NLB can be removed when public access is switched off for a database.

Hi @prki ,

I just ran some tests on my side, and I’m unable to reproduce it. When I move from public to private, the load balancer is destroyed. Did you run a redeploy after applying your changes? Which kind of database are you seeing this on?

Thanks

I just cycled Private → Public → Private again, but the NLBs are still there. The deployment of the change is really fast; it seems to only change the container configuration. It’s a Postgres v13 container database. We have 2 of these in the staging env and both have NLBs even though the current configuration is private (they were public initially).

Any ideas why it doesn’t work as expected for us?

Hi @prki, I’ll let @Pierre_Mavro or someone from our engineering team respond on that subject as soon as they can.

Hi @prki ,

Can you please take a screenshot of the NLBs you see, with their respective tags?

Thanks

Hi @prki

We are investigating the issue as we managed to reproduce it on our side.
We will let you know when this is fixed.

Hi @prki ,

It looks like the issue is on the AWS side and happens randomly, which is why I didn’t manage to reproduce it and Bilel could. I’m going to open a ticket with their support.

To give you more info, here is what I observed:

  1. I created a PostgreSQL instance in private mode (so no LB).
  2. I switched from private to public, and I could see the load balancer present.
  3. I switched it back to private. On the Kubernetes side, it’s done, and I got AWS messages on Kubernetes saying it has been deleted:
  Type    Reason                Age   From                Message
  ----    ------                ----  ----                -------
  Normal  Type                  57s   service-controller  ClusterIP -> LoadBalancer
  Normal  EnsuringLoadBalancer  57s   service-controller  Ensuring load balancer
  Normal  EnsuredLoadBalancer   55s   service-controller  Ensured load balancer
  Normal  Type                  4s    service-controller  LoadBalancer -> ClusterIP
  Normal  DeletedLoadBalancer   4s    service-controller  Deleted load balancer
  4. The load balancer is still present on AWS but absent from Kubernetes. You were right!

Let me check with AWS; I’ll get back to you with a fix.

Thanks again for this case

Hi @prki ,

AWS support validated the issue.

TL;DR: We have to perform a significant migration to fix this issue, which will cause some short downtime on all deployed Kubernetes clusters. This won’t be a trivial fix and requires several tests, so don’t expect it to be fixed in the coming days. However, it’s an important one, and we have already started working on it. Fix ETA: July.

Here is the original answer:

I dug further into the issue and can confirm that it still exists even on Amazon EKS 1.22, which caught my attention. I then looked into CloudWatch Logs Insights, and I can see the same behavior that issue [1] describes by executing the following query.

------------------------------------------------------------------------------------------------------------------------------
fields @timestamp, verb, requestURI, responseObject.spec.type, responseObject.status.loadBalancer.ingress.0.hostname, @message
| filter @message like /K8S-SERVICE-NAME/ or @message like /NLB-RESOURCE-NAME/
| filter userAgent not like /kubectl/ # filter out log lines generated by "kubectl"
------------------------------------------------------------------------------------------------------------------------------

From the log lines, you will find that Kubernetes removed the NLB annotation during the “LoadBalancer” → “ClusterIP” transition. So it hits issue [1]; you can find the source code at link [2] for Kubernetes 1.20 and link [3] for Kubernetes 1.22.

------------------------------------------------------------------------------------------------------------------------------
// EnsureLoadBalancerDeleted implements LoadBalancer.EnsureLoadBalancerDeleted.
func (c *Cloud) EnsureLoadBalancerDeleted(ctx context.Context, clusterName string, service *v1.Service) error {
    if isLBExternal(service.Annotations) {
        return nil
    }
    loadBalancerName := c.GetLoadBalancerName(ctx, clusterName, service)

    if isNLB(service.Annotations) { // <-------- the annotation has been removed, so execution never enters this block.
        ...
    }
    ...
------------------------------------------------------------------------------------------------------------------------------

However, Amazon EKS inherits the behavior of upstream Kubernetes, and the in-tree source code is not something we can change. I recommend that you post a “+1” or a comment on issue [1] to see how the upstream Kubernetes developers respond. One thing you may have noticed is that both “in-tree” controllers are located under the folder “legacy-cloud-providers/…”, so I’m not sure whether upstream Kubernetes will rework this function. It was marked as “legacy” back in 2019 [4][5], which was years ago.

As an alternative solution, I switched my load balancer controller from the Kubernetes “in-tree” load balancer controller to the AWS Load Balancer Controller [6][7] and ran the same test again. I can confirm there is no issue when switching the service type between “ClusterIP” and “LoadBalancer”. Here are the log lines from switching from “LoadBalancer” back to “ClusterIP” with the AWS Load Balancer Controller.

------------------------------------------------------------------------------------------------------------------------------
{"level":"info","ts":XXX,"logger":"controllers.service","msg":"deleting loadBalancer","arn":"arn:aws:elasticloadbalancing:us-east-1:AWS_ACCOUNT_ID:loadbalancer/net/k8s-..."}

{"level":"info","ts":XXX,"logger":"controllers.service","msg":"deleted loadBalancer","arn":"arn:aws:elasticloadbalancing:us-east-1:AWS_ACCOUNT_ID:loadbalancer/net/k8s-..."}

{"level":"info","ts":XXX,"logger":"controllers.service","msg":"deleting targetGroupBinding","targetGroupBinding":{"namespace":"default","name":"k8s-..."}}

{"level":"info","ts":XXX,"logger":"controllers.service","msg":"deleted targetGroupBinding","targetGroupBinding":{"namespace":"default","name":"k8s-..."}}

{"level":"info","ts":XXX,"logger":"controllers.service","msg":"deleting targetGroup","arn":"arn:aws:elasticloadbalancing:..."}

{"level":"info","ts":XXX,"logger":"controllers.service","msg":"deleted targetGroup","arn":"arn:aws:elasticloadbalancing:..."}

{"level":"info","ts":XXX,"logger":"controllers.service","msg":"successfully deployed model","service":{"namespace":"default","name":"..."}}
------------------------------------------------------------------------------------------------------------------------------

In summary, I suggest that you either report the issue directly to upstream Kubernetes [1], or switch your load balancer controller from “in-tree” to the AWS Load Balancer Controller [6] and follow the NLB setup guidance [7] to finish the setup. I believe switching to the AWS Load Balancer Controller should resolve the issue you observed.

I hope this information helps you understand the state of “in-tree” controller support, and thanks again for reporting the issue.
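
For readers who want to see why a missing annotation matters, here is a minimal Go sketch of the branch selection described in the support answer. It is a simplified model, not the Kubernetes source: `deletionPath` is a hypothetical helper, and the idea that a service without the annotation falls through to a Classic ELB cleanup path is an assumption based on the excerpt, which only states that the NLB block is skipped. The annotation key is the one the in-tree AWS provider is understood to read.

```go
package main

import "fmt"

// Annotation key assumed to be the one the in-tree AWS provider uses
// to decide which kind of load balancer a Service owns.
const lbTypeAnnotation = "service.beta.kubernetes.io/aws-load-balancer-type"

// isNLB models the check named in the excerpt: it is true only while
// the annotation is still present on the Service and set to "nlb".
func isNLB(annotations map[string]string) bool {
	return annotations[lbTypeAnnotation] == "nlb"
}

// deletionPath is a hypothetical helper that reports which cleanup
// branch a simplified EnsureLoadBalancerDeleted would take.
func deletionPath(annotations map[string]string) string {
	if isNLB(annotations) {
		return "nlb"
	}
	// Assumption: without the annotation the NLB block is skipped and
	// the code looks for a Classic ELB instead, so the NLB is never deleted.
	return "classic-elb"
}

func main() {
	// Service as it looked while the database was public.
	withAnnotation := map[string]string{lbTypeAnnotation: "nlb"}
	// Service after the switch to private: the annotation is already gone.
	withoutAnnotation := map[string]string{}

	fmt.Println("annotation present:", deletionPath(withAnnotation))
	fmt.Println("annotation removed:", deletionPath(withoutAnnotation))
}
```

In this model the second call never reaches the NLB branch, which matches the symptom in this thread: Kubernetes reports the load balancer as deleted while the NLB stays on AWS.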

Hi,

Just to let you know, I’m currently working on a workaround, because getting a fix merged into upstream Kubernetes and then released by AWS can take a while.

Moving to ALBs will be the next step, but I’m working on the automatic cleaning of NLBs. It should be in test next week and released the week after.

Pierre
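
As an illustration of what an automatic cleanup could look for, here is a minimal Go sketch that flags load balancers whose owning Service is no longer of type LoadBalancer. This is not the fix discussed in this thread. The `LoadBalancer` type, `findOrphans`, and the sample names are hypothetical, and the sketch assumes each Kubernetes-created load balancer carries a `kubernetes.io/service-name` tag holding `namespace/name`. Real inputs would come from the AWS and Kubernetes APIs.

```go
package main

import (
	"fmt"
	"sort"
)

// LoadBalancer is a hypothetical, trimmed-down view of an AWS load
// balancer: just its name and tags.
type LoadBalancer struct {
	Name string
	Tags map[string]string
}

// Assumption: load balancers created by Kubernetes are tagged with the
// owning Service as "namespace/name" under this key.
const serviceTag = "kubernetes.io/service-name"

// findOrphans returns the names of load balancers whose owning Service
// is not in the set of Services that currently have type LoadBalancer.
func findOrphans(lbs []LoadBalancer, lbServices map[string]bool) []string {
	var orphans []string
	for _, lb := range lbs {
		svc, tagged := lb.Tags[serviceTag]
		if !tagged {
			continue // not created by Kubernetes, so leave it alone
		}
		if !lbServices[svc] {
			orphans = append(orphans, lb.Name)
		}
	}
	sort.Strings(orphans) // stable output for reporting
	return orphans
}

func main() {
	// Hypothetical inventory: one leftover NLB, one in use, one unmanaged.
	lbs := []LoadBalancer{
		{Name: "nlb-a", Tags: map[string]string{serviceTag: "staging/postgres-1"}},
		{Name: "nlb-b", Tags: map[string]string{serviceTag: "staging/api"}},
		{Name: "manual", Tags: map[string]string{}},
	}
	// Services that are still of type LoadBalancer in the cluster.
	live := map[string]bool{"staging/api": true}

	fmt.Println(findOrphans(lbs, live))
}
```

A real cleanup would also need to restrict itself to load balancers belonging to the cluster in question before deleting anything; the sketch skips that check for brevity.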

Hi @Pierre_Mavro, are you going to move to ALBs for everything or just the databases? We’ve used ALBs in the past and had to make the opposite move to NLBs to meet our low latency requirements. I hope NLBs stay supported at least as a config option.

To be honest, I need to dig a little bit more.

Personally, like you, I’m not a big fan of ALBs and prefer NLBs. But I’ve seen options to set up an ALB that acts as a layer 4 load balancer rather than layer 7. This should solve the issues you’ve been facing.

First, I’m going to finish and release the patch that deletes, on the AWS side, NLBs already deleted on the Kubernetes side; then I’ll run tests. What is certain is that I don’t want a layer 7 ALB, at least for the reasons you’ve mentioned.

Hi,

A fix is in progress and will be released this week. In the meantime, I just saw that AWS has recently fixed the issue. Our fix will ensure that old, previously provisioned, and possibly forgotten NLBs are correctly deleted.

Sorry for the delay on this fix.

Thanks for the update @Pierre_Mavro.

The fix has been rolled out :slight_smile: . Thanks again @prki for the report; this was a vicious one!