[AWS][Migration] Moving from NLB controller to ALB controller

Only for customers using the Qovery Managed clusters on AWS

Dear customer,

We are excited to inform you that, in the coming days, we will integrate the AWS Application Load Balancer (ALB) controller into our product, adding more network control and features.

:warning: :warning: The rollout of this feature will require a migration on your cluster to remove the old Network Load Balancer (NLB) controller, with a possible impact on your applications. Check the rest of the article for more information. :warning: :warning:

Context

At Qovery, we initially started with Kubernetes’ built-in Network Load Balancer controller (NLB). It was the best choice at the beginning of our company since it simplified a lot of things (If you are interested, we have described all the reasons in our blog post here).
Over the past weeks, we have been working to get rid of this legacy part and integrate the ALB controller.

:information_source: We are not migrating from NLB to ALB; we will still be using NLB under the hood. What is changing is the Kubernetes controller that we use to manage the load balancers on AWS.

Benefits of Activating the ALB Controller

  • Reduced Downtime: The ALB controller helps decrease the downtime for some applications during updates.
  • Improved IP Forwarding: The original IP addresses are forwarded directly to your application, rather than the load balancer’s IP, providing enhanced transparency and traceability.
  • We will soon add other functionalities that are available only on applications using the ALB controller.

ALB controller, the default choice for new clusters

The ALB controller feature will be enabled by default for all new clusters, ensuring that you benefit from its advantages right from the start.

Migrating an existing application to ALB

We encourage you to activate this feature as soon as you can to take advantage of the benefits listed above.
Since the switch causes a short downtime (see sections below), we will let you decide when to apply this change.
Test the switch on a dev/staging cluster before applying this change on your production cluster.

If you take no action, we will force the migration to the ALB controller by the end of October 2024.

If you have any questions or need assistance with the migration process, please do not hesitate to contact our support team or comment on this post.

Migration and Downtime

Activating the ALB controller involves a migration process with a maximum expected downtime of 10 minutes. This downtime is necessary because the current load balancer must be deleted and replaced as per AWS requirements. We strongly advise against enabling this advanced setting during your production hours to minimize any impact on your operations.
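
If you want to check how long your services are actually unreachable when testing the switch on a dev/staging cluster, a small probe script can help. The following is a minimal Python sketch, not a Qovery tool: the function names (probe, longest_outage, watch), the polling interval, and the rule that any non-5xx answer counts as "reachable" are assumptions made for illustration.

```python
import http.client
import time
import urllib.error
import urllib.request


def probe(url, timeout=5.0):
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 500
    except urllib.error.HTTPError as err:
        # The service answered with an error status; a 4xx still means
        # the load balancer and the service are reachable.
        return err.code < 500
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # DNS failure, refused connection, timeout, broken response, ...
        return False


def longest_outage(samples):
    """Return the longest outage, in seconds, found in the samples.

    samples is a list of (timestamp, ok) tuples sorted by timestamp.
    An outage runs from the first failed probe to the next successful
    one, or to the last sample if the service never came back.
    """
    longest = 0.0
    outage_start = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    if outage_start is not None:
        longest = max(longest, samples[-1][0] - outage_start)
    return longest


def watch(url, duration=900.0, interval=5.0):
    """Probe the URL every `interval` seconds for `duration` seconds."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.monotonic(), probe(url)))
        time.sleep(interval)
    return samples
```

To use it, start watch() against one of your own service URLs just before redeploying the cluster, then pass the returned samples to longest_outage() to get the longest observed interruption in seconds. The result is only as precise as the polling interval.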

How to migrate

:warning: WARNING: as described above, downtime is expected during this migration :warning:

  1. Through the advanced settings of your cluster, you can activate the ALB controller by changing the value of the advanced setting aws.eks.enable_alb_controller.

Update 09/12/2024: the flag enabling the ALB controller will be active on 09/16/2024, for non-production clusters only.
Update 09/17/2024: the flag has been released and is available only for non-production clusters. You can now start testing the ALB controller on your cluster!

  2. Once the value is updated, you will need to redeploy your cluster to apply the change.
  3. All your services exposed on an HTTP port will be migrated. Others, like TCP/UDP, will have to be redeployed to benefit from the ALB controller.

Note: if you have custom domains, you don’t have anything special to do; they will be automatically redirected to the new load balancer.

Thanks
Alessandro

Hey @a_carrano,

Just tried the new aws.eks.enable_alb_controller advanced setting in our development cluster and it didn’t work. I rolled back to the default configuration and, even though our deployments are successful now, I am getting these errors in the services with custom domains configured:

Currently all our services running in our development cluster are not reachable. How can I bring them back?

Jorge

Hey @jorgeramirezamora,

Sorry for the confusion here, today’s release has been postponed to Monday 09/16 (cf. Standard cluster update - 09/12/2024).

For the time being, this flag doesn’t trigger anything behind the scenes.
Can you share your service’s Qovery URL, please, so I can have a look?
The one you seem to have shared is this one, but it is stopped.

Cheers

Hey @bchastanier,

Thanks for the update. The services that are not reachable are these two:

Service 1

Service 2

I assume the other services we have in that same cluster will not be reachable either if we turn them on (we have most of them down currently).

Regards

Jorge

Hey @jorgeramirezamora,

I do see your domain validations are green now, but your services are stopped. Can you let me know if you still face the issue?

Cheers

Hey @bchastanier,

I am sorry, our development services are down outside office hours. I forgot to update that deployment rule so you could review… It seems that they are working now. Not sure if this morning’s deployment fixed it or if it was just a matter of Route53 taking too long to update after rolling back to ALB.

Regards

Jorge

Great to hear @jorgeramirezamora !
The ALB controller should be released this Monday; we will post an update once done.

Cheers

The flag has been released and you can now activate the ALB controller on your non-production clusters.

Is this now available to production clusters? When do you expect it to be?