[IMPORTANT] Kubernetes upgrade - moving to 1.24 + 1.25 + 1.26

Hi,

As shared in our roadmap, we have been preparing all the necessary updates to migrate your Kubernetes cluster from the current version (1.23) up to version 1.26. The upgrade to the latest version (currently 1.27) will be handled right after, and we will share the details once we are ready.

Most of the preparatory work is done, and we want to give you more visibility on the plan we have in mind to migrate all the clusters, including ours :slight_smile:

As usual, we will first upgrade every non-production cluster to give you the chance to verify that everything works fine for your application before we upgrade your production cluster. Make sure your clusters are appropriately tagged as such.

This will give you the opportunity to verify that everything works as expected on your services with the new Kubernetes version (see the section below “Does the upgrade have any impact on my services?”).

Please note that the upgrade won’t be a big bang from 1.23 to 1.26: Kubernetes requires moving through each intermediate version, so each cluster will be upgraded to 1.24, then 1.25, then 1.26.

:warning: If there is any specific reason we should delay the upgrade of your cluster, please fill in this form

We will keep updating this post and our status page with all the information about the upgrades.

For any questions, please comment directly within this thread!

Alessandro

Planning

Upgrade 1.23 → 1.24 - COMPLETED :white_check_mark:

  1. Week 12/06/2023:
    – Make 1.24 the default version for any new cluster :white_check_mark:
    – Migrate any cluster flagged as non-production in the Qovery console (making sure to exclude any cluster with dev, staging, or test in its name). :white_check_mark:

  2. Week 26/06/2023:
    – Migrate all the other clusters. :white_check_mark:

Upgrade 1.24 → 1.25 - COMPLETED :white_check_mark:

  1. Week 03/07/2023:
    – Make 1.25 the default version for any new cluster :white_check_mark:
    – Migrate any cluster flagged as non-production in the Qovery console (making sure to exclude any cluster with dev, staging, or test in its name). :white_check_mark:

  2. Week 10/07/2023:
    – Migrate all the other clusters. :white_check_mark:

Upgrade 1.25 → 1.26 - COMPLETED :white_check_mark:

  1. Week 17/07/2023:
    – Make 1.26 the default version for any new cluster :white_check_mark:
    – Migrate any cluster flagged as non-production in the Qovery console (making sure to exclude any cluster with dev, staging, or test in its name). :white_check_mark:

  2. Week 24/07/2023:
    – Migrate all the other clusters. :white_check_mark:

FAQ

Why do we need to upgrade your cluster?

Each cloud provider supports only a limited set of Kubernetes versions, and Qovery manages the upgrades for you!

More info on the supported Kubernetes versions by cloud provider:

Does the upgrade have any impact on my services?

  1. Services deployed via Qovery

Kubernetes manages the upgrade by automatically creating new nodes running the new version, migrating the pods onto them, and shutting down the old nodes. The upgrade might cause a very short downtime for your applications. If you want to avoid it, you should:

  • set at least 2 instances for your applications (within the application settings) so that at least one instance remains available to receive traffic;
  • set the correct liveness/readiness probes (using the health checks section) so that newly created instances of your service receive traffic only when ready (see the sketch below).

Please note that, even outside this migration period, we strongly advise you to apply the points above to avoid any service disruption during the deployment of your applications.
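To make this concrete, here is a minimal sketch of what those two settings translate to in a raw Kubernetes Deployment (illustrative only; Qovery generates the actual manifests for you, and the name, image, and endpoint below are made up):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                          # hypothetical name
spec:
  replicas: 2                           # at least 2 instances: one stays up while nodes are rotated
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0.0   # hypothetical image
          ports:
            - containerPort: 8080
          readinessProbe:               # traffic is routed to the pod only once this passes
            httpGet:
              path: /healthz            # hypothetical health endpoint
              port: 8080
          livenessProbe:                # the container is restarted if this keeps failing
            httpGet:
              path: /healthz
              port: 8080
```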

  2. Services deployed by yourself (via a Helm chart)

For any service you have installed yourself, please ensure it is compatible with the new Kubernetes versions. You can test it either by creating a new cluster on the new version (once it becomes the default) or on your non-production cluster once it has been upgraded.
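A typical example of what to check: several beta APIs were removed along the way (for instance, PodSecurityPolicy and the `policy/v1beta1` PodDisruptionBudget in 1.25, and the `autoscaling/v2beta2` HorizontalPodAutoscaler in 1.26), so charts pinning old apiVersions will stop deploying. A sketch of the fix for an HPA (names are made up):

```yaml
# autoscaling/v2beta2 was removed in Kubernetes 1.26;
# autoscaling/v2 is the GA replacement (available since 1.23).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app            # hypothetical target deployment
  minReplicas: 2
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```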

I lost access to my service logs or status

We are upgrading the agent running on your cluster to adapt it to the new Kubernetes version, which changes how it retrieves Kubernetes resources.
If you lose access to your service logs or status, the only thing you need to do is re-trigger the deployment of the environment to refresh the deployed information.
Triggering the deployment of a single service is enough to update the whole environment.
Once the deployment is done, you will regain access to your service logs and status without any further action on your part.


Update on 1.24 migration:

All non-production clusters have been migrated to version 1.24. You can now verify that everything works fine on your clusters.

The week of June 26th, we will start migrating the production clusters.

1.24 is also the default version for any new cluster.


Which settings are referred to here?

set the correct liveness/readiness probes (using the advanced settings ) so that the newly created instances of your service will receive traffic only when ready

Welcome @Abhishek_Kanojia :wave:,

You can configure this from the web interface in your app settings.

Let me know if you have any other questions

Thank you @rophilogene
Looks like we have not configured this in our web app service. Does that mean we will be affected in any way by the Kubernetes upgrade?

It means that you risk a short downtime for your app during the upgrade - it’s better to configure it as soon as you can.

Since I am new to these settings, are there any default values we can use right away to keep downtime short?

Some suggested settings or default values would help here.

Thank you

I think you’ll like this doc then.
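As a quick starting point, probe timings often look something like this in raw Kubernetes terms (illustrative values only, not Qovery defaults; the endpoint is made up, and you should tune the numbers to your app’s startup time):

```yaml
readinessProbe:
  httpGet:
    path: /healthz          # hypothetical health endpoint
    port: 8080
  initialDelaySeconds: 10   # give the app time to boot before probing
  periodSeconds: 10         # probe every 10 seconds
  timeoutSeconds: 3         # a probe attempt fails after 3 seconds without a response
  failureThreshold: 3       # mark the instance unready after 3 consecutive failures
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 15
  timeoutSeconds: 3
  failureThreshold: 3       # restart the container after 3 consecutive failures
```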

Let me know if it helps


1.24 upgrade is now over :slight_smile:


All non-production clusters have been migrated to 1.25


1.25 upgrade is over :slight_smile:


@Pierre_Mavro
Even though we already have these two things configured (i.e. setting 2 instances and setting readiness & liveness probes), we faced an issue during the upgrade to 1.25.

We received a “Bad gateway” error while trying to open our website. Is there something we need to do in addition to those two things?

PS: we just noticed that the health check configurations are not there anymore after the upgrades.

@rophilogene Any updates on my question?

The 1.26 upgrade for non-production clusters is over

Hi @Abhishek_Kanojia,

Is your readiness probe configured on just a port, or do you have a custom readiness check with a dedicated endpoint (which I strongly recommend)?
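For context, here is roughly what the two options look like in raw Kubernetes terms (a sketch; the port and endpoint below are made up):

```yaml
# Option 1: port-based readiness, which only checks that the TCP port accepts connections
readinessProbe:
  tcpSocket:
    port: 8080
---
# Option 2: a dedicated endpoint (recommended), where the app itself reports when it is
# really ready, e.g. once its database and cache connections are established
readinessProbe:
  httpGet:
    path: /ready            # hypothetical readiness endpoint
    port: 8080
```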

Thanks

@Pierre_Mavro we have it configured on a specific endpoint.

Maybe it’s related, but we recently fixed an issue where, in some specific cases, the rolling restart of nodes during an upgrade could be a bit too aggressive.

I can’t tell you if it’s related, so I’d like to ensure everything is correctly set up on your side. Can you please confirm that you have more than one instance set up?

The 1.26 cluster upgrade is now over :slight_smile:

Yes, the more-than-one-instance setting is also there.
Thankfully, this time the upgrade didn’t reset the settings.