Deployment strategy

For my deployment, I have two important requirements:

  • Minimize downtime
  • Ensure that the same version of each service is live at any given moment (for instance, the same commit be38b26 for both the frontend and the backend)

On AWS Elastic Beanstalk, I solved this problem by using the “Immutable” deployment strategy.

How can I do this on Qovery?

Hi @leonard-henriquez ,

Today we only support one deployment method, which is the classic rolling update strategy:

Rolling updates allow Deployments’ update to take place with zero downtime by incrementally updating Pods instances with new ones. The new Pods will be scheduled on Nodes with available resources.

If you still encounter downtime, I advise you to have a look at the probes in the advanced settings.
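
For illustration, here is a minimal sketch of the kind of health endpoints such probes typically target, assuming a Node/Express backend (the paths and port are placeholders, adapt them to your stack):

```typescript
// Minimal sketch, assuming a Node/Express backend; paths and port are placeholders.
// The readiness endpoint only reports OK once the app can actually serve traffic,
// so Kubernetes will not route requests to an instance that is still starting up.
import express from "express";

const app = express();
let ready = false;

// Liveness: the process is up and able to answer HTTP requests.
app.get("/healthz", (_req, res) => {
  res.status(200).send("ok");
});

// Readiness: warm-up is done and dependencies (DB, cache, …) are reachable.
app.get("/readyz", (_req, res) => {
  if (ready) {
    res.status(200).send("ready");
  } else {
    res.status(503).send("starting");
  }
});

app.listen(3000, () => {
  // Hypothetical warm-up step: open DB connections, load config, etc.
  ready = true;
});
```

Pointing the readiness probe at an endpoint like `/readyz` is what lets a rolling update replace instances without dropping requests.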

The equivalent of what you’re looking for in the Kubernetes ecosystem is called “blue-green deployment”. However, as far as I understand, you don’t want it only on a single application level but on the whole environment (please correct me if I’m wrong).

This is something we could definitely consider, but it will take some time. How blocking is it for you?

Thanks

Hi @Pierre_Mavro ,
Thank you for the details on the deployment method!

(…) as far as I understand you don’t want it only on a single application level but on the whole environment (please correct me if I’m wrong).

Indeed! I need my backend and my frontend to be in sync so that, for example, the frontend doesn’t call API endpoints that haven’t been deployed yet.

Kubernetes solves my problem by allowing multiple containers in one pod.
But from what I understand, in Qovery each “service” is deployed in a different pod, right? (please correct me if I’m wrong, I couldn’t find any information on this in the documentation)

How blocking is it for you?

To be honest, I don’t see how I could run a production application with real users without ensuring my environment is sane (= all my services are in good health AND at the same version).

Let me give you an example. I have two services to deploy (my backend and my frontend). Let’s say that my frontend deployment finishes first. Then my users won’t be able to use new features (that call new API endpoints) until the backend is deployed. That’s even worse if my backend deployment fails, because I’m stuck with non-working features until I deploy a working version again.

It raises a few questions:

  • How do your other customers handle this problem?
  • Is there a way to minimize the time between the deployment of my two services?
  • I guess an option would be to plan deployments during the night? (is that what the “deploy on specific timeframe” feature does?)
  • Besides not being optimal, still causing downtime and limiting my capacity to deploy frequently, what happens if something goes wrong? If the deployment of one of the services fails (or succeeds but the app health check fails), is there a way to instantly roll back a service?
  • Is there a way to do it programmatically? (so that I don’t need to actively monitor each deploy)

Sorry for this long message!

Hi @leonard-henriquez, just throwing my 2 cents here, hopefully it helps. Here are a few suggestions:

  1. Plan your deployments so that you deploy your backend API changes first, and version your endpoints if you have significant breaking changes between deployments.
  2. Leverage feature flags. You could put feature flags around the specific pieces of code in your frontend and only enable them once your backend is fully deployed (see the sketch just below).
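
As a rough illustration of suggestion 2, here is a minimal sketch assuming a TypeScript frontend; the flag name, flag source, and endpoint paths are hypothetical:

```typescript
// Minimal sketch of a frontend feature flag; the flag name, flag source and
// endpoint paths are hypothetical. Only flip the flag once the matching
// backend version is fully deployed.
const FLAGS = {
  // In practice, read this from a remote flag service or a build-time env variable.
  useNewCheckoutApi: false,
};

async function fetchCheckout(orderId: string): Promise<unknown> {
  if (FLAGS.useNewCheckoutApi) {
    // New endpoint, only available in the new backend version.
    const res = await fetch(`/api/v2/checkout/${orderId}`);
    return res.json();
  }
  // Old endpoint, served by both the old and the new backend versions.
  const res = await fetch(`/api/v1/checkout/${orderId}`);
  return res.json();
}
```
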
2 Likes

Kubernetes solves my problem by allowing multiple containers in one pod.
But from what I understand, in Qovery each “service” is deployed in a different pod, right?

You’re right! This is to avoid misbehavior in production. Suppose a container in a pod has an issue (it exits, it crashes because it consumes too much memory…), anything that makes the container restart. In that case, the whole pod restarts.

Here is an example: let’s say you have a frontend and a backend app in the same pod. If your frontend app crashes (Nginx, PHP-FPM or whatever is not correctly sized for the workload, a bug, or anything else), then your backend restarts along with the frontend. What happens in this situation? All the work your backend was doing is lost. Why? Your backend was fine and may have been processing important work, and reloading it can take some time as well. So why should it be restarted in that case? Moreover, there is a high chance this setup will bring more unexpected issues than it was supposed to solve.

In addition, the autoscaling of your pod will be a nightmare to manage: the two apps don’t behave the same way, so you can’t scale them on the same metrics (CPU/RAM/anything else). Multiple containers in a single pod should only be used for specific reasons (network mesh topology, automatic TLS endpoints, app metrics…). From experience, having several application containers in a pod is a terrible practice, which is why we don’t propose it.

To be honest, I don’t see how I could run a production application with real users without ensuring my environment is sane (= all my services are in good health AND at the same version).

I agree with that, but not with the way to do it. I don’t want to offend you; I’m just bringing my point of view, based on years of experience. My goal is to help you as much as I can, not to tell you what you have to do.

So, from my experience, this approach can work well for specific use cases, usually simple ones without much data to manipulate. But it’s not that common. With a standard product backed by a database, you’ll have to manage schema upgrades, and you won’t be able to roll back a schema easily, right?
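
To make that concrete, here is a minimal sketch of a backward-compatible (“expand”) migration, assuming a Knex-style migration tool; the table and column names are made up:

```typescript
// Minimal sketch of an additive ("expand") migration with Knex; the table and
// column names are made up. The old application version keeps working because
// the new column is nullable, and rolling back is a simple column drop.
import type { Knex } from "knex";

export async function up(knex: Knex): Promise<void> {
  await knex.schema.alterTable("users", (table) => {
    table.string("preferred_locale").nullable(); // new, optional column
  });
}

export async function down(knex: Knex): Promise<void> {
  await knex.schema.alterTable("users", (table) => {
    table.dropColumn("preferred_locale");
  });
}
```

The destructive part (dropping or renaming old columns) only happens in a later release, once no deployed version reads them anymore.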

For applications, it should be the same. Each application should be able to support roughly two versions of its dependencies while waiting for deprecation and code removal. This helps you update/release only what you need, and it keeps rollback simple because you can update one app at a time instead of doing a big-bang release.
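
As an illustration, here is a small sketch of a backend endpoint that tolerates both the old and the new frontend during a rollout, assuming a Node/Express backend; the route and field names are made up:

```typescript
// Minimal sketch: one backend endpoint accepts the payload shape sent by both
// the old and the new frontend during a rollout. Route and field names are made up.
import express from "express";

interface OldPayload { fullName?: string }
interface NewPayload { firstName?: string; lastName?: string }

const app = express();
app.use(express.json());

app.post("/api/profile", (req, res) => {
  const body = req.body as OldPayload & NewPayload;
  // Prefer the new shape, fall back to the old one.
  const name =
    body.firstName !== undefined
      ? `${body.firstName} ${body.lastName ?? ""}`.trim()
      : body.fullName ?? "";
  res.json({ savedName: name });
});

app.listen(3000);
```

Once the old frontend is gone everywhere, the `fullName` fallback can be removed in a later release.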

As time passes, you will undoubtedly multiply the number of applications you have, and maintaining all of them at the same version at the same time will be incredibly time-consuming and will reduce your delivery velocity. At some point, rollback will become almost impossible without a massive impact on your users (more products, more changes, more people to sync with each other…).

So the theory is excellent, but the practice is not. This is why you find the rolling update as the default strategy on most (maybe all) application/container schedulers.

To conclude, adding blue-green deployment to the product is something we’re considering because, as I mentioned, it’s interesting in some particular cases. But since it’s not requested that much by our customers, it’s not that common in production, and it’s not at the top of Qovery’s priority list, I don’t have an ETA to give you for this :frowning:

My last piece of advice: I encourage you to switch to the classic rolling update method now, as you will undoubtedly switch to it in the future anyway.

Let me give you an example. I have two services to deploy (my backend and my frontend). Let’s say that my frontend deployment finishes first. Then my users won’t be able to use new features (that call new API endpoints) until the backend is deployed. That’s even worse if my backend deployment fails, because I’m stuck with non-working features until I deploy a working version again.

In this case, your backend should be able to handle both versions of the frontend. While the service is reloading, customers will be on one frontend or the other, but in any case it will work. You can also use sticky sessions if you’re afraid the switch would happen too fast for a single customer.

It raises a few questions:

  • How do your other customers handle this problem?

Maintaining two versions or, as @itajenglish mentioned above, feature flags.

  • Is there a way to minimize the time between the deployment of my two services?

It only depends on your apps’ startup time. If you manage two versions in parallel, it’s OK. In a previous experience, I had the same application deployed on 5,000+ instances. You just can’t do blue-green deployment in that case, because you would need to double the number of instances just for the update (you can imagine the bill, whether on the cloud or on bare metal; it’s too expensive to be considered). Rolling updates, canary deployments, and feature flags help a lot in that case. This is why I encourage you to consider a rolling update strategy.

  • I guess an option would be to plan deployments during the night? (is that what the “deploy on specific timeframe” feature does?)

Once again, I spent ~6 years in the financial industry. You can’t imagine how painful it is to release off-hours; those time constraints are exhausting for everyone. If you can avoid it, do so. What is commonly done in the industry is to add tests and to validate with a pre-prod or ephemeral environment (thanks to Qovery for making that so easy), so you can safely deploy during working hours without fear.

And because we’re all human, shit happens; this is why you can regularly find post-mortems even from big companies like Google and Facebook… and this is also why you’re looking for an easy way to roll back.

I’m not saying you shouldn’t care about issues, but what I explained above will drastically reduce the ones you encounter.

  • Besides not being optimal, still causing downtime and limiting my capacity to deploy frequently, what happens if something goes wrong? If the deployment of one of the services fails (or succeeds but the app health check fails), is there a way to instantly roll back a service?

With the rolling update strategy, when you run a deployment, Kubernetes uses probes to ensure your app is working correctly. So if you’re using liveness and readiness probes properly, a new instance will not receive traffic until the probes tell Kubernetes to redirect traffic to it.

And if you misconfigured the probes, or the probes say everything is OK but you still have a bug and need to roll back, then yes, you can select the application version to roll back to. Performing a rollback with Qovery is effortless.

  • Is there a way to do it programmatically? (so that I don’t need to actively monitor each deploy)

Not monitoring your app is, I think, one of the biggest mistakes you can make. You don’t want to spend time on observability and monitoring; I can understand that even if I disagree, since you have concerns about your application’s uptime. In that case, I suggest you look at one or more observability/monitoring/APM tools integrated directly into your app as a library, like Datadog, New Relic, Sentry, LogRocket… It will be effortless to get more than the minimum you should have.
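
For example, here is a minimal sketch of wiring an error-tracking library into a Node app (Sentry here, but the other tools follow a similar “init once at startup” pattern; the DSN and environment variables are placeholders):

```typescript
// Minimal sketch of error tracking with @sentry/node; the DSN and the
// GIT_SHA environment variable are placeholders.
import * as Sentry from "@sentry/node";

Sentry.init({
  dsn: "https://<key>@<organization>.ingest.sentry.io/<project>", // placeholder DSN
  environment: process.env.NODE_ENV ?? "production",
  release: process.env.GIT_SHA, // tag every error with the deployed version
});

function riskyOperation(): void {
  throw new Error("something went wrong");
}

try {
  riskyOperation();
} catch (err) {
  Sentry.captureException(err); // reported with the release/version attached
  throw err;
}
```

Tagging errors with the deployed version also makes it obvious which release to roll back when something goes wrong.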

Sorry for the long message too :wink:

1 Like

I do plan to version my endpoints in the future.
Right now, my product is still in beta and I don’t want to introduce this extra complexity.
That’s why I needed an immutable deployment strategy.
But I guess I’m just not the right fit for your product!

Thanks for your answers!