Brief post-deployment application downtime (build memory)

We have a fairly large application (15-16GB image) and we’ve noticed recently that when we deploy, our application has about 5 minutes of downtime where it seems the new application image is “building” before it can handle requests and serve traffic.

  1. The application deployment succeeds
  2. The router deployment succeeds
  3. Pods show status: Running

But we’ve noticed that immediately post-deploy, if we look in the pods console, each pod has to incrementally ramp up its memory usage (as in, go from 1% to 100% of the full image memory commitment), and pod CPU is fully maxed while this is happening. This takes a few minutes; once it completes and we reach the full memory size of the image, CPU normalizes and the application is good to go.

Did anything change in how Qovery handles image caching, build rate, or anything like that in the past month or two? We used to deploy the same size image and application, and the new application was online almost instantly (maybe the router took a few seconds to resolve). We’re not aware of anything in our config, cluster, DNS, or anything else that would be impacting this. As a result, we’ve had to be more judicious about timing deploys for late nights or weekends, when we can tolerate being down a few minutes.

Wondering if there’s an underlying reason and solution for this? Thanks

Hi @ChrisBolman1 ,

I’ll let our product team (cc @a_carrano and @Julien_Dan) appropriately respond to your questions about whether anything has changed on how Qovery handles image caching and anything build-related in the last few days. To confirm, when did you notice a change?

On a side note: I noticed that your prod app does not have Health Check probes configured, and I wonder if that could be the cause of your issue. Is there a specific reason why they are not yet configured? (They’re highly recommended for production apps, precisely to avoid the behavior you describe.)
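For reference, on a Kubernetes-backed platform like Qovery the Health Check probes map to readiness and liveness probes on the pod. A minimal sketch of the equivalent raw Kubernetes configuration follows; the `/healthz` path, port `8080`, and timing values are placeholders you’d adapt to your app (in Qovery itself these are normally set through the service’s Health Checks settings rather than YAML):

```yaml
# Hedged sketch: generic Kubernetes probes equivalent to the
# Health Checks a platform like Qovery exposes per service.
# Path, port, and delays are placeholder values.
readinessProbe:            # gates traffic: pod receives requests only once ready
  httpGet:
    path: /healthz         # placeholder health endpoint in your app
    port: 8080             # placeholder application port
  initialDelaySeconds: 30  # give a large image time to warm up
  periodSeconds: 10
  failureThreshold: 3
livenessProbe:             # restarts the container if it stops responding
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
```

With a readiness probe in place, a rolling update keeps the old pods serving until the new ones actually report ready, which removes the window where pods are `Running` but not yet able to handle traffic.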

Hi @ChrisBolman1 ,

Nothing has changed in the way we manage image caching or builds. As Romaric said, it seems to be mainly an issue with the health check configuration, making the app unstable during the roll-out of the new version since the pods are not yet ready to receive traffic.
