Deploy failed, probe failed, unable to connect using qovery shell

Issues information

Your issue
Unable to connect to qovery shell

Our Node.js app suddenly stopped running yesterday. We have not done any major upgrade. Can anyone assist in resolving this ASAP? We have an investor meeting and are blocked at the moment.


Hi,

Following our discussion in private, you have a pod (application) in a bad shape. You're seeing messages like:

Readiness probe failed: dial tcp 100.64.2.230:80: connect: connection refused
Liveness probe failed: dial tcp 100.64.2.230:80: connect: connection refused

This happens because your application's port never opens, which is generally caused by an issue in your application.

For more info, I advise you to look into your application logs. You also requested to connect directly to the pod to see what happens. As the port does not open, Kubernetes kills the pod and restarts it (crash loop, with an exponential backoff between restarts), which is not convenient for debugging.

From the troubleshoot page in the documentation, there is a way to stop Kubernetes from killing the pod:

If you need to manually debug, you can connect to your container:

  1. Temporarily delete the application port from your application configuration and redeploy your application
  2. Use the qovery shell command to connect to your container and understand what's wrong
  3. Re-apply the port and redeploy your application

Temporarily disabling the port (from the Qovery console) should let you debug freely.

I tried logging in using qovery shell to debug the application, and I get this:

Continue with shell command using this context?
Please type "yes" to validate context: yes

Cannot launch shell UpgradeConnection(ProtocolSwitch(500))ERRO[0007] connection closed by server: websocket: close 1011 (internal server error): EOF from upstream for stdout
bash-3.2$

I currently see these errors in my application logs. However, when I run it locally, it works perfectly.

13 Jul, 11:21:33.791 app-z79df0814-697c4d9b79-p2wqr 34eb89 {"log":"npm ERR! A complete log of this run can be found in:\n","stream":"stderr","time":"2022-07-13T01:21:33.588531006Z"}
13 Jul, 11:21:33.791 app-z79df0814-697c4d9b79-p2wqr 34eb89 {"log":"npm ERR! /root/.npm/_logs/2022-07-13T01_21_33_582Z-debug.log\n","stream":"stderr","time":"2022-07-13T01:21:33.588536386Z"}
13 Jul, 11:22:00.066 app-z79df0814-7584d59ccd-l4z6m bf5789 {"log":"npm ERR! path /app\n","stream":"stderr","time":"2022-07-13T01:21:59.948495646Z"}
13 Jul, 11:22:00.066 app-z79df0814-7584d59ccd-l4z6m bf5789 {"log":"npm ERR! command failed\n","stream":"stderr","time":"2022-07-13T01:21:59.958525367Z"}
13 Jul, 11:22:00.067 app-z79df0814-7584d59ccd-l4z6m bf5789 {"log":"npm ERR! signal SIGTERM\n","stream":"stderr","time":"2022-07-13T01:21:59.960525127Z"}
13 Jul, 11:22:00.067 app-z79df0814-7584d59ccd-l4z6m bf5789 {"log":"npm ERR! command sh -c /tmp/startdev657675266743.sh\n","stream":"stderr","time":"2022-07-13T01:21:59.960980494Z"}
13 Jul, 11:22:00.067 app-z79df0814-7584d59ccd-l4z6m bf5789 {"log":"\n","stream":"stderr","time":"2022-07-13T01:21:59.992516577Z"}
13 Jul, 11:22:00.067 app-z79df0814-7584d59ccd-l4z6m bf5789 {"log":"npm ERR! A complete log of this run can be found in:\n","stream":"stderr","time":"2022-07-13T01:21:59.992624659Z"}
13 Jul, 11:22:00.067 app-z79df0814-7584d59ccd-l4z6m bf5789 {"log":"npm ERR! /root/.npm/_logs/2022-07-13T01_21_06_627Z-debug-0.log\n","stream":"stderr","time":"2022-07-13T01:21:59.99267733Z"}
13 Jul, 11:22:01.833 app-z79df0814-7584d59ccd-l4z6m bf5789 {"log":"\n","stream":"stdout","time":"2022-07-13T01:22:01.675919384Z"}
13 Jul, 11:22:01.833 app-z79df0814-7584d59ccd-l4z6m bf5789 {"log":"\u003e nft-market-api@0.0.0 start:dev\n","stream":"stdout","time":"2022-07-13T01:22:01.675954615Z"}
13 Jul, 11:22:01.833 app-z79df0814-7584d59ccd-l4z6m bf5789 {"log":"\u003e NODE_ENV=dev ts-node -r tsconfig-paths/register ./src\n","stream":"stdout","time":"2022-07-13T01:22:01.675988525Z"}
13 Jul, 11:22:01.833 app-z79df0814-7584d59ccd-l4z6m bf5789 {"log":"\n","stream":"stdout","time":"2022-07-13T01:22:01.675991705Z"}

Did you check your log file?

/root/.npm/_logs/2022-07-13T01_21_33_582Z-debug.log

Note: if you can change the logs destination from a file to stdout, it would help you as well.
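As a sketch of that note: anything the container's main process writes to stdout/stderr is collected by the container runtime and therefore shows up in Qovery's log view. The helper names below are illustrative, not part of any Qovery or npm API:

```javascript
// Minimal sketch of logging to stdout/stderr instead of a file, so the
// container runtime (and therefore Qovery) can collect the lines.
// formatLine/log/logError are illustrative names, not from the thread.
function formatLine(level, msg) {
  return `${new Date().toISOString()} [${level}] ${msg}`;
}

function log(msg) {
  process.stdout.write(formatLine("info", msg) + "\n");
}

function logError(msg) {
  process.stderr.write(formatLine("error", msg) + "\n");
}

log("server starting on port 8080");
logError("db connection failed, retrying");
```

Most Node loggers (winston, pino, etc.) write to stdout by default; if yours is configured with a file destination, switching it back to stdout is usually a one-line config change.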

After a few seconds the terminal gets disconnected, and later when I reconnect I get the error below:

Please type "yes" to validate context: yes

Cannot launch shell UpgradeConnection(ProtocolSwitch(500))ERRO[0010] connection closed by server: websocket: close 1011 (internal server error): EOF from upstream for stdout 

Any reference on how I can redirect the logs to stdout instead of to a file?

You’re being disconnected because your app stopped/crashed/did not open the port.

So here are my suggestions:

  1. Did you try what I suggested above? Temporarily delete the application port from your application configuration and redeploy. You should then be able to connect with qovery shell without being disconnected.
  2. If yes and it's still an issue, you can try updating your Dockerfile to run a sleep or tail command instead of your application, so you'll be able to connect through qovery shell and run all the commands you need manually (but step 1 should be done as well, or you'll still be disconnected, as this is what happens).
  3. If you don't want 1 and 2, you have to look at how your application is configured (it looks to be based on Node.js) and update its config to print debug logs to stdout.
  4. If none of those solutions suits you, I advise you to try an APM that will get all the info you need with as little effort as possible (like Datadog or New Relic).
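For suggestion 2, a minimal sketch of the Dockerfile change (illustrative only; keep your existing build steps and base image, and revert this once you're done debugging):

```dockerfile
# Debugging change only -- revert when finished.
# Swap the startup command so the container idles instead of running
# (and possibly crashing) the app, letting qovery shell stay attached.
# CMD ["npm", "run", "start:dev"]
CMD ["sleep", "infinity"]
```

If the sleep in your base image doesn't support infinity, a large number of seconds (e.g. sleep 86400) works too.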

Hello Pierre,

I was able to get into the pod shell successfully and could see the application running. Also, just to make sure, I deployed a two-week-old commit that used to work perfectly, and still no luck; suddenly everything changed. Did any upgrades happen in the last 3 days?

I can see that port 8080 is already in use:

/app # 
/app # npm run start:dev

> nft-market-api@0.0.0 start:dev
> NODE_ENV=dev ts-node -r tsconfig-paths/register ./src

runing on env:  dev
setting db password
sk_live_319178495CC32064
Error: listen EADDRINUSE: address already in use :::8080
    at Server.setupListenHandle [as _listen2] (node:net:1334:1

Then why the pod is crashing is the million-dollar question for me at the moment.

Ok great, you were able to get in. What about debug logs when you run it manually? Nothing relevant?

Hello Pierre, everything looked OK. Even the prod setup, which was initially working: we stopped it for a while, and when I started it again it's the same issue. At the same time, when I logged into the shell I saw the Node.js app already running without issues.

Please find the npm logs:

30 timing npm:load:timers Completed in 0ms
31 timing npm:load:configScope Completed in 0ms
32 timing npm:load Completed in 26ms
33 silly logfile done cleaning log files
34 timing command:run Completed in 8186ms
35 verbose exit 0
36 timing npm Completed in 8222ms
37 info ok

OK, so you're still encountering issues, right?

If yes:

  1. With the shell, if you see your app working as expected, then the port configured in Qovery may be the wrong one?
  2. If the configured port is the right one, then do you know how long it takes for your app to open the port? If it's more than 30s, you can adjust the initial check delay with the liveness_probe.initial_delay_seconds parameter: Advanced Settings | Docs | Qovery

Hello Pierre,
It's a small Node.js application and it hardly takes 10 seconds to boot. These settings have been in place for a long time and we have not changed any of them, yet suddenly the app stopped working.
Please find the Dockerfile we are using:

RUN mkdir -p /app
COPY . /app
WORKDIR /app
RUN npm install
RUN npm run build
EXPOSE 8080
CMD ["npm", "run", "start:dev"]

Hi,

I managed to make the last version of your app work by increasing the initial check delay, as I mentioned above.

It was initially set to 30s; I set it to 120s:

{
    ...
    "liveness_probe.initial_delay_seconds": 120,
    ...
}

120s is a little high, but it works (you can lower it if you want). Your app looks to spend a long time in the database initialization phase.

It's a small Node.js application and it hardly takes 10 seconds to boot.

Correct me if I'm wrong, but I think you're biased because the number you give is based on your own computer's resources. However, you allocated only half a core to this application. Nowadays laptops have ~8 cores, which is why it starts quickly from your point of view.

Another solution is to increase the number of cores allocated to your app. However, if your app doesn't need more resources to work efficiently, I advise against changing the current value; it would be a waste of resources. Keep leveraging liveness_probe.initial_delay_seconds and that will be enough.

Thanks for reporting your issue @admin-solis, I've updated the Troubleshoot page accordingly :slight_smile:

Please let me know if something is not clear.

Hi !

Same problem here, with a Sidekiq app. It seems to happen in environments I cloned from one cluster to another (source: Static IP, target: No Static IP).

Is this relevant? I have no problem when I clone within the same cluster.

Thanks !

cc @Pierre_Mavro

Hello @polive106,

Do you have the same issue only with qovery shell, or also on deployment?
Can you send the link of the impacted application(s)?

Thanks,
Melvin

@Melvin_Zottola, no apparent problem on deployment. Just sent you the link to the app.

Hello @polive106 ,

One Qovery component in your cluster was not properly updated; the issue has been fixed. You should now be able to use qovery shell.