Deploy failed, probe failed, unable to connect using qovery shell

admin-solis · July 12, 2022, 10:59am

Issues information

OS:
databases:
Programming language and version:
Link to your project on Github/Gitlab: https://github.com/solis-labs/nft-market-api/tree/develop

Your issue
Unable to connect to qovery shell

Describe here your issue
Our NodeJS app suddenly stopped running yesterday. We have not done any major upgrade. Can anyone help resolve this ASAP? We have an investor meeting and are blocked at the moment.

Dockerfile content (if any)

specify here your dockerfile content

Pierre_Mavro · July 12, 2022, 8:55pm

Hi,

Following our private discussion, one of your pods (application) is in a bad state. You’re seeing messages like:

Readiness probe failed: dial tcp 100.64.2.230:80: connect: connection refused
Liveness probe failed: dial tcp 100.64.2.230:80: connect: connection refused

This happens because your application does not open its port, which is generally caused by an issue in your application.

To get more information, I advise you to look at your application logs. You also asked to be able to connect directly to the pod to see what happens. Since the port never opens, Kubernetes kills the pod and restarts it (a crash loop, with restarts spaced by an exponential backoff), which makes debugging inconvenient for you.
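To illustrate why the crash loop is hard to catch with a shell, here is a minimal TypeScript sketch of the restart delay. It assumes the upstream Kubernetes kubelet defaults (10s initial back-off, doubled after each crash, capped at 5 minutes); Qovery clusters may be tuned differently, and the function name is hypothetical.

```typescript
// Hypothetical helper: approximates the kubelet CrashLoopBackOff delay.
// Assumes upstream Kubernetes defaults: 10s initial delay, doubled on every
// restart, capped at 300s (5 minutes). Actual cluster settings may differ.
function crashLoopBackoffSeconds(
  restartCount: number,
  initialSeconds = 10,
  capSeconds = 300,
): number {
  if (!Number.isInteger(restartCount) || restartCount < 0) {
    throw new RangeError("restartCount must be a non-negative integer");
  }
  return Math.min(initialSeconds * 2 ** restartCount, capSeconds);
}

// Print the wait before each of the first few restarts.
for (let restart = 0; restart < 7; restart++) {
  console.log(`restart #${restart}: wait ${crashLoopBackoffSeconds(restart)}s`);
}
```

Under these assumptions the wait reaches the 5-minute cap after only a handful of crashes, so the container is up for a few seconds at a time and down for minutes in between.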

The troubleshooting page in the documentation describes a way to prevent Kubernetes from killing the pod:

If you need to manually debug, you can connect to your container:

1. Temporarily delete the application port from your application configuration and redeploy your application
2. Use the qovery shell command to connect to your container and understand what’s wrong
3. Re-add the port to listen on and redeploy your application

Temporarily disabling the port (from the Qovery console) should let you debug freely.

admin-solis · July 13, 2022, 1:24am

I tried logging in using qovery shell to debug the application, and I get this:

Continue with shell command using this context ?
Please type "yes" to validate context: yes

Cannot launch shell UpgradeConnection(ProtocolSwitch(500))ERRO[0007] connection closed by server: websocket: close 1011 (internal server error): EOF from upstream for stdout
bash-3.2$

admin-solis · July 13, 2022, 1:28am

I currently see these errors in my application logs. However, when I run it locally, it works perfectly.

13 Jul, 11:21:33.791	app-z79df0814-697c4d9b79-p2wqr	34eb89	{"log":"npm ERR! A complete log of this run can be found in:\n","stream":"stderr","time":"2022-07-13T01:21:33.588531006Z"}
13 Jul, 11:21:33.791	app-z79df0814-697c4d9b79-p2wqr	34eb89	{"log":"npm ERR! /root/.npm/_logs/2022-07-13T01_21_33_582Z-debug.log\n","stream":"stderr","time":"2022-07-13T01:21:33.588536386Z"}
13 Jul, 11:22:00.066	app-z79df0814-7584d59ccd-l4z6m	bf5789	{"log":"npm ERR! path /app\n","stream":"stderr","time":"2022-07-13T01:21:59.948495646Z"}
13 Jul, 11:22:00.066	app-z79df0814-7584d59ccd-l4z6m	bf5789	{"log":"npm ERR! command failed\n","stream":"stderr","time":"2022-07-13T01:21:59.958525367Z"}
13 Jul, 11:22:00.067	app-z79df0814-7584d59ccd-l4z6m	bf5789	{"log":"npm ERR! signal SIGTERM\n","stream":"stderr","time":"2022-07-13T01:21:59.960525127Z"}
13 Jul, 11:22:00.067	app-z79df0814-7584d59ccd-l4z6m	bf5789	{"log":"npm ERR! command sh -c /tmp/startdev657675266743.sh\n","stream":"stderr","time":"2022-07-13T01:21:59.960980494Z"}
13 Jul, 11:22:00.067	app-z79df0814-7584d59ccd-l4z6m	bf5789	{"log":"\n","stream":"stderr","time":"2022-07-13T01:21:59.992516577Z"}
13 Jul, 11:22:00.067	app-z79df0814-7584d59ccd-l4z6m	bf5789	{"log":"npm ERR! A complete log of this run can be found in:\n","stream":"stderr","time":"2022-07-13T01:21:59.992624659Z"}
13 Jul, 11:22:00.067	app-z79df0814-7584d59ccd-l4z6m	bf5789	{"log":"npm ERR! /root/.npm/_logs/2022-07-13T01_21_06_627Z-debug-0.log\n","stream":"stderr","time":"2022-07-13T01:21:59.99267733Z"}
13 Jul, 11:22:01.833	app-z79df0814-7584d59ccd-l4z6m	bf5789	{"log":"\n","stream":"stdout","time":"2022-07-13T01:22:01.675919384Z"}
13 Jul, 11:22:01.833	app-z79df0814-7584d59ccd-l4z6m	bf5789	{"log":"\u003e nft-market-api@0.0.0 start:dev\n","stream":"stdout","time":"2022-07-13T01:22:01.675954615Z"}
13 Jul, 11:22:01.833	app-z79df0814-7584d59ccd-l4z6m	bf5789	{"log":"\u003e NODE_ENV=dev ts-node -r tsconfig-paths/register ./src\n","stream":"stdout","time":"2022-07-13T01:22:01.675988525Z"}
13 Jul, 11:22:01.833	app-z79df0814-7584d59ccd-l4z6m	bf5789	{"log":"\n","stream":"stdout","time":"2022-07-13T01:22:01.675991705Z"}
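Each line above ends with a record in the shape used by Docker's json-file log driver (log, stream, time). A minimal TypeScript sketch for pulling just the stderr messages out of lines copied from the console, assuming the JSON payload always starts at the first "{" (the helper names are hypothetical). Read that way, the "npm ERR! signal SIGTERM" line is consistent with the container being stopped from outside, for example by Kubernetes after failed probes, rather than npm failing on its own.

```typescript
// Shape of one record written by Docker's json-file log driver (assumed to be
// what the console shows after the timestamp, pod and instance columns).
interface ContainerLogRecord {
  log: string;
  stream: "stdout" | "stderr";
  time: string;
}

// Parse one console line: everything from the first "{" is treated as JSON.
function parseContainerLogLine(line: string): ContainerLogRecord | null {
  const start = line.indexOf("{");
  if (start === -1) return null;
  try {
    const record = JSON.parse(line.slice(start)) as ContainerLogRecord;
    return typeof record.log === "string" ? record : null;
  } catch {
    return null; // not JSON (or truncated): skip it
  }
}

// Keep only the non-empty stderr messages, without trailing whitespace.
function stderrMessages(lines: string[]): string[] {
  return lines
    .map((line) => parseContainerLogLine(line))
    .filter((r): r is ContainerLogRecord => r !== null && r.stream === "stderr")
    .map((r) => r.log.replace(/\s+$/, ""))
    .filter((message) => message.length > 0);
}

const sample = [
  '13 Jul, 11:22:00.067\tapp-z79df0814-7584d59ccd-l4z6m\tbf5789\t{"log":"npm ERR! signal SIGTERM\\n","stream":"stderr","time":"2022-07-13T01:21:59.960525127Z"}',
  '13 Jul, 11:22:01.833\tapp-z79df0814-7584d59ccd-l4z6m\tbf5789\t{"log":"\\u003e nft-market-api@0.0.0 start:dev\\n","stream":"stdout","time":"2022-07-13T01:22:01.675954615Z"}',
];
console.log(stderrMessages(sample)); // [ 'npm ERR! signal SIGTERM' ]
```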

Pierre_Mavro · July 13, 2022, 8:00am

Did you check your log file?

/root/.npm/_logs/2022-07-13T01_21_33_582Z-debug.log

Note: if you can change the log destination from a file to stdout, that would help as well.
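A minimal sketch of application-side logging to the process streams instead of a file, so the container runtime captures everything and it appears in the application logs. The names formatLogLine and writeLog are hypothetical, and this only covers the app's own logs; it does not change where npm writes its own debug-*.log file.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

// Build one JSON log line; `now` is injectable so the output is testable.
function formatLogLine(level: LogLevel, message: string, now: Date = new Date()): string {
  return JSON.stringify({ time: now.toISOString(), level, message }) + "\n";
}

// Write to stdout/stderr rather than a file: warnings and errors go to
// stderr, everything else to stdout. Both end up in the container logs.
function writeLog(level: LogLevel, message: string): void {
  const line = formatLogLine(level, message);
  if (level === "warn" || level === "error") {
    process.stderr.write(line);
  } else {
    process.stdout.write(line);
  }
}

writeLog("info", "starting database initialization");
```

One JSON object per line keeps each message intact in the log viewer, even when it contains spaces or special characters.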

admin-solis · July 13, 2022, 8:54am

After a few seconds, the terminal disconnects. And when I connect again later, I get the error below:

Please type "yes" to validate context: yes

Cannot launch shell UpgradeConnection(ProtocolSwitch(500))ERRO[0010] connection closed by server: websocket: close 1011 (internal server error): EOF from upstream for stdout

Is there any reference on how I can redirect the logs to stdout instead of a file?

Pierre_Mavro · July 13, 2022, 9:48am

You’re being disconnected because your app stopped/crashed/did not open the port.

So here are my suggestions:

1. Did you try what I suggested above: temporarily deleting the application port from your application configuration and redeploying your application? Then you should be able to connect with the Qovery shell without being disconnected.
2. If yes and it’s still an issue, you can try updating your Dockerfile to run a sleep or tail command instead of your application. You’ll then be able to connect through the Qovery shell and run all the commands you need manually (but 1. should be done as well, or you’ll be disconnected again, which is what is happening now).
3. If you don’t want 1. or 2., you have to look at how your application is configured (it looks to be based on Node.js) and update its config to print debug logs to stdout.
4. If none of those solutions suits you, I advise you to try an APM that will get all the info you need with as little effort as possible (like Datadog or New Relic).

admin-solis · July 13, 2022, 2:08pm

Hello Pierre,

I was able to get into the pod shell and could see the application running successfully. Also, just to be sure, I deployed a two-week-old commit that used to work perfectly, and still no luck. Suddenly everything changed. Did any upgrades happen in the last 3 days?

I can see that port 8080 is already in use:

/app # 
/app # npm run start:dev

> nft-market-api@0.0.0 start:dev
> NODE_ENV=dev ts-node -r tsconfig-paths/register ./src

runing on env:  dev
setting db password
sk_live_319178495CC32064
Error: listen EADDRINUSE: address already in use :::8080
    at Server.setupListenHandle [as _listen2] (node:net:1334:1

So why the pod keeps crashing is the million-dollar question for me at the moment.
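The EADDRINUSE error on its own is expected here: the copy of the app started by the image's CMD is most likely already listening on 8080 inside the container, so a second npm run start:dev cannot bind the same port. A minimal TypeScript sketch (hypothetical helper isPortFree, using Node's net module) for checking this from inside the container before starting another instance:

```typescript
import * as net from "net";

// Hypothetical helper: resolves false if something is already listening on
// `port` (EADDRINUSE), true if we could bind it (the probe socket is closed
// again immediately). Other bind errors (e.g. EACCES) reject the promise.
function isPortFree(port: number): Promise<boolean> {
  return new Promise((resolve, reject) => {
    const probe = net.createServer();
    probe.once("error", (err: NodeJS.ErrnoException) => {
      if (err.code === "EADDRINUSE") {
        resolve(false);
      } else {
        reject(err);
      }
    });
    probe.once("listening", () => {
      probe.close(() => resolve(true));
    });
    // No host argument: like the app (":::8080" in the error above), Node
    // binds the unspecified address, IPv6 when available.
    probe.listen(port);
  });
}

isPortFree(8080).then((free) => {
  console.log(free ? "8080 is free" : "8080 is already taken (is the app already running?)");
});
```

If it reports the port as taken, the original process is still up, which matches what you see in the shell.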

Pierre_Mavro · July 13, 2022, 4:39pm

Ok great, you were able to get in. What about debug logs when you run it manually? Nothing relevant?

admin-solis · July 13, 2022, 11:14pm

Hello Pierre, everything looked OK. Even the prod setup, which was initially working: we stopped it for a while, and when I started it again it had the same issue. At the same time, when I log into the shell, I see the Node.js app already running without issues.

Please find the npm logs:

30 timing npm:load:timers Completed in 0ms
31 timing npm:load:configScope Completed in 0ms
32 timing npm:load Completed in 26ms
33 silly logfile done cleaning log files
34 timing command:run Completed in 8186ms
35 verbose exit 0
36 timing npm Completed in 8222ms
37 info ok

Pierre_Mavro · July 14, 2022, 5:46am

OK, so you’re still encountering issues, right?

If yes:

1. If you see your app working as expected in the shell, could the port configured in Qovery be the wrong one?
2. If the configured port is correct, do you know how long it takes for your app to open the port? If it’s more than 30s, you can increase the initial check delay with the liveness_probe.initial_delay_seconds parameter: Advanced Settings | Docs | Qovery

admin-solis · July 14, 2022, 6:01am

Hello Pierre,
It’s a small Node.js application and hardly takes 10 seconds to boot. These settings have been in place for a long time, we have not changed any of them, and suddenly the app stopped working.
Please find below the Dockerfile we are using:

RUN mkdir -p /app
COPY . /app
WORKDIR /app
RUN npm install
RUN npm run build
EXPOSE 8080
CMD ["npm", "run", "start:dev"]

Pierre_Mavro · July 14, 2022, 7:49am

Hi,

I managed to get the latest version of your app working by increasing the initial check delay, as I mentioned above.

It was initially set to 30s; I set it to 120s:

{
    ....
    "liveness_probe.initial_delay_seconds": 120,
    ...
}

120s is a bit high, but it works (you can lower it if you want). Your app seems to spend a long time in the database initialization phase.
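The arithmetic behind this fix can be sketched in TypeScript. This assumes upstream Kubernetes probe semantics and defaults (periodSeconds = 10, failureThreshold = 3) and ignores timeoutSeconds and scheduling jitter; Qovery's actual period and threshold are not stated in this thread, and both function names are hypothetical.

```typescript
// Rough model of a Kubernetes TCP liveness probe: the first check runs about
// initialDelaySeconds after the container starts, then one check every
// periodSeconds; the container is restarted once failureThreshold checks in
// a row have failed.
function secondsUntilLivenessRestart(
  initialDelaySeconds: number,
  periodSeconds = 10, // Kubernetes default; Qovery's value may differ
  failureThreshold = 3, // Kubernetes default; Qovery's value may differ
): number {
  return initialDelaySeconds + (failureThreshold - 1) * periodSeconds;
}

// Does an app that needs `startupSeconds` to open its port survive the probe?
function survivesLivenessProbe(startupSeconds: number, initialDelaySeconds: number): boolean {
  return startupSeconds < secondsUntilLivenessRestart(initialDelaySeconds);
}

console.log(secondsUntilLivenessRestart(30)); // 50
console.log(secondsUntilLivenessRestart(120)); // 140
```

Under these assumptions, a 30s initial delay gives the app roughly 50s to open its port before it is restarted, while 120s gives it roughly 140s, which leaves room for a slow database initialization on half a core.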

It’s a small Node.js application and hardly takes 10 seconds to boot.

Correct me if I’m wrong, but I think your estimate is biased because it is based on your own computer’s resources, whereas you allocated only half a CPU core to this application. Laptops nowadays have ~8 cores, which is why it starts quickly from your point of view.

Another solution is to increase the number of cores allocated to your app. However, if your app doesn’t need more resources to run efficiently, I advise you not to change the current value, as that would be a waste of resources. Keep relying on liveness_probe.initial_delay_seconds; that will be enough.

Pierre_Mavro · July 14, 2022, 9:42am

Thanks for reporting your issue, @admin-solis. I’ve updated the Troubleshoot page accordingly.

Please let me know if something is not clear.

polive106 · August 18, 2022, 9:02am

Hi !

Same problem here, with a Sidekiq app. It seems to happen in the environments I cloned from one cluster to another (source: static IP, target: no static IP).

Is this relevant? I have no problem when I clone within the same cluster.

Thanks !

cc @Pierre_Mavro

Melvin_Zottola · August 18, 2022, 10:43am

Hello @polive106,

Do you have the issue only with the qovery shell, or also on deployment?
Can you send the link to the impacted application(s)?

Thanks,
Melvin

polive106 · August 18, 2022, 11:20am

@Melvin_Zottola, no apparent problem on deployment. Just sent you the link to the app.

Melvin_Zottola · August 18, 2022, 5:24pm

Hello @polive106 ,

One Qovery component in your cluster was not properly updated; the issue has been fixed. You should now be able to use the qovery shell.
