Our Node.js app suddenly stopped running yesterday. We have not done any major upgrade. Can anyone assist in resolving this ASAP? We have an investor meeting and are blocked at the moment.
This happens because your application's port does not open, which is generally caused by an issue in your application.
For more info, I advise you to look at your application logs. You also asked to connect directly to the pod to see what happens. Since the port does not open, Kubernetes kills the pod and restarts it (crash loop, with restarts following an exponential backoff), which is not convenient for debugging.
I tried logging in using qovery shell to debug the application, and I get this:
Continue with shell command using this context?
Please type “yes” to validate context: yes
Cannot launch shell UpgradeConnection(ProtocolSwitch(500))ERRO[0007] connection closed by server: websocket: close 1011 (internal server error): EOF from upstream for stdout
bash-3.2$
After a few seconds the terminal gets disconnected, and when I reconnect later I get the error below:
Please type "yes" to validate context: yes
Cannot launch shell UpgradeConnection(ProtocolSwitch(500))ERRO[0010] connection closed by server: websocket: close 1011 (internal server error): EOF from upstream for stdout
Is there any reference for how I can redirect the logs to stdout instead of to a file?
If yes and it's still an issue, you can try updating your Dockerfile to run a sleep or tail command instead of your application. That way you'll be able to connect through the Qovery shell and run all the commands you need manually (but step 1 should be done as well, or you'll be disconnected again; that's what happens otherwise).
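A minimal sketch of that Dockerfile change (debug-only; revert it once you're done investigating):

```dockerfile
# Debug-only override: keep the container alive without starting the app,
# so the Qovery shell can attach and you can run commands manually.
# CMD ["npm", "run", "start:dev"]   # original start command, disabled while debugging
CMD ["tail", "-f", "/dev/null"]     # blocks forever, container stays up
```

With this in place the app is no longer started automatically, so remember to restore the original CMD and redeploy afterwards.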
If you don't want to do steps 1 and 2, you have to look at how your application is configured (it looks to be based on Node.js) and update its config to print debug logs to stdout.
If none of those solutions suits you, I advise you to try an APM (like Datadog or New Relic) that will get you all the info you need with as little effort as possible.
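For the print-to-stdout option, here is a minimal sketch in plain Node.js (the helper names are illustrative, not from the actual app; popular loggers such as pino and winston can also be configured to write to stdout):

```javascript
// Illustrative sketch: emit structured log lines on stdout instead of a file,
// so Kubernetes / the Qovery log viewer can collect them.
function formatLog(level, msg) {
  return JSON.stringify({ level, msg, ts: new Date().toISOString() });
}

function log(level, msg) {
  // write to stdout rather than appending to a log file on disk
  process.stdout.write(formatLog(level, msg) + '\n');
}

log('info', 'server listening on port 8080');
```

Anything written to stdout/stderr this way shows up in the pod logs without needing shell access.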
I was able to get into the pod shell successfully and could see the application running. Also, just to make sure, I deployed a two-week-old commit that used to work perfectly, and still no luck. Suddenly everything changed. Did any upgrades happen in the last 3 days?
I can see that port 8080 is already in use:
/app #
/app # npm run start:dev
> nft-market-api@0.0.0 start:dev
> NODE_ENV=dev ts-node -r tsconfig-paths/register ./src
runing on env: dev
setting db password
sk_live_319178495CC32064
Error: listen EADDRINUSE: address already in use :::8080
at Server.setupListenHandle [as _listen2] (node:net:1334:1
Then why the pod is crashing is the million-dollar question for me at the moment.
Hello Pierre, everything looked OK. Even the prod setup, which was initially working: we stopped it for a while, and when I start it again it's the same issue. At the same time, when I logged into the shell I saw the Node.js app already running without issues.
Please find the npm logs:
30 timing npm:load:timers Completed in 0ms
31 timing npm:load:configScope Completed in 0ms
32 timing npm:load Completed in 26ms
33 silly logfile done cleaning log files
34 timing command:run Completed in 8186ms
35 verbose exit 0
36 timing npm Completed in 8222ms
37 info ok
With the shell, if you see your app working as expected, then maybe the port configured in Qovery is the wrong one?
If the configured port is the right one, do you know how long it takes for your app to open the port? If it's more than 30s, you can adjust the initial check delay with the liveness_probe.initial_delay_seconds parameter: Advanced Settings | Docs | Qovery
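As a sketch, the override would look something like this key/value pair in the service's advanced settings (the value here is illustrative, and the exact editing UI/format is described in the linked Advanced Settings docs):

```json
{
  "liveness_probe.initial_delay_seconds": 60
}
```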
Hello Pierre,
It's a small Node.js application and hardly takes 10 seconds to boot. These settings have been in place for a long time, we haven't changed any of them, and suddenly the app stopped working.
Please find the Dockerfile we are using:
RUN mkdir -p /app
COPY . /app
WORKDIR /app
RUN npm install
RUN npm run build
EXPOSE 8080
CMD ["npm", "run", "start:dev"]
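Note that the snippet above has no FROM line; a complete version would start from a Node base image, e.g. (the image tag below is an assumption, since the original FROM line was not shown):

```dockerfile
FROM node:18-alpine    # assumed base image; original FROM line not shown in the post
RUN mkdir -p /app
COPY . /app
WORKDIR /app
RUN npm install
RUN npm run build
EXPOSE 8080
CMD ["npm", "run", "start:dev"]
```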
120s is a little high, but it works (you can lower it if you want). It looks like the database initialization phase is what takes so long.
It's a small Node.js application and hardly takes 10 seconds to boot.
Correct me if I'm wrong, but I think you're biased because the number you give is based on your own computer's resources. However, you allocated only half a core to this application. Nowadays laptops have ~8 cores, so that's why it starts quickly from your point of view.
Another solution is to increase the number of cores allocated to your app. However, if your app doesn't need more resources to work efficiently, I advise you not to change the current value, as that would be a waste of resources. Keep leveraging liveness_probe.initial_delay_seconds and it will be enough.
Same problem here, with a Sidekiq app. It seems to happen in the environments I cloned from one cluster to another (initial: Static IP, target: No Static IP).
Is this relevant? I have no problem when I clone within the same cluster.