Nginx-ingress random error : recv() failed (104: Connection reset by peer)

Hi @bchastanier,

Problem solved! :tada:

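You were right from the beginning: it was a timeout problem, but I wasn’t able to find it immediately. The error was caused by 2 timeouts that didn’t work well together: the Node.js default keep-alive timeout (5s) and the proxy timeout (60s). Node closed idle connections after 5s while the proxy still considered them reusable for up to 60s, so the proxy occasionally sent a request on a socket that Node had just closed and got a reset in return.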

When you pointed me in this direction, I searched for timeout problems in Node and Nest, but I didn’t find anything related to my problem.
The deeper analysis and a better understanding of the problem and the system are probably what helped me finally find resources that matched it.

I finally found someone who had exactly the same problem, but on AWS with Node behind an AWS load balancer. The article was written in 2019, but the solution is still relevant.
After trying his best to monitor the Node application without finding a clue, he did a packet analysis and found the RST packets coming from Node, as I did.
Then he worked out that Node’s keepAliveTimeout was responsible for the connection reset:

> After investigating Express, it becomes apparent that Express isn’t really handling much on the socket-layer, so it must be the underlying native Node http.Server that Express uses. And sure enough, in the docs (new with NodeJS 8.0+), is a ‘keepAliveTimeout’, which will forcefully destroy a socket after having a TCP connection sit idle for a default 5 seconds.

Thanks to this article and a few others, I’ve been able to configure the NestJS application correctly by adding 3 lines of code in the bootstrap function to set the timeouts: based on the proxy’s 60s timeout, I set keepAliveTimeout to 61s and headersTimeout to 62s, as advised in one of the articles listed below. That way Node always keeps an idle connection open slightly longer than the proxy does, so the proxy is the one that closes it.
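For anyone landing here, this is a minimal sketch of the idea using a plain Node http.Server rather than my actual NestJS code. The Node properties are named keepAliveTimeout and headersTimeout (both in milliseconds). The helper name configureTimeouts and the 1s/2s margins are only illustrative assumptions that match the values I used; adapt them to your own proxy timeout.

```typescript
import { createServer, Server } from "node:http";

// Hypothetical helper (name and margins are assumptions, not from any library):
// derive both Node timeouts from the proxy's idle timeout so that Node never
// closes an idle keep-alive socket before the proxy does.
function configureTimeouts(server: Server, proxyTimeoutMs: number): void {
  // Keep idle sockets open slightly longer than the proxy (60s -> 61s).
  server.keepAliveTimeout = proxyTimeoutMs + 1_000;
  // Headers timeout should stay above keepAliveTimeout (60s -> 62s).
  server.headersTimeout = proxyTimeoutMs + 2_000;
}

// Stand-alone server used only for illustration; it is never started here.
const server = createServer((_req, res) => {
  res.end("ok");
});

configureTimeouts(server, 60_000);
```

In a NestJS bootstrap function, the same helper could probably be applied to the server returned by app.getHttpServer(), assuming the default HTTP adapter, which exposes the underlying Node http.Server.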

If I understand correctly, another solution could have been to reduce the proxy timeout below the default Node keep-alive timeout of 5s, for example to 4s. But after checking the metrics, the settings above suit our needs, and the recv() failed error has not appeared since we applied them. What a relief!

Links to the articles that helped me solve the problem :

I hope this solution helps other people who run into a related issue, so they can avoid searching hopelessly in as many wrong directions as I did.

The good thing: I’ve learnt a lot, especially about networking in k8s! So much time spent, but no time wasted. :sweat_smile:

Thank you again for your help!
