For several weeks now, we’ve been experiencing consistent latency issues with our application: requests take several seconds or are refused outright.
Our service monitoring shows no anomalies, and some of the requests never appear in our logs. The problem seems to come from the ingress, but we haven’t been able to pinpoint the exact cause.
Can you please investigate these issues?
To determine if the issue is related to NGINX, could you please provide (during the same period you see latency issues):
- graphs showing the number of NGINX pods
- the resources (CPU, RAM) used by the NGINX pods
- the HPA metrics of NGINX
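If it helps, these metrics can also be pulled ad hoc with kubectl. This is just a sketch, assuming the ingress controller runs in the nginx-ingress namespace and metrics-server is installed in the cluster:

```shell
# Pod count and status of the NGINX ingress controller
kubectl get pods -n nginx-ingress

# CPU / memory usage per pod (requires metrics-server)
kubectl top pods -n nginx-ingress

# Current HPA state: utilization vs. target, min/max and current replicas
kubectl get hpa -n nginx-ingress
```

Graphs from your monitoring stack over the incident window are still the most useful, since they show the scaling spikes over time.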
Hello @Pierre_Gerbelot, I’m working with @james075 on this issue.
It appears that NGINX scales up to 10 times the usual number of pods we have and then rapidly scales back down to the standard pod count a few minutes later. These spikes in the pod count graph seem to match with the latency problems we’re experiencing.
The above graph shows the number of pods in the nginx-ingress namespace over the last 2 days.
Here are the memory and CPU metrics graphs. The running pods seem to have a 500m CPU and 768Mi memory limit.
We think the CPU resources allocated to the NGINX pods are configured too low, which is why auto-scaling is triggered so often.
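As a rough sketch of the kind of change we have in mind (the 500m CPU / 768Mi memory limits are the ones observed in the graphs above; the request values below are illustrative, not the exact setting we will apply):

```yaml
# Hypothetical pod resource block for the NGINX ingress controller.
# Raising the CPU request gives the HPA more headroom before its
# utilization target is crossed, so scale-ups trigger less often.
resources:
  requests:
    cpu: 500m        # illustrative: raised from a lower default
    memory: 768Mi
  limits:
    cpu: 500m        # current observed limit
    memory: 768Mi    # current observed limit
```

Since the HPA typically scales on utilization relative to the CPU *request*, a request that is too low makes normal traffic look like high utilization and causes the aggressive scale-up/scale-down cycles you are seeing.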
If you agree, we will manually adjust this setting to reduce auto-scaling. We will temporarily lock your cluster, and you will not be able to change any configurations.
After a few days of testing, if you encounter no more latency issues, we will deliver a feature that allows you to configure this parameter (thereby unlocking the cluster).
If you still encounter issues, we would need the same metrics again: NGINX pod count, CPU, and RAM usage.
Could you please provide your web console URL and the name of the cluster?
Yes, let’s try it! Thanks @Pierre_Gerbelot
Console URL: Qovery
It is done. Let us know in a few days if you still have the issue.
Hello @james075, @baptiste_piana
Did you encounter any other latency issues after my change to the NGINX CPU resources? You can now configure the NGINX settings through the cluster advanced settings. I’ve set nginx.vcpu.request_in_milli_cpu to 500 to reflect the manual change I made last week.
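For reference, a minimal sketch of what that looks like in the cluster advanced settings, assuming the flat key/value JSON format (only the key name and the value 500 come from this thread; everything else is illustrative):

```json
{
  "nginx.vcpu.request_in_milli_cpu": 500
}
```

This keeps the CPU request at the value applied manually, so the fix persists across deployments without us locking the cluster again.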
It has fixed our latency issues.