Nginx Ingress Controller as DaemonSet

Hello Qovery team,

We are facing traffic routing issues when a node is marked as unhealthy by the Load Balancer health check, and it is impacting our application.

Only nodes that host an nginx-ingress-controller instance are marked as healthy and receive traffic directly. To make every node healthy, we would need to set up one nginx ingress controller per node, i.e. run it as a DaemonSet.

We could also set externalTrafficPolicy to Cluster so that all nodes are marked as healthy, but that does not work in our case: with externalTrafficPolicy set to Cluster, we lose the visitor's real IP. We have tested this extensively and confirmed the client IP is lost.
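For reference, this is roughly the setup we have in mind, assuming the community ingress-nginx Helm chart is used under the hood (the exact value keys may differ depending on the chart version Qovery deploys):

```yaml
# Sketch only: hypothetical Helm values for the community ingress-nginx chart.
# Runs one controller pod per node and keeps the client source IP by only
# routing LB traffic to nodes that host a controller pod.
controller:
  kind: DaemonSet                  # one nginx-ingress-controller per node
  service:
    type: LoadBalancer
    externalTrafficPolicy: Local   # preserves the visitor's real IP
```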

Let me know if you need more details.

Thanks,
Rafael Ramos

Hi @rophilogene and Qovery team, we are wondering if there are any updates on the thread above. Thanks!

Hi @bpowell_bse and @rafael-blueskyelearn :wave:,

Sorry for the delay on this topic. Could you please share the Qovery web console URL of the app where you hit this issue?

Could you share more details here? How did you observe the traffic routing issues? Do you have any metrics to share (from a monitoring solution or anything else)? :pray:

Someone from our engineering team will take a look ASAP. Feel free to share as many details as possible to ease the diagnosis.

Hello @bpowell_bse and @rafael-blueskyelearn,

The reason externalTrafficPolicy=Cluster doesn’t work (i.e. the visitor’s real IP is lost) is that the load balancer doesn’t have the proxy-protocol option enabled.

As of now, we don’t support this option because our engine relies on the legacy “in-tree” Kubernetes feature to provision the LBs.
We are currently in the process of integrating the AWS Load Balancer Controller to benefit from the latest LB provisioning features (e.g. the proxy-protocol option).
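For context, proxy protocol has to be enabled on both sides: the load balancer must send the PROXY header and nginx must parse it. A minimal sketch of what that could look like once the integration lands, assuming an NLB provisioned by the AWS Load Balancer Controller and the community ingress-nginx chart (treat the exact annotation keys as assumptions until we confirm the final setup):

```yaml
# Sketch only: proxy protocol enabled on both the LB and nginx.
controller:
  config:
    use-proxy-protocol: "true"   # nginx side: parse the PROXY header
  service:
    annotations:
      # AWS Load Balancer Controller (NLB) side: send the PROXY header
      service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: proxy_protocol_v2.enabled=true
```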

We’ll update this thread to communicate the progress on the integration.

Have a nice day,
Melvin

We were having traffic issues on the /showroom route.

The current ingress path into the prod cluster is LB → Node w/ Ingress Controller → Web pod (or any deployment w/ a domain/ingress). In the default configuration, the only nodes that will accept traffic are the nodes that actually host the ingress controllers (on a dynamically assigned port); consequently, the only nodes marked as healthy on the load balancer are the nodes w/ the controllers. Combined with the fact that the controllers aren’t aware of the state of the LB health check, this means that if all of the ingress controller pods are moved within a small enough window (like during times of frequent scaling up & down), the LB ends up with all nodes marked as unhealthy and is unable to send traffic. Another failure scenario I saw was a period where traffic was still being routed to a node before it had been marked unhealthy, but in all cases the result is failed or hung requests.
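For what it’s worth, the failure window comes from all controller pods being rescheduled at roughly the same time, so anything that spreads them across nodes and limits simultaneous evictions would narrow it. A rough sketch of the kind of settings I mean, assuming a recent version of the community ingress-nginx Helm chart (value keys may vary by chart version):

```yaml
# Sketch only: spread controller pods across nodes and cap simultaneous evictions.
controller:
  replicaCount: 3
  minAvailable: 2                  # PodDisruptionBudget: at most one pod down at a time
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
```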