Container Request vs Max vCPU settings

Hello @rophilogene @Pierre_Mavro
It appears we are getting alerts because our containers briefly peak above their CPU limit, even though their usage normally stays well below what is allocated (CPUThrottlingHigh (Prometheus Alert) · robusta-dev/alert-explanations Wiki · GitHub).

Looking at our container settings, it seems that Qovery sets the requested CPU and the max CPU to the same value. Can you help us set the requested CPU at a much lower level? That would let us set the max allowed higher, or if possible remove the limit entirely, so the EC2 instances can auto-scale if necessary rather than our containers being CPU throttled. Setting the requested level at a lower threshold would also save us resources and probably help us right-size our EC2 fleet, since we currently spin up unnecessary EC2 hosts just because our requested CPU value is high but mostly unused.
Thanks!
[Screenshot: Screen Shot 2023-03-29 at 4.02.36 PM]

Hi @sama213 ,

TL;DR: Qovery doesn’t support resource over-commitment, for reliability reasons.

Best practice is to avoid resource over-commitment. Obviously, it’s tempting to squeeze as much as possible out of each node. However, over-commitment leads to unexpected behavior, long investigations, cluster reliability issues, and application stability issues. This is why we don’t provide it out of the box.

Let me give you a couple of concrete examples of why it’s not a good idea.

CPU over-commitment

With multiple CPU over-committed applications on the same node, several issues can be encountered:

  • The application autoscaler may not behave as expected: it will struggle to reach its max limit because other applications on the same node also need CPU time.
  • Too many CPU-intensive applications on a single node without strict limits can starve the node, making it and every other application running on it unresponsive.

Doing this across a complete cluster can make it completely unstable for every application. It looks convenient at first but can quickly become a nightmare.
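To make this concrete, here is a minimal sketch of what a CPU over-committed container looks like in a raw Kubernetes pod spec (the names, image, and values are hypothetical, not something Qovery generates):

```yaml
# Hypothetical over-committed container: the scheduler only reserves the small
# request on the node, but the container is allowed to burst up to the much
# larger limit, competing with its neighbours for CPU time.
apiVersion: v1
kind: Pod
metadata:
  name: over-committed-app        # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.25           # placeholder image
      resources:
        requests:
          cpu: "250m"             # what the scheduler accounts for
        limits:
          cpu: "2"                # what the container may actually consume
```

Qovery instead keeps request and limit equal, so the CPU the scheduler reserves is exactly the CPU the container can use.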

Memory over-commitment

Example:

  • 1 node with 4 GB of memory
  • 1 “high” application using 3 GB of memory: request 2 GB, limit 4 GB
  • 1 “low” application using 512 MB of memory: request 1 GB, limit 1 GB

Here we have:

  • Memory usage: 3 GB + 512 MB = 3.5 GB (the node has 4 GB)
  • Requested memory: 2 GB + 1 GB = 3 GB
  • Limit memory: 4 GB + 1 GB = 5 GB

The requested memory (3 GB) is below what the node can provide, so both applications will be scheduled on this node.

The limit memory (5 GB) is above what the node can support. So as soon as we get close to what the node can actually provide (4 GB minus kernel usage, cloud provider applications, Qovery applications, and operating system usage), the OOM Killer (the kernel’s protection against a node crash) will run and kill the application with the highest memory usage.

If the low application’s usage grows from 512 MB to 950 MB (which is fair, since you’ve allowed it), the node no longer has enough memory for both. The OOM Killer will kill the other application (the one with the highest memory usage) to reclaim memory and avoid the node crashing.
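For illustration, here is roughly what that over-committed setup looks like as raw Kubernetes resources (pod names and images are hypothetical, using Gi/Mi as the Kubernetes spelling of the GB/MB figures above):

```yaml
# "High" application from the example: request 2Gi, limit 4Gi
apiVersion: v1
kind: Pod
metadata:
  name: high-app                                   # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/high-app:latest  # placeholder image
      resources:
        requests:
          memory: "2Gi"
        limits:
          memory: "4Gi"
---
# "Low" application from the example: request 1Gi, limit 1Gi
apiVersion: v1
kind: Pod
metadata:
  name: low-app                                    # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/low-app:latest   # placeholder image
      resources:
        requests:
          memory: "1Gi"
        limits:
          memory: "1Gi"
```

The requests (2Gi + 1Gi = 3Gi) fit on the 4 GB node, so the scheduler happily co-locates both pods; the limits (4Gi + 1Gi = 5Gi) do not, and that gap is exactly what the OOM Killer ends up arbitrating.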

Conclusion

To conclude: your issue comes from an alert you receive, not from a problem you actually observe in your application. So instead of tuning application parameters in a way that can compromise your cluster’s reliability and other applications’ stability, I advise you to update the alert’s threshold/behavior.
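For example, if the alert comes from the standard kube-prometheus CPUThrottlingHigh rule (which is what the Robusta wiki page you linked describes), one option is to deploy your own copy of the rule with a higher throttling threshold and silence the built-in one. This is only a sketch under that assumption: the rule name, namespace, threshold, and labels below are placeholders to adapt to your own Prometheus setup.

```yaml
# Hedged sketch: a custom PrometheusRule re-creating the CPU throttling alert
# with a more tolerant threshold (50% of CPU periods throttled over 5 minutes).
# The cAdvisor metric names are standard; everything else is an assumption.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-cpu-throttling            # hypothetical name
  namespace: monitoring                  # adjust to your Prometheus Operator setup
spec:
  groups:
    - name: custom.cpu.throttling
      rules:
        - alert: CPUThrottlingHighCustom
          expr: |
            sum(increase(container_cpu_cfs_throttled_periods_total{container!=""}[5m])) by (container, pod, namespace)
              /
            sum(increase(container_cpu_cfs_periods_total{container!=""}[5m])) by (container, pod, namespace)
              > 0.50
          for: 15m
          labels:
            severity: info
          annotations:
            summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is throttled more than 50% of the time."
```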