OOM kills on a number of system containers

Kyle_Flavin · March 27, 2024, 11:15pm

We’ve been seeing a number of Qovery-managed services in our cluster get OOM killed lately, such as the HPA and VPA among others, which is concerning. Here are our events from the past couple of days:

Process `cainjector` (pid: 34703) triggered an OOM kill on itself. The process had reached 253584 pages in size.

This OOM kill was invoked by a cgroup, containerID: ea47d570c68ab45d6c63a987c68dd02e815bda1cb5b2fdeda0d93e2842257bcf.

Process `cluster-autosca` (pid: 2859099) triggered an OOM kill on itself. The process had reached 78984 pages in size.

This OOM kill was invoked by a cgroup, containerID: 1201bcd3911c43061591e8dc5cc9bda36d10006c6e5f0e9d76d0b53b8a0de555.

Process `updater` (pid: 4031477) triggered an OOM kill on itself. The process had reached 55025 pages in size.

This OOM kill was invoked by a cgroup, containerID: 18171e8e7e17277173d9150fc369a82bba1b52b0006549084e2d8f74c8e0e9c7.

Process `recommender` (pid: 106332) triggered an OOM kill on itself. The process had reached 44689 pages in size.

This OOM kill was invoked by a cgroup, containerID: 9a08c2879617508157789a04d9f7c9181d15a89580a96607b131b9cadc1b31cb.
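For a sense of scale, the page counts in these events can be converted to memory sizes. The following is a rough sketch, assuming a 4 KiB page size (the usual default on x86-64 Linux, not confirmed for these nodes). The `pages_to_mib` helper and the `events` dictionary are illustrative names, and the figures are only the sizes reported in the events, not the containers' configured limits.

```python
# Assumption: 4 KiB pages; check with `getconf PAGESIZE` on the node.
PAGE_SIZE_BYTES = 4096


def pages_to_mib(pages: int, page_size: int = PAGE_SIZE_BYTES) -> float:
    """Convert a page count from an OOM event into MiB."""
    return pages * page_size / (1024 * 1024)


# Page counts copied from the events above.
events = {
    "cainjector": 253584,
    "cluster-autosca": 78984,
    "updater": 55025,
    "recommender": 44689,
}

for process, pages in events.items():
    print(f"{process}: {pages_to_mib(pages):.1f} MiB")
```

Under that assumption, `cainjector` had reached roughly 991 MiB and `recommender` roughly 175 MiB when they were killed.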

Do we need to scale these up? Is it safe for us to modify them directly, or can that be done through the UI? I don’t see any relevant-looking settings there.

Kyle

bchastanier · April 16, 2024, 8:05am

Hello @Kyle_Flavin,

Just so you know, we are working on improving this. Most of those services are under the VPA umbrella and will scale up if needed, so there shouldn’t be any issue on that front: a service might get killed eventually, but it will come back with more resources.
A bit of downtime on those containers is OK and shouldn’t lead to any further issues once they are back.

Cheers

Kyle_Flavin · April 19, 2024, 5:02pm

Got it. Thanks @bchastanier
