Huge number of error logs in the GCP log stream

Hello, out of nowhere we have started getting a huge amount of error logs from a self-managed GCP Autopilot cluster running Qovery. The errors all come from the qovery namespace and look like this:

We didn't update the cluster or touch anything, but the logs have been generated continuously since August 18, 2024:

This huge volume of logs is costing us a lot of money, so can anyone please provide some support/help/suggestions?

Hello @Nextools ,

Thanks for contacting us with your problem. Your cluster is self-managed, so we don’t have access to it, but I will try to help.

Looking at the logs, this looks like a permission problem with Loki.

Can you check that the values you are using for the Loki Helm chart are correct, especially the serviceAccount.annotations.iam.gke.io/gcp-service-account annotation?
The GCP service account it points to needs the roles/storage.objectAdmin role.

For managed clusters, we use this configuration:

  project = project_name
  role    = "roles/storage.objectAdmin"
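On your side, something along these lines can help verify the binding. This is only a sketch: the service-account name, project ID, and namespace below are placeholders, not values from your installation, so adjust them accordingly.

  # Check which GCP service account the Loki Kubernetes service account is annotated with
  kubectl get serviceaccount loki -n qovery \
    -o jsonpath='{.metadata.annotations.iam\.gke\.io/gcp-service-account}'

  # Grant the storage role to that GCP service account (placeholder names)
  gcloud projects add-iam-policy-binding MY_PROJECT \
    --member="serviceAccount:loki@MY_PROJECT.iam.gserviceaccount.com" \
    --role="roles/storage.objectAdmin"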

I hope this answer was helpful,

Regards,
Charles-Edouard

This is really strange, because we do not have access to the Loki Helm chart; it is part of the configuration of the Qovery Helm chart. Do I have to do something with the Qovery Helm chart? It started when I tried to deploy a persistent volume claim on the cluster. Do you have any advice?

Hello @Nextools,
Are you using Loki logging?
If not, you can disable it in the config file. It is enabled by default, but it seems your installation is missing some configuration regarding its storage.

services:
...
  logging:
    loki:
      enabled: false

See the documentation here: Configuration | Docs | Qovery
and chart values examples: qovery-chart/charts/qovery/values-gcp.yaml at main · Qovery/qovery-chart · GitHub
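Once the values file is updated, re-applying the chart picks up the change. Assuming the chart was installed as the qovery release in the qovery namespace from the Qovery Helm repository (adjust the names to how you installed it), that would be something like:

  helm repo update
  helm upgrade qovery qovery/qovery -n qovery -f values.yaml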

Regards

From what I see, it is already disabled, isn't it?

For the moment I have disabled log routing on GCP.

Thank you.
To give us some additional information, could you share which pods are deployed in the qovery namespace of your cluster, please?
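A single command lists them:

  kubectl get pods -n qovery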

Regards.

Sure, here it is:

I see some discrepancy between the config and what is running :thinking:

Then you should try to re-apply the charts.
Check the diff before applying them to prevent any other unexpected changes.
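If you have the helm-diff plugin, it can preview the changes before you apply them, for example (using the same placeholder release and chart names as above):

  helm plugin install https://github.com/databus23/helm-diff
  helm -n qovery diff upgrade qovery qovery/qovery -f values.yaml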

Regards.

Perfect, re-applying the chart removed the extra pods, but now I'm not able to see any live logs, and the service is shown as “STOPPED” even though it is not. I think this makes sense since we removed Loki (which is in charge of logging), so the idea is to keep logging enabled; the original problem, though, is still the large amount of error logs generated by something I cannot identify.
Do you have any other suggestions?


Hello @Nextools ,

Sorry for the late answer,

Do you still need help on this topic?

Regards,
Charles-Edouard

Hi, that would be great; unfortunately it is not solved. The problem started when I deployed a container with a persistent volume attached to it. I'm using a self-managed Google Autopilot Kubernetes cluster.

Additional Information

Unfortunately, after deploying everything from scratch, the same issue appeared.


Hello @Nextools ,

Thank you for taking the time to redeploy your cluster.

I have forwarded your message to the team and will get back to you.

Regards,
Charles-Edouard

This topic was automatically closed after 22 hours. New replies are no longer allowed.