Failed to install Datadog

Hi,

I was installing Datadog following the Qovery guide here. Helm install was successful but it seems Datadog agent pods are failing to start. I see READY state 2/3 and if I look at one of the pods I see readiness probe is failing:

Readiness probe failed: HTTP probe failed with statuscode: 500

Are the guides up to date? Is the latest Datadog agent and Qovery still compatible?

Hi @prki,

Can you please give more details? What your cluster console link?
Also, what are the logs on datadog failling pods?

Cheers

Nevermind. I found the issue: I used config with “datadoghq.eu” domain from Qovery example but our account was supposed to use “datadoghq.com”.

1 Like

Hi @bchastanier,

We noticed that one of our cluster’s nodes is having issues running the Datadog agent. We get this error in the logs:

2023-02-09 09:54:49 UTC | CORE | ERROR | (pkg/collector/worker/check_logger.go:69 in Error) | check:datadog_cluster_agent | Error running check: [{“message”: “HTTPConnectionPool(host=‘10.0.7.185’, port=5000): Max retries exceeded with url: /metrics (Caused by NewConnectionError(‘<urllib3.connection.HTTPConnection object at 0x7f77e2ae4a60>: Failed to establish a new connection: [Errno 111] Connection refused’))”, “traceback”: "Traceback (most recent call last):

Have you encountered this issue before?

Cluster identifier: 5b27aa77-b9d3-4d5e-8040-50c8eff62666

Hi @prki ,

Actually I am not aware of such issues. How did you installed DD? Did you tried to re-install it?

Cheers

We installed using Helm following the Qovery guide. Yes, the issue persists after pods get recreated with help upgrade.

We tried re-installing with no luck. I contacted DD support, and they tracked down that Qovery installs its own cluster-agent in “qovery” namespace, and our Datadog agents from “datadog” namespace are trying to connect to the wrong cluster-agent with an incompatible configuration. I’m waiting for their response for suggestions on how to force our Datadog agents to use the correct cluster-agent.

We got to the bottom of this with Datadog support: our Datadog installation discovers Datadog cluster-agent in qovery namespace and tries to connect to it. However, Qovery installed Datadog has a different configuration and doesn’t work here. The solution provided by Datadog support was to disable automatic discovery and add annotations to set the correct configuration explicitly. It feels like this should be supported by Qovery, and helm install should be enough for the correct Datadog installation.

@bchastanier Did you try to reproduce this? It seems this should affect all Qovery clusters.