Having Issue Connecting Application Metrics to Datadog Agent

INFORMATION

Python 3.7
Service Link:
https://console.qovery.com/organization/715a3cc2-79f8-493e-bfe7-7d4aa5c4d2a6/project/9c5249e0-1e7e-4af5-8676-a135602d3698/environment/ee3c669e-4c70-4af1-baae-a8a4ae016e36/application/98376acc-aebd-490e-8caf-41cec0e47b79/general

ISSUE

Hello,

We are currently running a Django application as one of our services in Qovery, and the Datadog Python tracer (ddtrace) is not able to connect to the Datadog agent installed on the node. The error we get is:

Exception in thread ddtrace.internal.writer:AgentWriter:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.7/site-packages/ddtrace/internal/periodic.py", line 59, in run
    self._on_shutdown()
  File "/usr/local/lib/python3.7/site-packages/ddtrace/internal/writer.py", line 444, in periodic
    self.flush_queue(raise_exc=False)
  File "/usr/local/lib/python3.7/site-packages/ddtrace/internal/writer.py", line 420, in flush_queue
    self._send_payload(encoded, len(enc_traces))
  File "/usr/local/lib/python3.7/site-packages/ddtrace/internal/writer.py", line 318, in _send_payload
    response = self._put(payload, headers)
  File "/usr/local/lib/python3.7/site-packages/ddtrace/internal/writer.py", line 290, in _put
    conn = get_connection(self.agent_url, self._timeout)
  File "/usr/local/lib/python3.7/site-packages/ddtrace/internal/agent.py", line 81, in get_connection
    return compat.httplib.HTTPConnection(hostname, parsed.port, timeout=timeout)
  File "/usr/local/lib/python3.7/urllib/parse.py", line 172, in port
    port = int(port, 10)
ValueError: invalid literal for int() with base 10: 'None'

We're not exactly sure what's going on here. In our normal infrastructure (Kubernetes, not Qovery) we set a DD_AGENT_HOST environment variable that points to the host of the node in our Kubernetes cluster. Qovery may have a different setup, but we're not 100% sure on this one. We could be missing a specific environment variable that lets the application send traces to the Datadog agent on the Kubernetes node of our Qovery cluster.

HOW TO REPRODUCE

We followed the exact steps in Qovery’s documentation for installing the Datadog agent on the cluster. Once we started sending traces, we got the error above.

CC: @rophilogene

Hi @Parth_Patel, thank you for reporting your issue. I’ll ask someone from my engineering team to take a look :pray:

Hello @Parth_Patel,

Looking at your cluster, there doesn’t seem to be a Datadog agent running on it. Did you install it recently (e.g. by following this doc: Kubernetes observability and monitoring with Datadog | Qovery)?

Hello @Melvin_Zottola ,

We actually removed the Datadog agent from our cluster when we realized it wasn’t working again. Would you like me to reinstall it for the purpose of this troubleshooting?

Okay @Parth_Patel
You can try to reinstall it. From the error message above, I think the port wasn’t available in your application: ValueError: invalid literal for int() with base 10: 'None'
Do you remember if you set an environment variable on the Qovery side to indicate the Datadog port?
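
For what it's worth, the last frames of the traceback can be reproduced with the standard library alone. The sketch below assumes the tracer ended up with an agent URL whose port was the literal string 'None' (for example because a port value was unset when the URL was built). The URLs, the "datadog-agent" host name and the helper parse_agent_port are hypothetical and not taken from ddtrace.

```python
from urllib.parse import urlparse


def parse_agent_port(agent_url):
    """Return the port of agent_url, or the parsing error as a string.

    Hypothetical helper for illustration only; not part of ddtrace.
    """
    try:
        return urlparse(agent_url).port
    except ValueError as exc:
        # Reading .port on "http://host:None" raises ValueError,
        # the same exception type as in the traceback above.
        return "ValueError: {}".format(exc)


# Assumed example URLs: "datadog-agent" is a placeholder host name.
print(parse_agent_port("http://datadog-agent:8126"))
print(parse_agent_port("http://datadog-agent:None"))
```

If this is what happened, the fix would be on the configuration side (a missing or empty port value) rather than in the application code.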

Ok thanks @Melvin_Zottola ,

I will try this again and try adding the port variable.

Do you know exactly what the env var and value for the env var should be?

Thanks,
Parth

@Melvin_Zottola any updates here :slight_smile:

Hey @Parth_Patel,

Sorry for the delay here.
I am looking into your issue and will get back to you ASAP.

Best,

Hello @Parth_Patel,

Sorry for the delay, I was ramping up on the topic. If I understood correctly, you want to send metrics to the Datadog agent. Am I correct?

Looking at similar setups, I can see 3 variables set in the service to be watched:

  • DD_AGENT_HOST=(v1:spec.nodeName)
  • DD_LOGS_INJECTION=true
  • DD_SERVICE_NAME=[PUT-YOUR-APP-SERVICE-NAME-HERE]
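
As a rough sketch of how such variables would be consumed on the Python side, the helper below builds an agent URL from the environment. DD_AGENT_HOST comes from the list above; DD_TRACE_AGENT_PORT, the localhost fallback and the trace port 8126 are assumptions based on Datadog's usual defaults, and build_agent_url is a hypothetical name, not a ddtrace function.

```python
import os


def build_agent_url(env=None):
    """Build the trace agent URL from DD_* environment variables.

    Hypothetical helper: it only mirrors the usual host/port convention
    and is not how ddtrace resolves its configuration internally.
    """
    env = os.environ if env is None else env
    host = env.get("DD_AGENT_HOST") or "localhost"
    # Fall back to 8126 (assumed default trace port) when the variable is
    # unset or empty, so the URL never ends in ":None".
    port = int(env.get("DD_TRACE_AGENT_PORT") or 8126)
    return "http://{}:{}".format(host, port)


if __name__ == "__main__":
    print(build_agent_url())
```

Printing the result from inside the running container is a quick way to see which host and port the application would actually target.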

Does it make sense?

Best,

Hey @bchastanier ,

Sorry for the late reply as I was out.

Just to confirm, are these the 3 variables we need to add on top?

Should DD_AGENT_HOST be the node that the service is running on?

  • If so, how can we get the node name dynamically, since it depends on which node the service is scheduled on?

DD_SERVICE_NAME seems simple enough: we just set it to whatever we named the service.

Thanks,
Parth

Hello @Parth_Patel ,

Sorry for the delay, I’m picking this topic back up.

I’m digging into the docs and will test the installation and the connection with a service on my side, to be sure of the correct Datadog configuration.

I’ll answer asap,

Melvin

Hey @Melvin_Zottola any updates on this one?

Hello @Parth_Patel

Yes, I’ve set up a simple app on my side to validate the configuration, and the logs are correctly sent to Datadog.

As I’m not familiar with Python, I created a simple Java-based app and followed the instructions available in the Datadog interface under APM > Introduction (it works in a similar way for Python, with an agent to download and specify at start).

Then, after creating a simple application in Qovery, the logs were available in Datadog. I didn’t create any variables on the Qovery side at this point. The HOST matches the name of the node where the pods of my application are running.

I then tried specifying some of the variables recommended in the documentation, such as DD_ENV, DD_SERVICE and DD_VERSION.

That information is correctly transferred to the Datadog agent after redeploying the app.

For information, the Dockerfile of the simple app I used is here if you want to look at it: https://github.com/mzottolaqovery/simplejavaapp/blob/main/Dockerfile#L19

From my tests, the configuration specified in the Qovery Datadog documentation seems sufficient to let your applications send their logs to Datadog. DD_AGENT_HOST doesn’t seem to be mandatory.
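
If traces still don't arrive on your side, a small check like the one below, run from inside the application container, can tell whether the agent port is reachable at all. It is only a sketch: it assumes the agent accepts traces over TCP on DD_AGENT_HOST and DD_TRACE_AGENT_PORT (falling back to localhost and 8126, which are assumed defaults), and agent_reachable is a hypothetical helper, not part of ddtrace or Qovery.

```python
import os
import socket


def agent_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False


if __name__ == "__main__":
    # Assumed variable names and defaults; adjust to your setup.
    host = os.environ.get("DD_AGENT_HOST") or "localhost"
    port = int(os.environ.get("DD_TRACE_AGENT_PORT") or 8126)
    print(host, port, agent_reachable(host, port))
```

A False result points to a networking or agent installation problem, while True with missing traces points back to the tracer configuration.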