Helm Chart Deployment Failure for Apache Airflow

INFORMATION

Console link: https://console.qovery.com/organization/0f2a7baf-84e9-4b4e-a219-72fb44811f99/project/1e2ddd37-24f7-4e38-aaf1-250a5987410a/environment/f5dde7ed-fa63-4477-96ae-76a9cca712a4/services/general

ISSUE

I am trying to deploy Airflow with Helm charts. I am using the latest released chart version, but my pods do not come up even after waiting 10 minutes (the timeout), and the deployment fails.

HOW TO REPRODUCE

  1. First, I created a database in Qovery and gave it the alias DATABASE_URL in the environment variables.
  2. I gave “QOVERY_HELM_ZD6E26B93_HOST_EXTERNAL” the alias “BASE_DOMAIN”.
  3. I then created a Helm service (New Service → Create Helm) and provided the values as
  4. Then I provided a raw YAML override with these values:
postgres:
  enabled: false

airflow:
  databaseUrl: qovery.env.DATABASE_URL
  # -- domain as shown in browser, this variable and `baseProtocol` are used as part of the BASE_URL environment variable in app and worker container and in the ingress resource, if enabled
  baseDomain: qovery.env.BASE_DOMAIN
  # -- protocol as shown in browser, change to https etc based on your endpoint/ingress configuration, this variable and `baseDomain` are used as part of the BASE_URL environment variable in app and worker container
  baseProtocol: http

  # extra worker groups
  workerGroups:
    # workers configuration
    # The default worker group
    - name: "default"
      replicas: 3
      # -- Annotations to apply to the pods
      annotations:
        qovery.annotations.service

      # -- Labels to apply to the pods
      labels: 
        qovery.labels.service

    - name: "native"
      replicas: 4
      # -- Annotations to apply to the pods
      annotations:
        qovery.annotations.service

      # -- Labels to apply to the pods
      labels: 
        qovery.labels.service

  # app configuration
  app:
    # -- Annotations to apply to the pods
    annotations:
      qovery.annotations.service

    # -- Labels to apply to the pods
    labels: 
      qovery.labels.service
      
  # multiplayer configuration
  multiplayer:
    # -- Annotations to apply to the pods
    annotations:
      qovery.annotations.service

    # -- Labels to apply to the pods
    labels: 
      qovery.labels.service

ingress:
  enabled: false

serviceAccount:
  # Specifies whether a ServiceAccount should be created
  create: true
  # The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the fullname template
  name: ""
  annotations:
    qovery.annotations.service
  5. I saved these values. On the next page I was able to view the “See default values.yaml” successfully.
  6. Then I created the Helm chart.
  7. After that I deployed the chart and received the following logs:
    This is the log file for Airflow Deployment with Helm Charts · GitHub

Hello @Sayantan_Bose,

I’m taking a look

The property you set for the database URL doesn’t seem to exist in the Helm chart values (charts/charts/airflow/values.yaml at main · airflow-helm/charts · GitHub). Are you sure it is correct?

airflow:
  databaseUrl: qovery.env.DATABASE_URL

From the documentation, each parameter needs to be set up in the chart: Production Guide — helm-chart Documentation

You can try to create each parameter individually in the environment variables and inject them in the values file

Hi @Melvin_Zottola

I am from the same team as @Sayantan_Bose and I have been trying the same thing in a separate cluster.

I am doing this in a cluster that already has a running instance of Airflow (deployed as an app from a container registry). This environment has all the necessary env variables available. Even so, when I try to install Airflow with the Helm chart I still hit the same issue.

  1. We are using the official airflow/airflow repo for the Helm chart. The link you mentioned is from the “User-Community Airflow Helm Chart”, which we are not following, but we understood the idea behind your suggestion. The official values.yaml is located at airflow/chart/values.yaml at main · apache/airflow · GitHub, and that is what loads in the Qovery UI when we override the values there.

  2. My overrides are as below:
executor: LocalExecutor

postgresql:
  enabled: false

redis:
  enabled: false

useStandardNaming: true

statsd:
  enabled: false

images:
  useDefaultImageForMigration: true

scheduler:
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 100m
      memory: 128Mi

pgbouncer:
  enabled: true

fernetKey: qovery.env.AIRFLOW__CORE__FERNET_KEY

webserverSecretKey: qovery.env.WEBSERVER_SECRET_KEY

env:
  - name: AIRFLOW__LOGGING__LOGGING_LEVEL
    value: DEBUG

I have not explicitly specified any env variables here for db as they are already set in Qovery UI.

When I try to deploy the chart, I encounter the same issue.

The identifiers are:

Cluster ID: e927b9f8-c3d4-42f0-97e2-503f68f448b7
Organization ID: 0f2a7baf-84e9-4b4e-a219-72fb44811f99
Project ID: 1e2ddd37-24f7-4e38-aaf1-250a5987410a
Environment ID: f4e9e073-8bac-4015-af45-a5bbc5ac64ec
Service ID: a3799521-de20-4150-8af3-a71072953b2b

The docs (Production Guide — helm-chart Documentation) mention setting the data property, so I tried that:

  1. I created new env vars to support the individual-parameter style that the docs describe.

  2. Then I updated the values.yaml to include them.

However, I still run into the same issue.

Is there a way to find out more about what’s going wrong? When I run this from my terminal, connecting with kubectl and helm, it works great.

So, I tried to watch for events using kubectl get events --watch while deploying to see the error and here’s what I found

So it seems like the wait-for-airflow-migrations container is failing. Now I am not a kubectl or helm expert, but I do have some experience with Airflow. I know that Airflow needs to have the db migrations up and ready before it can work. It seems like somehow the migration related jobs are unable to run.

I have the override below, which should use the default containers for wait-for-airflow-migrations and run-airflow-migrations. However, I don’t see anything in the events related to the run-airflow-migrations container, nor am I able to find any reference to it in the default values.yaml file.

Waiting for some more time, I see the logs below (even after Qovery has failed and completed the deployment). Seems like the containers are being restarted and going through the same fate

I am happy to open a bug/ discussion on apache/airflow github if you believe that’s the best way to go about it. @Melvin_Zottola

Hello @0xbitmonk,

Thanks for the details, I may have looked at the wrong helm chart.

Following your new environment, there was at first an issue with the SSL mode specified: invalid sslmode value: "enable"
I changed the parameter to "allow", which let the wait-for-airflow-migrations container start successfully.
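
For context on the error above: PostgreSQL's client library (libpq) only accepts the sslmode values disable, allow, prefer, require, verify-ca and verify-full, so "enable" is rejected. The sketch below is a hypothetical pre-deploy check, not part of Qovery or the chart. It assumes the sslmode is passed as a query parameter of the connection URL; the function name `check_sslmode` and the example URLs are made up for illustration.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# sslmode values accepted by libpq, the PostgreSQL client library.
VALID_SSLMODES = {"disable", "allow", "prefer", "require", "verify-ca", "verify-full"}


def check_sslmode(database_url: str) -> Optional[str]:
    """Return the sslmode found in the URL's query string, or None if absent.

    Raises ValueError when the value is not one libpq accepts.
    """
    query = parse_qs(urlsplit(database_url).query)
    values = query.get("sslmode")
    if not values:
        return None
    sslmode = values[-1]  # if the parameter is repeated, check the last one
    if sslmode not in VALID_SSLMODES:
        raise ValueError(
            'invalid sslmode value: "{}" (expected one of {})'.format(
                sslmode, sorted(VALID_SSLMODES)
            )
        )
    return sslmode


if __name__ == "__main__":
    # Hypothetical URL; replace with the value of your own DATABASE_URL.
    print(check_sslmode("postgresql://user:secret@db.example.com:5432/airflow?sslmode=require"))
```

If the SSL mode is configured as a separate field rather than inside the URL, the same set of accepted values applies.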

When the helm is installing, it waits for some migrations to be applied:


But it stalls until the timeout specified in the args is reached:

Do you know how much time you need to apply those migrations?
From what I see, the Helm chart is configured correctly and the connection to the database seems successful (otherwise it wouldn’t start the migration process, as far as I understand).

You can override the default timeout using:

images:
  airflow:
    # timeout (in seconds) for airflow-migrations to complete
    migrationsWaitTimeout: 

Also @0xbitmonk, you can use K9s to look at what’s happening during deployments.
You can select any pod / container in a namespace to look into the logs (it uses the same credentials as kubectl: How to connect to your EKS cluster with kubectl | Qovery)

Hi @Melvin_Zottola Thanks a lot. With k9s I am able to dig deeper. As I suspected, it seems like the run-airflow-migrations is not started. When I go inside the webserver pod, I only see a webserver & a job for wait-for-airflow-migrations.

Could you point me to why this would happen? The default values.yaml

Note: The migrations should complete in a few minutes (worst-case scenario), but in this case I am using a db that already had Airflow running against it, so the migrations should be even faster. I also increased migrationsWaitTimeout to 300s and still hit the same issue.

Okay, so I found the issue. During my research I stumbled upon Run-airflow-migration and wait-for-airflow-migrations - #3 by paola - The Apache Airflow Forum by Astronomer, which talks about issues when deploying with Terraform. Then it struck me that Qovery also uses Terraform somewhere. So I dug further and landed on this open issue:
Helm post-install hooks are not triggered when used by `helm_release` and `wait = true` · Issue #683 · hashicorp/terraform-provider-helm · GitHub. My guess was that if Terraform is used here, this could be the problem. So I updated the values.yaml to follow the Terraform deployment guidance, and it worked: Helm Chart for Apache Airflow — helm-chart Documentation

My updated values.yaml are


fernetKey: qovery.env.AIRFLOW__CORE__FERNET_KEY

webserverSecretKey: qovery.env.WEBSERVER_SECRET_KEY

env:
  - name: AIRFLOW__LOGGING__LOGGING_LEVEL
    value: DEBUG

createUserJob:
  useHelmHooks: false
  applyCustomEnv: false
migrateDatabaseJob:
  useHelmHooks: false
  applyCustomEnv: false
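
As I understand the linked issue, the migration job is normally a Helm post-install hook, and when the installer waits for the release to become ready first, the hook never fires while the pods keep waiting for migrations. Setting useHelmHooks to false makes the jobs ordinary resources instead. The sketch below is a hypothetical sanity check on an already-parsed values mapping. The helper name `jobs_still_using_hooks` is made up, and treating a missing key as "hooks enabled" is an assumption about the chart default.

```python
from typing import Any, Dict, List

# Job sections named in the values above.
HOOK_JOBS = ("createUserJob", "migrateDatabaseJob")


def jobs_still_using_hooks(values: Dict[str, Any]) -> List[str]:
    """List the jobs whose useHelmHooks is not explicitly set to False.

    Assumption: a missing key means the chart default applies, and that
    default is treated here as True (hooks enabled).
    """
    flagged = []
    for job in HOOK_JOBS:
        section = values.get(job) or {}
        if section.get("useHelmHooks", True) is not False:
            flagged.append(job)
    return flagged


if __name__ == "__main__":
    overrides = {
        "createUserJob": {"useHelmHooks": False, "applyCustomEnv": False},
        "migrateDatabaseJob": {"useHelmHooks": False, "applyCustomEnv": False},
    }
    # An empty list means both jobs run as regular resources, not hooks.
    print(jobs_still_using_hooks(overrides))
```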

Hope this helps someone else :slight_smile:

Hi @Melvin_Zottola, I have a few last questions for you.

Based on my earlier comment, I have been able to get Airflow up and running. All the database migrations run fine as well.

1. “Application is not running” on UI

The Qovery UI shows “Application is not running”. However, I can see in k9s that it’s up and running, and I can also port-forward and log in to the Airflow instance.

2. Opening access to webserver port does not work

In this case, we only need to expose the webserver (pod name: helm-z97c7e1f9-aflow-airflow-webserver-6cb88dbcd-9fsh5 in the above k9s screenshot). So I have made the config updates as

However, when I visit the link https://webserverui-p8080-zf4e9e073-z95561dca-gtw.ze927b9f8.helpers.sh I get a 503.

What am I doing wrong here? I think there is some error with the “service name”.

Hi @Melvin_Zottola bumping this up to follow on the questions above

Hi @Melvin_Zottola, bumping this thread up. Why would my Helm service not show the pods or the external URL for connection?

Hi @0xbitmonk, this is completely unrelated, but since you are deploying Airflow I wanted to share some humble feedback: we were using Airflow at Qovery and switched to Windmill. It’s an open-source alternative written in Rust, with low resource consumption, and much better than Airflow in my opinion. If you give it a shot, I think you’ll like it.

You can even look at this thread where I explained how to install it (it’s easier than Airflow)

On a side note: someone will respond to you - sorry for the delay.

Yes, we used this thread to help us with the Airflow installation :slight_smile: We have tried Windmill and would eventually migrate to it, but for now we have a bunch of systems running and don’t want to migrate completely. However, I am now curious what your migration looked like. We have a huge Python library that powers the DAGs.

The migration was not too bad - actually it simplifies the code a lot, since Windmill doesn’t ask you to use things like DAG.(..) functions. It’s really smart: you write basic Python code and it executes it with inputs and outputs… All the magic is handled by Windmill.

Here is an example of data inserted into BigQuery from one of our steps:

import json
import os
import sys
from datetime import datetime

import wmill
from google.cloud import bigquery

#requirements:
#dependency
#wmill==1.133.0
#google-cloud-bigquery==3.12.0

def load_bq_credentials():
    # Fetch the service-account JSON from Windmill and expose it to the
    # BigQuery client through GOOGLE_APPLICATION_CREDENTIALS.
    creds_content = wmill.get_resource("f/Product/BQ")
    file_name = '/tmp/bqcreds.json'
    with open(file_name, 'w') as f:
        f.write(json.dumps(creds_content))

    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = file_name

def main(data):
    load_bq_credentials()
    client = bigquery.Client()

    table_ref = client.dataset("our_dataset").table("our_table")

    rows_to_insert = []
    for d in data:
        # Skip empty entries coming from the previous step
        if not d:
            continue
        # cf schema qovery_windmill.product_usage table
        rows_to_insert.append({
            "created_at_day": d["day"],
            "created_at_timestamp": datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            "organization_id": d["organization_id"],
            "payload_json": json.dumps(d)
        })

    # insert_rows_json returns a list of per-row errors; empty means success
    errors = client.insert_rows_json(table_ref, rows_to_insert)
    if errors:
        print("errors occurred while inserting rows: {}".format(errors))
        # os has no exit(); sys.exit is the call that stops the script
        sys.exit(1)

    return data

So you can see it’s pure vanilla Python (minus the wmill lib that I use to load creds from windmill secrets manager).
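
To illustrate the "plain Python" point, the row-building part of the step above can be pulled into a pure function that runs without Windmill or BigQuery. This is only a sketch: `build_rows` and its `now` parameter are hypothetical names not present in the original script, and unlike the original it computes the timestamp once per call rather than once per row.

```python
import json
from datetime import datetime
from typing import Any, Dict, Iterable, List, Optional


def build_rows(
    data: Iterable[Optional[Dict[str, Any]]],
    now: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Turn step inputs into BigQuery-ready rows, skipping empty entries."""
    stamp = (now or datetime.now()).strftime("%Y-%m-%d %H:%M:%S")
    rows = []
    for d in data:
        if not d:
            continue
        rows.append({
            "created_at_day": d["day"],
            "created_at_timestamp": stamp,
            "organization_id": d["organization_id"],
            "payload_json": json.dumps(d),
        })
    return rows


if __name__ == "__main__":
    # Made-up sample input; the None entry is skipped.
    sample = [{"day": "2024-01-01", "organization_id": "org-1"}, None]
    print(build_rows(sample))
```

Passing a fixed `now` makes the output deterministic, which helps when testing the step locally.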

Hello,

You need to set the qovery labels and annotations in the deployment/statefulset.
Qovery can detect the pods only if those annotations/labels are present.

So you need to edit your values.yaml or chart in order to set those.

In the coming weeks, we are going to release an automatic way to do it, but in the meantime you need to set those macros by hand in the values.yaml.

Yes, this solved my issue. Thanks a lot.
