I am trying to deploy Airflow with Helm charts. I am using the latest released chart version, but my pods are not coming up even after waiting 10 minutes (the timeout), and the deployment fails.
HOW TO REPRODUCE
First, I created a database in Qovery and gave it the alias DATABASE_URL in the env.
I have given an alias name to “QOVERY_HELM_ZD6E26B93_HOST_EXTERNAL” as “BASE_DOMAIN”.
I then created a Helm service (New Service → Create Helm) and provided the following values:
postgres:
  enabled: false
airflow:
  databaseUrl: qovery.env.DATABASE_URL
  # -- domain as shown in browser, this variable and `baseProtocol` are used as part of the BASE_URL environment variable in app and worker container and in the ingress resource, if enabled
  baseDomain: qovery.env.BASE_DOMAIN
  # -- protocol as shown in browser, change to https etc based on your endpoint/ingress configuration, this variable and `baseDomain` are used as part of the BASE_URL environment variable in app and worker container
  baseProtocol: http
  # extra worker groups
  workerGroups:
    # workers configuration
    # The default worker group
    - name: "default"
      replicas: 3
      # -- Annotations to apply to the pods
      annotations:
        qovery.annotations.service
      # -- Labels to apply to the pods
      labels:
        qovery.labels.service
    - name: "native"
      replicas: 4
      # -- Annotations to apply to the pods
      annotations:
        qovery.annotations.service
      # -- Labels to apply to the pods
      labels:
        qovery.labels.service
  # app configuration
  app:
    # -- Annotations to apply to the pods
    annotations:
      qovery.annotations.service
    # -- Labels to apply to the pods
    labels:
      qovery.labels.service
  # multiplayer configuration
  multiplayer:
    # -- Annotations to apply to the pods
    annotations:
      qovery.annotations.service
    # -- Labels to apply to the pods
    labels:
      qovery.labels.service
ingress:
  enabled: false
serviceAccount:
  # Specifies whether a ServiceAccount should be created
  create: true
  # The name of the ServiceAccount to use.
  # If not set and create is true, a name is generated using the fullname template
  name: ""
  annotations:
    qovery.annotations.service
I saved these values. On the next page, I was able to open “See default values.yaml” successfully.
I am from the same team as @Sayantan_Bose and I have been trying the same thing in a separate cluster.
I am doing this in a cluster that already has a running instance of Airflow (deployed as an app from a container registry). This environment has all the necessary env variables available. Even so, when I try to install Airflow I still run into this issue.
We are using the official airflow/airflow repo for the Helm chart. The link you mentioned is from the “User-Community Airflow Helm Chart”, which we are not following. But we got the idea of your solution. The official values.yaml is located at chart/values.yaml on the main branch of apache/airflow on GitHub, and that is what loads in the Qovery UI when we override the values there.
So it seems like the wait-for-airflow-migrations container is failing. Now I am not a kubectl or helm expert, but I do have some experience with Airflow. I know that Airflow needs to have the db migrations up and ready before it can work. It seems like somehow the migration related jobs are unable to run.
I have the override below, which should use the default containers for wait-for-airflow-migrations and run-airflow-migrations. However, I don’t see anything in the events related to the run-airflow-migrations container, nor am I able to find any reference to it in the default values.yaml file.
After waiting some more time, I see the logs below (even after Qovery has failed and completed the deployment). It seems the containers are being restarted and meeting the same fate.
Thanks for the details, I may have looked at the wrong helm chart.
Looking at your new environment, there was at first an issue with the SSL mode specified: invalid sslmode value: "enable"
I changed the parameter to allow, which let the wait-for-airflow-migrations container start successfully.
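For reference, libpq (the PostgreSQL client library) accepts only six sslmode values: disable, allow, prefer, require, verify-ca and verify-full, so "enable" is rejected. A minimal sketch of checking a connection URL before deploying is shown here; the function name check_sslmode and the example URLs are hypothetical and not part of Qovery or the Airflow chart.

```python
from urllib.parse import urlsplit, parse_qs

# sslmode values accepted by libpq (the PostgreSQL client library)
VALID_SSLMODES = {"disable", "allow", "prefer", "require", "verify-ca", "verify-full"}


def check_sslmode(database_url):
    """Return the sslmode found in the URL's query string, or None if absent.

    Raises ValueError when the value is not one that libpq accepts.
    (Hypothetical helper, for illustration only.)
    """
    query = parse_qs(urlsplit(database_url).query)
    values = query.get("sslmode")
    if not values:
        return None
    mode = values[-1]  # if the parameter is repeated, check the last occurrence
    if mode not in VALID_SSLMODES:
        raise ValueError(f'invalid sslmode value: "{mode}"')
    return mode


if __name__ == "__main__":
    # Placeholder URL, not a real database
    print(check_sslmode("postgresql://user:pw@db.example.com:5432/airflow?sslmode=allow"))
```

Running this against the value stored in DATABASE_URL would surface the same error message before the chart is installed.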
When the helm is installing, it waits for some migrations to be applied:
Do you know how much time you need to apply those migrations?
From what I can see, the Helm chart is configured correctly and the connection to the database seems to succeed (otherwise it wouldn’t start the migration process, as far as I understand).
You can override the default timeout using:
images:
  airflow:
  # timeout (in seconds) for airflow-migrations to complete
  migrationsWaitTimeout:
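For intuition, a wait-for-migrations step can be thought of as a polling loop with a deadline. The sketch below is not the chart's actual implementation; wait_until and its parameters are hypothetical names, and check stands in for whatever "are the migrations applied?" test is used.

```python
import time


def wait_until(check, timeout_seconds, poll_interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or timeout_seconds have elapsed.

    Returns True if check() succeeded in time, False on timeout.
    (Hypothetical helper, for illustration only.)
    """
    deadline = clock() + timeout_seconds
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


if __name__ == "__main__":
    attempts = []

    def migrations_applied():
        # Stand-in check: pretend the migrations are done on the third poll
        attempts.append(1)
        return len(attempts) >= 3

    print(wait_until(migrations_applied, timeout_seconds=10, poll_interval=0.1))
```

A loop like this only succeeds if something else is actually applying the migrations; if that job never runs, a longer timeout just delays the failure.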
Hi @Melvin_Zottola Thanks a lot. With k9s I am able to dig deeper. As I suspected, it seems like the run-airflow-migrations is not started. When I go inside the webserver pod, I only see a webserver & a job for wait-for-airflow-migrations.
Note: The migrations should complete in a few minutes (worst-case scenario), but in this case I am using a db that already had Airflow running, so the migrations should be even faster. I also increased migrationsWaitTimeout to 300s and still hit the same issue.
Based on my earlier comment, I have been able to get Airflow up and running. All the database migrations run fine as well.
1. “Application is not running” on UI
The Qovery UI shows “Application is not running”. However, I can see in k9s that it’s up and running, and I can also port-forward and log in to the Airflow instance.
In this case, we only need to expose the webserver (pod name: helm-z97c7e1f9-aflow-airflow-webserver-6cb88dbcd-9fsh5 in the above k9s screenshot). So I have made the config updates as
Hi @0xbitmonk, completely unrelated, but since you are trying to deploy Airflow I wanted to share my humble feedback - we were using Airflow at Qovery and we switched to Windmill. It’s an open-source alternative written in Rust, with low resource consumption, and much better than Airflow in my opinion. If you give it a shot I think you’ll like it.
You can even look at this thread where I explained how to install it (it’s easier than Airflow)
–
On a side note: someone will respond to you - sorry for the delay.
Yes, we used this thread to help us with the Airflow installation. We have tried Windmill and would eventually migrate to it, but for now we have a bunch of systems running and don’t want to migrate completely. However, I am now curious: what did your migration look like? We have a huge Python library that powers the DAGs.
The migration was not too bad - actually it simplifies the code a lot, since Windmill doesn’t ask you to use things like DAG.(..) functions. It’s really smart: you write basic Python code and it executes it with inputs and outputs… All the magic is handled by Windmill.
Here is an example of data inserted into bigquery from one of our steps:
import os
import sys
import wmill
import json
from google.cloud import bigquery
from datetime import datetime, timezone, timedelta, date

#requirements:
#dependency
#wmill==1.133.0
#google-cloud-bigquery==3.12.0


def load_bq_credentials():
    # Write the service-account JSON stored as a Windmill resource to a file
    # and point the BigQuery client at it.
    creds_content = wmill.get_resource("f/Product/BQ")
    file_name = '/tmp/bqcreds.json'
    with open(file_name, 'w') as f:
        f.write(json.dumps(creds_content))
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = file_name


def main(data):
    load_bq_credentials()
    client = bigquery.Client()
    table_ref = client.dataset("our_dataset").table("our_table")
    rows_to_insert = []
    for d in data:
        # Skip empty entries
        if not d:
            continue
        # cf schema qovery_windmill.product_usage table
        rows_to_insert.append({
            "created_at_day": d["day"],
            "created_at_timestamp": datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            "organization_id": d["organization_id"],
            "payload_json": json.dumps(d)
        })
    # insert_rows_json returns a list of per-row errors (empty on success)
    errors = client.insert_rows_json(table_ref, rows_to_insert)
    if errors:
        print("an error occurred while inserting rows: {}".format(errors))
        sys.exit(1)
    return data
So you can see it’s pure vanilla Python (minus the wmill lib that I use to load creds from windmill secrets manager).
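One possible refinement, as a sketch only: the row-building part of that step can be pulled out into a pure function so it can be tested without BigQuery or Windmill. The name build_rows and the now parameter are hypothetical, and the use of UTC for the timestamp is an assumption (the step above uses the worker's local time).

```python
import json
from datetime import datetime, timezone


def build_rows(data, now=None):
    """Turn the step's input items into BigQuery rows, skipping empty items.

    Hypothetical helper; the field names mirror the step above.
    """
    # Assumption: timestamps are recorded in UTC
    now = now or datetime.now(timezone.utc)
    timestamp = now.strftime("%Y-%m-%d %H:%M:%S")
    return [
        {
            "created_at_day": d["day"],
            "created_at_timestamp": timestamp,
            "organization_id": d["organization_id"],
            "payload_json": json.dumps(d),
        }
        for d in data
        if d  # skip None and empty dicts, as the original loop does
    ]


if __name__ == "__main__":
    sample = [{"day": "2023-10-01", "organization_id": "org-1"}, None]
    print(build_rows(sample))
```

With this split, main(data) would only load the credentials, call build_rows, and pass the result to insert_rows_json.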
You need to set the qovery labels and annotations in the deployment/statefulset.
Qovery can detect the pods only if those annotations/labels are present.
So you need to edit your values.yaml or chart in order to set them.
In the coming weeks, we are going to release an automatic way to do it, but in the meantime you need to set those macros by hand in the values.yaml.
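To check whether a given workload actually carries the expected keys, something like the following sketch can be run on a manifest exported with `kubectl get deployment NAME -o json` and parsed with json.load. The helper name and the example keys are hypothetical; the real keys are whatever the qovery.labels.service and qovery.annotations.service macros expand to in your environment.

```python
def missing_pod_metadata(workload, required_labels, required_annotations):
    """Return the label and annotation keys missing from the pod template.

    workload is a Deployment or StatefulSet manifest as a dict.
    (Hypothetical helper, for illustration only.)
    """
    metadata = workload.get("spec", {}).get("template", {}).get("metadata", {})
    labels = metadata.get("labels") or {}
    annotations = metadata.get("annotations") or {}
    return {
        "labels": sorted(k for k in required_labels if k not in labels),
        "annotations": sorted(k for k in required_annotations if k not in annotations),
    }


if __name__ == "__main__":
    # Placeholder manifest and keys, not the real Qovery ones
    deployment = {
        "spec": {"template": {"metadata": {"labels": {"app": "webserver"}}}}
    }
    print(missing_pod_metadata(
        deployment,
        required_labels=["app", "example.com/service-id"],
        required_annotations=["example.com/service-version"],
    ))
```

An empty list for both entries means the pod template has every key that was asked for.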