Airbyte 1.1.0 installation on Qovery managed cluster broke again

data_admin · November 20, 2024, 12:22am

Airbyte 1.1.0 deployment on a Qovery managed EKS was working a few weeks ago. It broke again:

https://console.qovery.com/organization/5fafa1c2-689c-4a54-8ea1-533a7230a2a5/project/8fdd5013-c37f-45af-928a-a3b88179e1ef/environment/bbd9323d-6009-4ae4-af85-2026269e705d/logs/bdb5a18b-a560-4f33-98a0-c905c41105f8/deployment-logs

Helm timed out for release helm-zbdb5a18b-airbyte during helm UPGRADE: Command killed due to timeout: Killing process AWS_ACCESS_KEY_ID="xxx" AWS_DEFAULT_REGION="us-east-1" AWS_SECRET_ACCESS_KEY="xxx" KUBECONFIG="/home/qovery/.qovery-workspace/bbd9323d-6009-4ae4-af85-2026269e705d-1-1732060600/bootstrap/z7d0953dd/qovery-kubeconfigs-z7d0953dd/z7d0953dd.yaml" "helm" "upgrade" "helm-zbdb5a18b-airbyte" "/home/qovery/.qovery-workspace/bbd9323d-6009-4ae4-af85-2026269e705d-1-1732060600/helm_charts/bdb5a18b-a560-4f33-98a0-c905c41105f8/chart" "--install" "-n" "zbbd9323d-airbyte-production" "--values" "/home/qovery/.qovery-workspace/bbd9323d-6009-4ae4-af85-2026269e705d-1-1732060600/helm_charts/bdb5a18b-a560-4f33-98a0-c905c41105f8/chart/file1" "--timeout" "600s" "--wait" "--atomic" "--debug" due to Timeout(600s)
20 Nov, 00:11:47.89 ❌ Deployment of helm chart failed!

https://console.qovery.com/organization/5fafa1c2-689c-4a54-8ea1-533a7230a2a5/project/8fdd5013-c37f-45af-928a-a3b88179e1ef/environment/bbd9323d-6009-4ae4-af85-2026269e705d/services/deployments

data_admin · November 20, 2024, 1:32am

Same problem with JupyterHub deployment. I followed the instructions here:

Deployment errors here:

https://console.qovery.com/organization/5fafa1c2-689c-4a54-8ea1-533a7230a2a5/project/0014bb51-b28a-4048-a385-262dab6ec1b5/environment/f025bac3-b0d4-4542-8158-ecf690dd13a7/logs/e20d6fbe-57d4-41fd-acb2-f997a9f3d8a6/deployment-logs

ready.go:284: [debug] PersistentVolumeClaim is not bound: zf025bac3-frontier-production/jupyterhub-hub-db-dir

data_admin · November 20, 2024, 1:33am

https://console.qovery.com/organization/5fafa1c2-689c-4a54-8ea1-533a7230a2a5/project/0014bb51-b28a-4048-a385-262dab6ec1b5/environment/f025bac3-b0d4-4542-8158-ecf690dd13a7/services/deployments

ce_gagnaire · November 20, 2024, 8:30am

Hello,

I’ll check your problem with the team and get back to you.

Regards,
Charles-Edouard

data_admin · November 20, 2024, 4:05pm

Thanks! I’d like to get both Airbyte and JupyterHub (latest versions of the Helm charts) deployed successfully. I’m trying a non-Karpenter-enabled EKS cluster first. I was able to deploy both previously on ARM nodes; I’ll try AMD nodes today.

data_admin · November 20, 2024, 5:50pm

For what it’s worth, I spun up 3 Qovery managed EKS clusters:

no-karpenter AMD nodes
no-karpenter ARM nodes
karpenter AMD nodes

Airbyte 1.1.0 helm chart deployments are failing on all three of these clusters:
https://console.qovery.com/organization/5fafa1c2-689c-4a54-8ea1-533a7230a2a5/project/8fdd5013-c37f-45af-928a-a3b88179e1ef/environments/general

JupyterHub 4.0.0 Helm Chart deployments are failing on all 3 of the clusters:
https://console.qovery.com/organization/5fafa1c2-689c-4a54-8ea1-533a7230a2a5/project/0014bb51-b28a-4048-a385-262dab6ec1b5/environments/general

ce_gagnaire · November 21, 2024, 9:35am

Hello @data_admin ,

We checked your latest problem and it looks like you tried to deploy your application on a stopped Cluster.

Can you start your cluster and try again?

Please let me know if you still have errors after starting your clusters.

Regards,
Charles-Edouard

data_admin · November 21, 2024, 3:35pm

I wouldn’t make a mistake that basic. All clusters were live and operational when I tried to deploy the latest versions of Airbyte and JupyterHub on them. You can look at the deployment histories of Airbyte and JupyterHub to see the failing deployments. You can also duplicate the problem yourself by deploying Airbyte and JupyterHub onto your own EKS clusters.

ce_gagnaire · November 21, 2024, 4:42pm

Thank you for your feedback,

Do you mind if we try to start your cluster again to investigate the problem on your config?

Regards,
Charles-Edouard

data_admin · November 21, 2024, 4:51pm

I already did that, you can see the deployment errors here:

https://console.qovery.com/organization/5fafa1c2-689c-4a54-8ea1-533a7230a2a5/project/8fdd5013-c37f-45af-928a-a3b88179e1ef/environment/04771e89-0617-4c99-b627-025b8033df80/services/deployments

https://console.qovery.com/organization/5fafa1c2-689c-4a54-8ea1-533a7230a2a5/project/0014bb51-b28a-4048-a385-262dab6ec1b5/environment/103d13c4-6d71-4711-9478-8e0f86abe4c0/services/deployments

data_admin · November 21, 2024, 4:54pm

Airbyte 1.1.0 deployment on an AWS EKS cluster (3-node AMD t4g.xlarge) was working 2 weeks ago. Now it’s failing with the exact same setup:

https://console.qovery.com/organization/5fafa1c2-689c-4a54-8ea1-533a7230a2a5/project/8fdd5013-c37f-45af-928a-a3b88179e1ef/environment/6d101bde-aae9-45f3-a34f-95920eeeb215/services/deployments

Feel free to start / stop any clusters and jupyterhub / airbyte deployments if you need to debug.

Erebe · November 22, 2024, 8:45am

Hello,

I am looking at your issue.
One of the issues is that your cluster size is too small. For example, you have a minio pod that can’t start because the cluster has reached its maximum number of nodes.
So you should give the cluster some extra room to expand by increasing its maximum number of nodes.

The second issue is more on our side: we set a hard timeout of 5 minutes for downloading the chart's dependencies, and it seems that Airbyte is now hitting this timeout every time.

I am going to increase our hard timeout, and I will keep you posted.

Erebe · November 22, 2024, 9:40am

Hi again, we have increased our timeout for the dependency fetch. The only thing left for you to do is to increase the cluster's maximum number of nodes, as the minio pod can’t start:

pod didn't trigger scale-up: 1 max node group size reached
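
For anyone checking for the same condition: the message above is reported as a Kubernetes event, so it can be filtered out of the JSON that `kubectl get events -o json` produces. The following is a minimal Python sketch, assuming the autoscaler emits these events with the reason `NotTriggerScaleUp`. The function name and the sample data are made up for illustration; only the message text comes from this thread.

```python
import json


def pods_blocked_by_node_limit(events_json: str) -> list[tuple[str, str]]:
    """Return (pod name, message) for events where scale-up was refused
    because a node group is already at its maximum size."""
    events = json.loads(events_json).get("items", [])
    blocked = []
    for event in events:
        message = event.get("message", "")
        # Assumption: the autoscaler uses the reason "NotTriggerScaleUp".
        if (
            event.get("reason") == "NotTriggerScaleUp"
            and "max node group size reached" in message
        ):
            pod = event.get("involvedObject", {}).get("name", "<unknown>")
            blocked.append((pod, message))
    return blocked


# Illustrative input only: the pod names are hypothetical, the first message
# is the one quoted in this thread.
sample = json.dumps(
    {
        "items": [
            {
                "reason": "NotTriggerScaleUp",
                "message": "pod didn't trigger scale-up: 1 max node group size reached",
                "involvedObject": {"kind": "Pod", "name": "example-minio-0"},
            },
            {
                "reason": "Scheduled",
                "message": "Successfully assigned pod",
                "involvedObject": {"kind": "Pod", "name": "example-other"},
            },
        ]
    }
)

for pod, message in pods_blocked_by_node_limit(sample):
    print(f"{pod}: {message}")
```

In practice the input would come from `kubectl get events -n <namespace> -o json` rather than the hard-coded sample.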

data_admin · November 22, 2024, 6:10pm

I don’t think this makes sense; two weeks ago Airbyte deployment was working on clusters with 3 nodes.

I increased the number of nodes on our clusters to 5. We also have a Karpenter-enabled cluster:

https://console.qovery.com/organization/5fafa1c2-689c-4a54-8ea1-533a7230a2a5/clusters/general

Airbyte deployments are failing on all 3 clusters, you can check the deployment histories:

https://console.qovery.com/organization/5fafa1c2-689c-4a54-8ea1-533a7230a2a5/project/8fdd5013-c37f-45af-928a-a3b88179e1ef/environments/general

You may want to try to duplicate this problem on your own EKS clusters.

I’m just following these instructions:

Github repo was updated GitHub - evoxmusic/qovery-airbyte: Deploy Airbyte on Kubernetes with Qovery
working a few weeks ago

data_admin · November 26, 2024, 12:56am

Hello, any updates? Should I shut down our Qovery managed cluster if Qovery doesn’t have the bandwidth to debug this? You guys are welcome to re-deploy the cluster if you have the bandwidth to help debug this.

rophilogene · November 26, 2024, 5:34am

I did take a look and here are my notes:

First of all, stick to the version from my tutorial (1.1.x at the time) just to make sure I did have a valid working version.
I did try to deploy it on your EKS cluster with and without Karpenter. The error was the same: the minio Airbyte pod does not start. So I suspect a bug with this pod, and I have a few ideas that I’ll explore today. (I suspect some metadata is still present on the persistent storage for minio and is blocking the proper startup of this service.)

In the meantime, and for production purposes, I’d suggest using S3 storage from AWS or an equivalent to remove the dependency on minio (which is not recommended for production use).

You can find the documentation here: State and Logging Storage | Airbyte Documentation
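
As a rough illustration of what switching to S3 involves, here is a small Python sketch that builds a Helm values override and prints it as JSON. The key names under `global.storage` are an assumption based on the Airbyte storage documentation for the 1.x charts and should be checked against the chart version in use. The bucket name, secret name, and the helper function are placeholders, not values from this thread.

```python
import json


def airbyte_s3_values(bucket: str, region: str, secret_name: str) -> dict:
    """Build Helm values that point Airbyte's state and log storage at S3.

    Assumption: key names follow the Airbyte 1.x storage documentation.
    """
    return {
        "global": {
            "storage": {
                "type": "S3",
                # Kubernetes secret that would hold the AWS credentials.
                "storageSecretName": secret_name,
                # One bucket is reused for all three purposes here.
                "bucket": {
                    "log": bucket,
                    "state": bucket,
                    "workloadOutput": bucket,
                },
                "s3": {
                    "region": region,
                    "authenticationType": "credentials",
                },
            }
        }
    }


# Placeholder names for illustration only.
values = airbyte_s3_values("my-airbyte-bucket", "us-east-1", "airbyte-config-secrets")
print(json.dumps(values, indent=2))
```

Since JSON is valid YAML, the printed output should be usable as a values override file, though a hand-written YAML file is the more common choice.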

data_admin · November 27, 2024, 1:01am

Hey, that works for us, thanks! I was able to get the latest version of the Airbyte Helm chart (v1.2.0) deployed using S3:

EKS cluster without Karpenter:
https://console.qovery.com/organization/5fafa1c2-689c-4a54-8ea1-533a7230a2a5/project/8fdd5013-c37f-45af-928a-a3b88179e1ef/environment/72031385-01d8-444d-b25d-63177c6f2dc7/services/general
EKS cluster with Karpenter
https://console.qovery.com/organization/5fafa1c2-689c-4a54-8ea1-533a7230a2a5/project/8fdd5013-c37f-45af-928a-a3b88179e1ef/environment/e80b4911-d8a5-4d56-9477-6525e5ea49ec/services/general
I think you have to be careful with the node sizes if Karpenter is enabled; see: Scaling Airbyte | Airbyte Documentation

EDIT: I removed the above environments since both are working; will replace with a more permanent one later.

Don’t worry about the minio airbyte pod; we’re not going to use it in production. I’ll also try using an external AWS managed database service with Airbyte:

data_admin · November 27, 2024, 1:03am

Can you also help with deploying the JupyterHub Helm chart? I followed these instructions:

and the deployment failed:
https://console.qovery.com/organization/5fafa1c2-689c-4a54-8ea1-533a7230a2a5/project/0014bb51-b28a-4048-a385-262dab6ec1b5/environment/103d13c4-6d71-4711-9478-8e0f86abe4c0/services/general

There seems to be a
ready.go:284: [debug] PersistentVolumeClaim is not bound: z103d13c4-jupyterhub-production-amd/jupyterhub-hub-db-dir
bug that wasn’t there 2 weeks ago. I was able to deploy JupyterHub from the Helm chart successfully using the instructions above two weeks ago.

ce_gagnaire · November 27, 2024, 8:47am

Hello @data_admin ,

We are investigating this with the team.

Regards,
Charles-Edouard

bchastanier · November 27, 2024, 1:20pm

Hey @data_admin,

We have found the culprit: the default storage class (gp2) is not tagged as default.
We are working on a viable fix.

In the meantime, I can patch your cluster to set gp2 as the default, which should solve your issue for the time being.

Cheers
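
For readers who want to check this on their own cluster: a storage class is marked as default through the `storageclass.kubernetes.io/is-default-class` annotation, and a claim that names no storage class can stay unbound when no class carries it. The Python sketch below is a hedged illustration, not the patch Qovery applied. It inspects the JSON from `kubectl get storageclass -o json` and, if no default exists, prints the standard `kubectl patch` command for `gp2`. The function names and sample input are hypothetical.

```python
import json
import shlex

DEFAULT_CLASS_ANNOTATION = "storageclass.kubernetes.io/is-default-class"


def default_storage_classes(storage_classes_json: str) -> list[str]:
    """Names of the storage classes annotated as the cluster default."""
    items = json.loads(storage_classes_json).get("items", [])
    return [
        item["metadata"]["name"]
        for item in items
        if (item["metadata"].get("annotations") or {}).get(DEFAULT_CLASS_ANNOTATION)
        == "true"
    ]


def mark_default_command(name: str) -> list[str]:
    """kubectl argv that annotates the given storage class as the default."""
    patch = {"metadata": {"annotations": {DEFAULT_CLASS_ANNOTATION: "true"}}}
    return ["kubectl", "patch", "storageclass", name, "-p", json.dumps(patch)]


# Illustrative input: a gp2 class that carries no default annotation.
sample = json.dumps({"items": [{"metadata": {"name": "gp2", "annotations": {}}}]})

if not default_storage_classes(sample):
    # Only prints the command; running it is left to the operator.
    print(shlex.join(mark_default_command("gp2")))
```

The sketch only prints the command so that nothing changes on the cluster without a deliberate step.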
