[Karpenter] - CoreDNS installation error

Hello :wave:

I’m trying to migrate our cluster to the excellent new Karpenter feature :muscle:

But in my case I have an error on CoreDNS:

I’m thinking about deleting and recreating the CoreDNS add-on, but I’m not confident about what could happen to my cluster.
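In case it helps, here is roughly what that would look like with the AWS CLI (a sketch, assuming CoreDNS is installed as an EKS managed add-on; note that delete-addon has a --preserve flag that removes the add-on from EKS management while leaving the running CoreDNS resources in place):

    # Inspect the managed add-on before touching it
    aws eks describe-addon \
        --cluster-name qovery-zdc37d137 \
        --addon-name coredns

    # If recreating, --preserve keeps the running CoreDNS pods on the cluster
    aws eks delete-addon \
        --cluster-name qovery-zdc37d137 \
        --addon-name coredns \
        --preserve

    aws eks create-addon \
        --cluster-name qovery-zdc37d137 \
        --addon-name coredns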

My current node groups:

{
    "nodegroups": [
        "qovery-20240827181650210400000001"
    ]
}

The description:

{
    "nodegroup": {
        "nodegroupName": "qovery-20240827181650210400000001",
        "nodegroupArn": "arn:aws:eks:eu-west-1:XXXXXXXXX:nodegroup/qovery-zdc37d137/qovery-20240827181650210400000001/68c8ca87-047e-d567-8d65-6ba9b6576467",
        "clusterName": "qovery-zdc37d137",
        "version": "1.28",
        "releaseVersion": "1.28.11-20240817",
        "createdAt": "2024-08-27T20:16:54.093000+02:00",
        "modifiedAt": "2024-09-12T09:06:52.298000+02:00",
        "status": "ACTIVE",
        "capacityType": "ON_DEMAND",
        "scalingConfig": {
            "minSize": 3,
            "maxSize": 30,
            "desiredSize": 5
        },
        "instanceTypes": [
            "t3a.medium"
        ],
        "subnets": [
            "subnet-042c467f2128ea259",
            "subnet-05a060b5f0ce709fd",
            "subnet-0e6eb8f44e172f983",
            "subnet-0c720f7aa0917d36e",
            "subnet-07b54265ebef2b6a3",
            "subnet-088e44d3cae56e0dd"
        ],
        "amiType": "AL2_x86_64",
        "nodeRole": "arn:aws:iam::XXXXXXXX:role/qovery-eks-workers-zdc37d137",
        "labels": {},
        "resources": {
            "autoScalingGroups": [
                {
                    "name": "eks-qovery-20240827181650210400000001-68c8ca87-047e-d567-8d65-6ba9b6576467"
                }
            ]
        },
        "health": {
            "issues": []
        },
        "updateConfig": {
            "maxUnavailablePercentage": 10
        },
        "launchTemplate": {
            "name": "terraform-20230502090830471400000001",
            "version": "5",
            "id": "lt-00870d69fbdb99527"
        },
        "tags": {
            "QoveryNodeGroupId": "zdc37d137-1",
            "ClusterId": "zdc37d137",
            "QoveryProduct": "EKS",
            "Region": "eu-west-1",
            "Service": "EKS",
            "OrganizationLongId": "cf7a89a0-7fe4-4096-9f62-a99aa7dd3f21",
            "OrganizationId": "zcf7a89a0",
            "creationDate": "2023-05-02T09:08:29Z",
            "QoveryNodeGroupName": "default",
            "ClusterLongId": "dc37d137-c921-4f9b-88e2-a0e26c461b42"
        }
    }
}
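For reference, both snapshots above come from the AWS CLI:

    aws eks list-nodegroups --cluster-name qovery-zdc37d137

    aws eks describe-nodegroup \
        --cluster-name qovery-zdc37d137 \
        --nodegroup-name qovery-20240827181650210400000001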

Thanks for your help :pray:

Hello @Mike
Can you share the web console URL of your cluster?

Thank you

Hello @Pierre_Gerbelot :wave:

Yes sorry: https://console.qovery.com/organization/cf7a89a0-7fe4-4096-9f62-a99aa7dd3f21/cluster/dc37d137-c921-4f9b-88e2-a0e26c461b42/logs

I’m investigating the issue. I will let you know once it has been fixed.


The cluster update has been fixed, and your cluster is now running Karpenter.

You can now redeploy all your environments.

We will work on making the CoreDNS update more robust in the next release.

Thank you.


Thanks a lot for your responsiveness @Pierre_Gerbelot :hand_with_index_finger_and_thumb_crossed:

Hi @Pierre_Gerbelot

I have the same issue on my cluster, could you help me?

The link to my console is: https://console.qovery.com/organization/fcdd956b-60bb-4d2c-906c-80d7ac3bb53d/cluster/b1193cd7-bbd6-4fbd-a6b7-c7ee0062c121/logs

Thanks,
Samuel

Hello @wewelll ,
I’m looking into the issue, and I will let you know once it is fixed.
Thank you


Thank you for your responsiveness @Pierre_Gerbelot! I see that you have launched a new deployment of the cluster, and there is a new issue: the iam-eks-user-mapper deployment is timing out.

Also, I can’t reach my deployed apps anymore…
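For anyone triaging the same timeout, a minimal way to locate and inspect that deployment (the namespace varies by installation, so discover it first):

    # Find where iam-eks-user-mapper lives
    kubectl get deployments -A | grep iam-eks-user-mapper

    # Then check rollout state and recent events, substituting the namespace found above
    kubectl -n <namespace> rollout status deployment/iam-eks-user-mapper
    kubectl -n <namespace> describe deployment iam-eks-user-mapper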

Hey @wewelll,

Yes, sorry for the inconvenience. There is an issue with the CNI add-on; we are working on making your cluster available again.


Your cluster has been fixed, even though the status shows an error in the web console.

We have locked your cluster until Monday, when we will apply a definitive correction.

We encountered an issue with Datadog that prevented all pods from starting. We were forced to delete the Datadog mutating webhook configuration because APM is activated by default. We saw that you tried to update the Datadog Helm chart. Please try again to reinstall the mutating webhook.
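For reference, that deletion amounts to something like this (a sketch; the object name datadog-webhook is an assumption and can differ between chart versions, so list first):

    # Find the Datadog mutating webhook configuration (name varies by chart version)
    kubectl get mutatingwebhookconfigurations | grep -i datadog

    # Deleting it stops the admission controller from injecting into new pods;
    # the Datadog Cluster Agent should recreate it on its next reconciliation
    kubectl delete mutatingwebhookconfiguration datadog-webhook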

Sorry for the inconvenience

Notice: For users who encounter the same issue, please wait until Monday (09/16/24) when a fix will be delivered.

Thank you.

Thank you for your answer.

Yes, I’m using mutateUnlabelled: true in the Datadog chart because, at the time I set it up, I could not label each service for the admission controller.

I could get rid of mutateUnlabelled: true now.
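For anyone doing the same, a minimal sketch (the release name datadog and the namespace below are assumptions): turn off blanket injection in the chart, then opt individual workloads in with the label the admission controller watches:

    # Disable injection into unlabelled pods
    helm upgrade datadog datadog/datadog \
        --reuse-values \
        --set clusterAgent.admissionController.mutateUnlabelled=false

    # Opt a workload in explicitly via the pod template label
    kubectl -n my-namespace patch deployment my-app --type merge \
        -p '{"spec":{"template":{"metadata":{"labels":{"admission.datadoghq.com/enabled":"true"}}}}}'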

I redeployed the Datadog chart and it worked, but now I have another issue when deploying my applications: I see an ImagePullBackOff error with the message Failed to pull image .... no match for platform in manifest: not found

Hello @wewelll
When you migrated to Karpenter, I believe you selected ARM64 as the architecture, whereas your node group was previously configured with AMD64 nodes.

Now, when you deploy your applications, the ARM64 architecture is being used, but the image you’re using is not compatible with this architecture. This is causing the ImagePullBackOff error.
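A quick way to confirm this is to inspect which platforms the image manifest actually provides (the image name below is a placeholder):

    # Lists the platforms (e.g. linux/amd64, linux/arm64) published for the tag
    docker buildx imagetools inspect my-registry/my-app:latest

    # Alternative without buildx
    docker manifest inspect my-registry/my-app:latest | grep -A2 platform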

Was this architecture change made on purpose?

No, it was a mistake on my side; I need AMD64 nodes.

Is it possible to update the architecture? I can’t do it from the console because my cluster is locked…

Now that you have changed the architecture in the cluster settings, I think you should redeploy all the environments running on this cluster (also Datadog).

Thank you


This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.