Helm upgrade failed: "Helm timed out" on cluster

Hello,

I have an issue with this cluster: 4971bac2-eb84-45a7-b369-a13e9e225650

Some context:
I did not trigger any Helm chart update.
I tried stopping and restarting the cluster, but I get the same error.

I have a Datadog service running from a Helm chart.
Could the error be related to this container?

Error message:
Helm timed out for release vertical-pod-autoscaler during helm UPGRADE: Error: UPGRADE FAILED: an error occurred while rolling back the release. original upgrade error: context deadline exceeded: release vertical-pod-autoscaler failed: context deadline exceeded
release vertical-pod-autoscaler failed
helm.sh/helm/v3/pkg/action.(*Rollback).performRollback helm.sh/helm/v3/pkg/action/rollback.go:220
helm.sh/helm/v3/pkg/action.(*Rollback).Run helm.sh/helm/v3/pkg/action/rollback.go:79
helm.sh/helm/v3/pkg/action.(*Upgrade).failRelease helm.sh/helm/v3/pkg/action/upgrade.go:490
helm.sh/helm/v3/pkg/action.(*Upgrade).reportToPerformUpgrade helm.sh/helm/v3/pkg/action/upgrade.go:354
helm.sh/helm/v3/pkg/action.(*Upgrade).releasingUpgrade helm.sh/helm/v3/pkg/action/upgrade.go:414
runtime.goexit runtime/asm_amd64.s:1598
an error occurred while rolling back the release. original upgrade error: context deadline exceeded
helm.sh/helm/v3/pkg/action.(*Upgrade).failRelease helm.sh/helm/v3/pkg/action/upgrade.go:491
helm.sh/helm/v3/pkg/action.(*Upgrade).reportToPerformUpgrade helm.sh/helm/v3/pkg/action/upgrade.go:354
helm.sh/helm/v3/pkg/action.(*Upgrade).releasingUpgrade helm.sh/helm/v3/pkg/action/upgrade.go:414
runtime.goexit runtime/asm_amd64.s:1598
UPGRADE FAILED
main.newUpgradeCmd.func2 helm.sh/helm/v3/cmd/helm/upgrade.go:209
github.com/spf13/cobra.(*Command).execute github.com/spf13/cobra@v1.6.1/command.go:916
github.com/spf13/cobra.(*Command).ExecuteC github.com/spf13/cobra@v1.6.1/command.go:1044
github.com/spf13/cobra.(*Command).Execute github.com/spf13/cobra@v1.6.1/command.go:968
main.main helm.sh/helm/v3/cmd/helm/helm.go:83
runtime.main runtime/proc.go:250
runtime.goexit runtime/asm_amd64.s:1598: Command terminated with a non success exit status code: exit status: 1

Hello @Simon_Pera
Here is the error I see:

0/3 nodes are available: 3 Insufficient memory. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod…

Could you try adding another node in your cluster settings and then updating your cluster?

Regards

Hello @Pierre_Gerbelot,

Can you tell me where you see this error? I can only see the error above in my cluster logs…

Do you mean that 3 C5.large nodes are not enough for the update?
I don’t understand why the resources are insufficient in this case. Is it because I don’t allow scaling (I set both the minimum and the maximum to 3 instances)?

Thank you in advance

Hello @Simon_Pera

Unfortunately, this error is not available in the cluster log yet. For the moment, it is only available in the deployment log.
I think your cluster is close to its resource limit: the cluster update failed while your applications were running, but succeeded once the applications were paused.

Indeed, if you want to be able to allocate more nodes in your cluster, you have to increase the maximum number of instances in the cluster settings.
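
To illustrate why the scheduler reports "Insufficient memory" on every node, here is a minimal Python sketch of the idea. It is a simplified model, not the actual Kubernetes scheduler: the function names and all memory figures are hypothetical, and it only assumes that a pod is placed on a node when the node's remaining unrequested memory covers the pod's memory request.

```python
# Simplified model of the scheduler's memory check.
# All names and numbers here are hypothetical, for illustration only.

def nodes_with_room(free_mib_per_node, pod_request_mib):
    """Return the indices of nodes whose free (unrequested) memory can hold the pod."""
    return [i for i, free in enumerate(free_mib_per_node) if free >= pod_request_mib]


def schedule(free_mib_per_node, pod_request_mib):
    """Mimic the shape of the scheduler message when no node has enough memory."""
    fits = nodes_with_room(free_mib_per_node, pod_request_mib)
    if not fits:
        n = len(free_mib_per_node)
        return f"0/{n} nodes are available: {n} Insufficient memory."
    return f"pod scheduled on node {fits[0]}"


# Three nodes that are almost fully reserved by running applications.
busy_cluster = [150, 200, 100]  # free MiB per node (hypothetical)

# A new pod requesting 500 MiB fits nowhere while min = max = 3 nodes.
print(schedule(busy_cluster, 500))

# With a higher maximum, a fourth (mostly empty) node can be added.
print(schedule(busy_cluster + [3000], 500))
```

Under these assumptions, the first call reports that none of the 3 nodes has enough memory, and the second call places the pod on the added node. Pausing the applications has a similar effect, because it releases the memory they had requested on the existing nodes.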

Regards