The Ops Community ⚙️

Cover image for [K8s] Fix Helm release failing with an upgrade still in progress
Ana Cozma
Ana Cozma

Posted on • Updated on

[K8s] Fix Helm release failing with an upgrade still in progress

This article applies to: Helm v3.8.0

Helm helps you manage Kubernetes applications — Helm Charts help you define, install, and upgrade even the most complex Kubernetes application. More details on Helm and the commands can be found in the official documentation.

Assuming you use Helm to handle your releases, you might end up in a case where the release will be stuck in a pending state and all subsequent releases will keep failing.

This could happen if:

  • you run the upgrade command from the cli and accidentally (or not) interrupt it or
  • you have two deploys running at the same time (maybe in Github Actions, for example)

Basically any interruption that occurred during your install/upgrade process could lead you to a state where you cannot install another release anymore.

In the release logs the failing upgrade will show an error similar to the following:

Error: UPGRADE FAILED: release <name> failed, and has been rolled back due to atomic being set: timed out waiting for the condition
Error: Error: The process '/usr/bin/helm3' failed with exit code 1
Error: The process '/usr/bin/helm3' failed with exit code 1
Enter fullscreen mode Exit fullscreen mode

And the status will be stuck in: PENDING_INSTALL or PENDING_UPGRADE depending on the command you were running.

Because of this pending state when you run the command to list all release it will return empty:

> helm list --all                                                                               
NAME    NAMESPACE   REVISION    UPDATED STATUS  CHART   APP VERSION
Enter fullscreen mode Exit fullscreen mode

So what can we do now? In this article we will look over the two options described below. Keep in mind that based on your setup it could be another issue, but I'm hoping that these two pointers will give you a place to start.

  1. Roll back to the previous working version using the helm rollback command.
  2. Delete the helm secret associated with the release and re-run the upgrade command.

So let's look at each option in detail.

First option: Roll back to the previous working version using the helm rollback command

From the official documentation:

This command rolls back a release to a previous revision.
The first argument of the rollback command is the name of a release, and the second is a revision (version) number. If this argument is omitted, it will roll back to the previous release.

So, in this case, let's get the history of the releases:

helm history <releasename> -n <namespace>
Enter fullscreen mode Exit fullscreen mode

In the output you will notice the STATUS of your release with: pending-upgrade:

REVISION UPDATED                  STATUS          CHART     APP VERSION DESCRIPTION
1        Wed May 25 11:45:40 2022 DEPLOYED        api-0.1.0 1.16.0      Upgrade complete
2        Mon May 30 14:32:46 2022 PENDING_UPGRADE api-0.1.0 1.16.0      Preparing upgrade
Enter fullscreen mode Exit fullscreen mode

Now let's perform the rollback by running the following command:

helm rollback <release> <revision> --namespace <namespace>
Enter fullscreen mode Exit fullscreen mode

So in our case we run:

> helm rollback api 1 --namespace api
Rollback was a success.
Enter fullscreen mode Exit fullscreen mode

After we get confirmation that the rollback was successful, we run the command to get the history again.

We now see we have two releases and that our rollback was successful having the STATUS is: deployed

> helm history api -n api

REVISION UPDATED                  STATUS      CHART     APP VERSION DESCRIPTION
1        Wed May 25 11:45:40 2022 SUPERSEEDED api-0.1.0 1.16.0      Upgrade complete
2        Mon May 30 14:32:46 2022 SUPERSEEDED api-0.1.0 1.16.0      Preparing upgrade
3        Mon May 30 14:45:46 2022 DEPLOYED    api-0.1.0 1.16.0      Rollback to 1
Enter fullscreen mode Exit fullscreen mode

So what if the solution above didn't work?

Second option: Delete the helm secret associated with the release and re-run the upgrade command

First we get all the secrets for the namespace by running:

kubectl get secrets -n <namespace>
Enter fullscreen mode Exit fullscreen mode

In the output you will notice a list of secrets in the following format:

NAME                        TYPE               DATA AGE
api                         Opaque             21   473d
sh.helm.release.v1.api.v648 helm.sh/release.v1 1    6d5h
sh.helm.release.v1.api.v649 helm.sh/release.v1 1    5d1h
sh.helm.release.v1.api.v650 helm.sh/release.v1 1    57m
Enter fullscreen mode Exit fullscreen mode

So what's in a secret?

Helm3 makes use of the Kubernetes Secrets object to store any information regarding a release. These secrets are basically used by Helm to store and read it's state every time we run: helm list, helm history or helm upgrade in our case.

The naming of the secrets is unique per namespace.
The format follows the following convention:
sh.helm.release.v1.<release_name>.<release_version>.

There is a max of 10 secrets that are stored by default, but you can modify this by setting the --history-max flag in your helm upgrade command.

--history-max int limit the maximum number of revisions saved per release. Use 0 for no limit (default 10)

Now that we know what these secrets are used for, let's delete the helm secret associated with the pending release by running the following command:

kubectl delete secret sh.helm.release.v1.<release_name>.v<release_version> -n <namespace>
Enter fullscreen mode Exit fullscreen mode

Finally, we re-run the helm upgrade command (either from command line or from your deployment workflow), which, if all was good so far, should succeed.

There is an open issue with Helm so hopefully these workaround won't be needed anymore. But it's open since 2018.

Of course there could be other cases or issues, but I hope this is a nice place to start. If you ran into something similar I would love to read your input in what the issue was and how you solved it especially since I didn't find the error message to be intuitive.

Thank you for reading!

Top comments (0)