Ana Cozma

Posted on Jul 17, 2024 • Originally published at coffeewithana.cloud

Understanding and Mitigating the Latest OpenSSH Vulnerability (CVE-2024-6387) in AKS

#devops #azure #security #kubernetes

Recently a new vulnerability in OpenSSH has been identified and the first question that popped into my mind was: How do I make sure my nodes are not affected by this vulnerability?

In this blog post, I go over what the vulnerability is and how it can be exploited, how you can check whether your Azure Kubernetes Service (AKS) cluster is vulnerable to CVE-2024-6387, and what you can do about it, including the different options for upgrading the VMSS image and how to choose between them.

Understand the vulnerability

CVE-2024-6387

CVE-2024-6387 is an unauthenticated remote code execution (RCE) vulnerability, with code running as root, in the OpenSSH server (sshd) on glibc-based Linux systems. If exploited, it grants full root access. It affects the default configuration and does not require user interaction, so it is classified as High severity.

It was publicly disclosed on the 1st of July 2024.

The researchers who discovered it also noted that OpenSSH already faced this vulnerability in 2006, when it was known as CVE-2006-5051. The 2006 issue was patched, but the bug has reappeared. This is why the latest vulnerability, CVE-2024-6387, is dubbed the "regreSSHion bug": it is a regression, where a previously fixed issue was reintroduced by later code changes.

CVE-2024-6387 vulnerability impacts the following OpenSSH server versions:

OpenSSH versions from 8.5p1 up to, but not including, 9.8p1
OpenSSH versions earlier than 4.4p1, if they have not been backport-patched against CVE-2006-5051 or patched against CVE-2008-4109
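
As a rough illustration of the first range, the following shell sketch checks whether an upstream OpenSSH version string falls between 8.5p1 (inclusive) and 9.8p1 (exclusive). The function name is hypothetical, and it only looks at the version number. Distributions often backport security fixes without changing the upstream version, so treat a match as a prompt to check your distribution's advisory rather than as proof of exposure.

```shell
# Hypothetical helper, not part of OpenSSH: succeeds (exit 0) when the given
# upstream version, e.g. "8.9p1", lies in the range 8.5p1 <= version < 9.8p1.
openssh_in_regresshion_range() {
    version="${1%%p*}"      # drop the portable suffix: "8.9p1" -> "8.9"
    major="${version%%.*}"  # "8"
    minor="${version#*.}"   # "9"
    minor="${minor%%.*}"    # ignore any third component
    combined=$((major * 100 + minor))
    [ "$combined" -ge 805 ] && [ "$combined" -lt 908 ]
}
```

For example, `openssh_in_regresshion_range 8.9p1 && echo "in the affected range"` prints the message, while the same call with 9.8p1 prints nothing.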

CVE-2024-6409

As of the 9th of July another vulnerability has been discovered: CVE-2024-6409.

This is a distinct vulnerability from the regreSSHion bug. The vulnerability allows an attacker to execute code within the privsep child process. This child process is a part of OpenSSH that runs with restricted privileges to limit the damage that can be done if it is compromised.

The vulnerability is caused by a race condition related to how signals are handled. This means that the privsep child process can be exploited because the timing of signal handling operations can be manipulated, leading to unintended behavior that allows code execution.

It impacts OpenSSH versions 8.7p1 and 8.8p1 as shipped with Red Hat Enterprise Linux 9.

Machines patched for CVE-2024-6387 will also be patched for CVE-2024-6409.

Suggested actions against the vulnerability

To protect against this vulnerability, the main suggestion is to upgrade the OpenSSH server package, for example with a command similar to apt install --only-upgrade openssh-server on Debian or Ubuntu. If you cannot do this and need a quick workaround, an option is to set the LoginGraceTime SSH configuration parameter to 0, as recommended by Ubuntu.

Let's look into both recommendations and understand them a bit more and let's start with the workaround:

Set LoginGraceTime to 0

OpenSSH allows remote connections to server machines. The LoginGraceTime SSH server configuration parameter specifies the time allowed for successful authentication to the server.

This means that a longer grace period allows more unauthenticated connections to stay open at once, while a shorter grace period can, in certain cases, help protect against brute force attacks.

In the context of the identified vulnerability, this is important because the vulnerable code is called only when the LoginGraceTime timer triggers. The reasoning is that setting it to 0, which means no timeout, prevents the timer from firing, so the vulnerable code is never called and the vulnerability cannot be triggered.

But there is a caveat here.

While you eliminate the risk of calling the vulnerable code, setting LoginGraceTime to 0 removes the authentication timeout entirely, which makes sshd vulnerable to denial of service attacks. So consider your options and the tradeoff carefully when you configure these settings.
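
A minimal sketch of applying the workaround is shown below. The function name is hypothetical, and it assumes a single sshd_config-style file that you are allowed to edit. It relies on sshd using the first value it reads for a keyword, so the new directive is written at the top of the file and any other active LoginGraceTime lines are dropped.

```shell
# Hypothetical helper: make "LoginGraceTime <value>" the effective setting in
# the given sshd_config-style file. A backup copy is kept as <file>.bak.
set_login_grace_time() {
    config_file="$1"
    value="$2"
    cp "$config_file" "$config_file.bak" || return 1
    {
        # sshd uses the first value it reads for a keyword, so put ours first.
        printf 'LoginGraceTime %s\n' "$value"
        # Drop existing active LoginGraceTime lines; commented lines are kept.
        grep -Eiv '^[[:space:]]*LoginGraceTime([[:space:]]|=)' "$config_file.bak" || true
    } > "$config_file"
}
```

A typical use would be `set_login_grace_time /etc/ssh/sshd_config 0` as root, followed by `sshd -t` to validate the file and a reload of the SSH service. Given the denial of service tradeoff described above, this is best treated as a stopgap until a patched package is available.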

Denial of Service through MaxStartups Exhaustion Explained

MaxStartups is another sshd configuration that limits the number of concurrent unauthenticated connections.

If LoginGraceTime is set to 0, attackers can open numerous connections without being timed out. Since these connections won't be closed due to timeout, they will remain open indefinitely.

This can exhaust the allowed number of connections specified by MaxStartups, preventing legitimate users from accessing the SSH service.

Essentially, the server becomes overwhelmed with these open connections, and legitimate users can no longer connect (hence the denial of service).

This is why the main recommendation is to upgrade to a patched version of sshd where the underlying vulnerability has been addressed. This ensures that LoginGraceTime can be set to a reasonable value, and the server can handle connection attempts appropriately without being vulnerable to a DoS attack via MaxStartups exhaustion.
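
To make the MaxStartups exhaustion more concrete, here is a small sketch based on the start:rate:full form described in the sshd_config manual page, where sshd begins refusing connections with probability rate/100 once start unauthenticated connections are open, and refuses all of them at full. The function name is hypothetical, and the linear interpolation between the two points is my reading of the manual page.

```shell
# Hypothetical helper: approximate percentage of new connections sshd refuses
# for a MaxStartups value "start:rate:full" and a number of currently open
# unauthenticated connections.
max_startups_drop_percent() {
    spec="$1"
    current="$2"
    start="${spec%%:*}"
    rest="${spec#*:}"
    rate="${rest%%:*}"
    full="${rest#*:}"
    if [ "$current" -lt "$start" ]; then
        echo 0        # below the threshold nothing is refused
    elif [ "$current" -ge "$full" ]; then
        echo 100      # at or above "full" every new connection is refused
    else
        # Grows linearly from "rate" at "start" to 100 at "full".
        echo $(( rate + (100 - rate) * (current - start) / (full - start) ))
    fi
}
```

With the documented default of 10:30:100, `max_startups_drop_percent 10:30:100 55` prints 65. When LoginGraceTime is 0, idle connections never time out, so an attacker can hold the count at full and keep the refusal rate at 100.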

Upgrade to a patched version of `sshd`

Now onto the main fix and what this means for your virtual machine scale sets (VMSS) in the AKS context. When running AKS, modifying the VMSS yourself is generally not recommended for the following reasons:

Managed Service: AKS is a managed Kubernetes service, meaning Microsoft handles most of the underlying infrastructure management for you. Directly modifying VMSS configurations can interfere with the automated management and updates provided by AKS.
Configuration Consistency: AKS maintains certain configurations to ensure the cluster operates correctly. Manual modifications to the VMSS could lead to a configuration drift, where the manually set configurations diverge from the managed state AKS expects and maintains.
Stability and Reliability: Direct modifications can lead to instability or unexpected behavior within your cluster. This includes potential issues during upgrades, scaling operations, or applying patches.

For these reasons, handling the fix for the vulnerability means waiting for the Azure release team to provide us with a patched image.

Check the AKS version

When you upgrade Kubernetes, the node images are upgraded as well, so a good place to start is to identify the version of Kubernetes your AKS clusters are running. You can do this through the Azure portal, CLI, or API.

Azure Portal:

Navigate to your AKS cluster resource and check the version information in the Overview section.

Azure CLI:

```
az aks show --resource-group <ResourceGroupName> --name <AKSClusterName> --query kubernetesVersion
```

Note: Replace ResourceGroupName and AKSClusterName with your actual resource group and AKS cluster names.

Then, using the kubectl command line, you can retrieve the OS image your nodes are running:

```
kubectl get nodes -o wide
```

By running these commands you will know your Kubernetes version and the OS image your nodes are running on. You can then check whether that OS release ships an sshd version that the CVE details list as vulnerable.
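
If the wide output is too noisy, a narrower view can be sketched with kubectl's custom-columns output. The wrapper name is hypothetical; the fields come from each node's .status.nodeInfo, and the sketch assumes kubectl is already pointed at your AKS cluster.

```shell
# Hypothetical wrapper around kubectl: one row per node with its OS image
# and kernel version, taken from .status.nodeInfo.
list_node_os_images() {
    kubectl get nodes \
        -o custom-columns='NAME:.metadata.name,OS_IMAGE:.status.nodeInfo.osImage,KERNEL:.status.nodeInfo.kernelVersion'
}
```

Note that this shows the OS release, not the sshd package version, so it still needs to be matched against your distribution's security advisory.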

Check and upgrade the AKS VMSS node image

Identify the patched image version

Azure Kubernetes Service regularly provides new node images, so it's good to upgrade your node images frequently to take advantage of the latest AKS features. Linux node images are updated weekly, and Windows node images are updated monthly.

For Azure, and AKS more specifically, you should perform the following checks:

Check for the node image with a patched sshd version on the Azure AKS Releases page on GitHub
Check the rollout schedule of the patched node image in your region on the AKS Release Status page

*Tip: It is also a good practice in general to check the release page for announcements on upcoming releases and the fixes they include, and to keep your node images up to date to protect against the latest vulnerabilities.*

At the time of writing, we are looking out for the rollout of the image with version 202407.08.0.
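
As a sketch of how that check could be automated, the function below compares the trailing part of a node image version against a wanted version such as 202407.08.0. The function name is hypothetical, and it assumes node image versions end in a suffix of the form YYYYMM.DD.N after the last hyphen and that a later suffix means a newer image, which you should confirm against the release notes for your OS SKU.

```shell
# Hypothetical helper: succeeds when the date-style suffix of a node image
# version (assumed format: <prefix>-YYYYMM.DD.N) is at least the wanted one.
node_image_at_least() {
    have="${1##*-}"     # "<prefix>-202407.08.0" -> "202407.08.0"
    want="$2"
    have_month="${have%%.*}"; have_rest="${have#*.}"
    want_month="${want%%.*}"; want_rest="${want#*.}"
    have_day="${have_rest%%.*}"; have_build="${have_rest#*.}"
    want_day="${want_rest%%.*}"; want_build="${want_rest#*.}"
    # Compare field by field; [ ] reads "08" as decimal 8.
    [ "$have_month" -gt "$want_month" ] && return 0
    [ "$have_month" -lt "$want_month" ] && return 1
    [ "$have_day" -gt "$want_day" ] && return 0
    [ "$have_day" -lt "$want_day" ] && return 1
    [ "$have_build" -ge "$want_build" ]
}
```

For example, `node_image_at_least "$current_image" 202407.08.0 || echo "still waiting for the patched image"`, where current_image holds the node image version reported for your node pool.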

Generally, when you upgrade the Kubernetes version the images will be upgraded as well, but when you have a security patch you might want to upgrade only the image and not the Kubernetes version.

Please consider carefully before upgrading a node image version because it's not possible to downgrade it afterward!

Verify the patched image version availability

In order to check for available node image upgrades for the nodes in your node pool simply run the following command:

```
az aks nodepool get-upgrades --nodepool-name mynodepool --cluster-name myAKSCluster --resource-group myResourceGroup
```

In the JSON output, check the latestNodeImageVersion field, which indicates the latest image version that the nodes can be upgraded to.

Then, check the node image you are actually running (this can be done via the Azure Portal or the CLI). If you are using the CLI for this as well, run:

```
az aks nodepool show --resource-group myResourceGroup --cluster-name myAKSCluster --name mynodepool --query nodeImageVersion
```

Simply compare the two image versions. If there is a difference this means there is an upgrade available for your nodes. If not, you are already running on the latest and you should check the releases for the rollout of the image you are interested in upgrading to.
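
The comparison can be scripted along these lines. This is a sketch that assumes the Azure CLI is installed and logged in; the function name and the output wording are hypothetical, while the two az commands and query fields are the ones used above.

```shell
# Hypothetical helper: report whether a node pool runs the latest node image
# that AKS currently offers for it.
nodepool_image_status() {
    resource_group="$1"
    cluster="$2"
    pool="$3"
    current="$(az aks nodepool show \
        --resource-group "$resource_group" --cluster-name "$cluster" \
        --name "$pool" --query nodeImageVersion --output tsv)"
    latest="$(az aks nodepool get-upgrades \
        --resource-group "$resource_group" --cluster-name "$cluster" \
        --nodepool-name "$pool" --query latestNodeImageVersion --output tsv)"
    if [ "$current" = "$latest" ]; then
        echo "up to date: $current"
    else
        echo "upgrade available: $current -> $latest"
    fi
}
```

Calling `nodepool_image_status myResourceGroup myAKSCluster mynodepool` prints a single line, which makes it easy to run across several pools or clusters.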

Once the image version is available in your region, the next step is to perform the actual node image upgrade. There are several ways to handle this, depending on your scenario, which I detail below.

Upgrade all node images in all node pools

TL;DR

CLI Command: az aks upgrade --node-image-only
Scope: This command applies the upgrade to all node pools in the specified AKS cluster.
Use Case: Use this when you want to ensure that all nodes in your entire cluster are updated to the latest node image version.

How To

Use the az aks upgrade command with the --node-image-only flag to upgrade the node images across all node pools in the AKS cluster. This command ensures that only the node image is upgraded without altering the Kubernetes version.
```
az aks upgrade \
    --resource-group myResourceGroup \
    --name myAKSCluster \
    --node-image-only
```
After initiating the upgrade, you can verify the status of the node images using the kubectl get nodes command with a specific JSONPath query to output the node names and their image versions.
```
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.kubernetes\.azure\.com\/node-image-version}{"\n"}{end}'
```
Once the upgrade is complete, you can retrieve the updated details of the node pools, including the current node image version, using the az aks show command.
```
az aks show \
    --resource-group myResourceGroup \
    --name myAKSCluster
```
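
While a rollout is in progress, it can help to see how many nodes are on each image version. The sketch below reads the tab-separated "node name, image version" lines produced by the kubectl JSONPath query above from standard input and counts nodes per version. The function name is hypothetical.

```shell
# Hypothetical helper: read "<node>\t<image version>" lines on stdin and
# print "<count>\t<image version>", one line per distinct version.
summarize_node_images() {
    awk -F '\t' '
        NF >= 2 { count[$2]++ }
        END { for (v in count) print count[v] "\t" v }
    ' | sort -k2
}
```

Pipe the output of the kubectl get nodes JSONPath command into `summarize_node_images`; once only the new image version is listed, every node has been reimaged.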

Upgrade a specific node pool

TL;DR

CLI Command: az aks nodepool upgrade --node-image-only
Scope: This command targets a specific node pool within the AKS cluster, identified by the --name parameter.
Use Case: Use this when you need to upgrade the node image for only one particular node pool, perhaps for testing or staggered rollout purposes.

How To

If you want to upgrade the node image of a specific node pool without affecting the entire cluster, use the az aks nodepool upgrade command with the --node-image-only flag.

```
az aks nodepool upgrade \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --node-image-only
```

Similar to the cluster-wide upgrade, check the status of the node images with the kubectl get nodes command.

```
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.kubernetes\.azure\.com\/node-image-version}{"\n"}{end}'
```

Use the az aks nodepool show command to get the details of the updated node pool.

```
az aks nodepool show \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool
```
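
For a staggered rollout, the per-pool command can be wrapped in a loop that upgrades pools one at a time and stops at the first failure. This is a sketch with a hypothetical function name; it assumes the Azure CLI is logged in and that you pass pools in the order you want them upgraded, for example a test pool first.

```shell
# Hypothetical helper: upgrade the node image of each listed pool in order,
# stopping as soon as one upgrade fails.
upgrade_node_pools_in_order() {
    resource_group="$1"
    cluster="$2"
    shift 2
    for pool in "$@"; do
        echo "Upgrading node image of pool: $pool"
        az aks nodepool upgrade \
            --resource-group "$resource_group" \
            --cluster-name "$cluster" \
            --name "$pool" \
            --node-image-only || return 1
    done
}
```

Because the az call is made without --no-wait, each upgrade should finish before the next pool starts, which leaves room to verify workloads between pools if you run them one call at a time.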

Use Node Surge to Speed Up Upgrades

TL;DR

CLI Command: az aks nodepool update --max-surge
Scope: This command also targets a specific node pool but includes the --max-surge parameter to control the number of extra nodes that can be created to expedite the upgrade.
Use Case: Use this when you want to perform a faster upgrade of a node pool by temporarily increasing the number of nodes during the upgrade process, thereby reducing downtime or upgrade duration.

How To

To speed up the node image upgrade process, you can use the az aks nodepool update command with the --max-surge flag, which specifies the number of extra nodes used during the upgrade process. This allows more nodes to be upgraded simultaneously. This command sets the surge value on the node pool, and the value is then applied when the pool is upgraded.
```
az aks nodepool update \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool \
    --max-surge 33% \
    --no-wait
```

Check the node image status as previously described.

```
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.kubernetes\.azure\.com\/node-image-version}{"\n"}{end}'
```

Retrieve the updated node pool details using the az aks nodepool show command.

```
az aks nodepool show \
    --resource-group myResourceGroup \
    --cluster-name myAKSCluster \
    --name mynodepool
```
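
To get a feel for what a percentage such as 33% means in practice, the sketch below estimates the number of surge nodes for a given pool size. The function name is hypothetical, and it assumes, as I understand the AKS documentation, that the percentage is taken of the pool's node count and that fractional nodes are rounded up.

```shell
# Hypothetical helper: estimate the surge node count for a pool of
# <node_count> nodes and a max surge of <percent> percent, rounding up.
surge_node_count() {
    node_count="$1"
    percent="$2"
    echo $(( (node_count * percent + 99) / 100 ))
}
```

For example, `surge_node_count 10 33` prints 4, so a 10-node pool would temporarily need quota for 4 extra nodes during the upgrade.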

Conclusion

The choice between the three will depend on what your strategy will be and what you want to focus on:

If you have a new security patch or critical update and want every node in your cluster to be updated as quickly as possible without specifying individual node pools, upgrade the entire cluster.
If you are running different workloads on separate node pools and want to update the node image for only one specific pool to test compatibility or performance, upgrade just that node pool.
If you need a faster upgrade for a specific node pool and can afford to temporarily add more nodes to handle the upgrade process, use node surge.

I hope this article gives you an idea of this particular security vulnerability, how you can mitigate it, and how you can approach security patches in the future in the context of AKS VMSS. Thank you for reading!
