The Ops Community

Patrick Londa for Blink Ops

Originally published at blinkops.com

Troubleshooting Pods Stuck in "Pending" State

So you’re using Kubernetes to manage your containerized services, but you’ve run into a snag. Your project isn’t loading, and the pods are stuck in a pending state. Fortunately, Kubernetes has debugging tools that make the cause easy to track down. Use this step-by-step guide to troubleshoot Kubernetes pods stuck in a pending state.

What Does “Pending” Mean?

Kubernetes pods are left pending if they can’t be scheduled to a node. Running kubectl describe pods displays messages from the scheduler, in the Events section of each pod’s output, explaining why that pod can’t be scheduled.

How Does a Pod Become Stuck in a “Pending” State?

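There are two common reasons for a pod to fail to be scheduled to a node. First, it may be bound to a hostPort: a pod that requests a port on the host can only run on a node where that port is still free, which limits the places it can be scheduled. Second, the cluster may have insufficient resources (usually memory or CPU) to satisfy the pod’s requests.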
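Both causes tend to leave a recognizable phrase in the scheduler’s message. Here is a minimal Python sketch that maps a message to a likely cause. The phrases in CAUSES are assumptions based on commonly seen scheduler output, and the exact wording can vary between Kubernetes versions; classify is a hypothetical helper, not part of kubectl.

```python
# Phrases commonly seen in scheduler messages (assumed; wording may vary
# by Kubernetes version), mapped to a plain-language cause.
CAUSES = {
    "Insufficient cpu": "not enough CPU",
    "Insufficient memory": "not enough memory",
    "didn't have free ports": "hostPort conflict",
}


def classify(message: str) -> list[str]:
    """Return the likely causes mentioned in a scheduler message."""
    return [cause for phrase, cause in CAUSES.items() if phrase in message]


if __name__ == "__main__":
    # Hand-written sample message, not taken from a real cluster.
    msg = ("0/3 nodes are available: 1 node(s) didn't have free ports "
           "for the requested pod ports, 2 Insufficient memory.")
    print(classify(msg))
```

An empty list means the message did not match either cause, so read the full message before drawing conclusions.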

Troubleshooting Pods Stuck in “Pending”

Now that you understand more about “stuck pods”, follow these steps to manually troubleshoot a Kubernetes pod stuck in a pending state.

Step 1: Diagnosing the Issue

The first step in any kind of Kubernetes troubleshooting is to run the command: kubectl describe pods

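This command returns a description of each pod in the current namespace, including its state. The Events section at the end of each description contains the scheduler’s messages, which tell you whether the cluster lacks the CPU or memory the pod requests, or whether a requested host port is unavailable. Insufficient resources are one of the most likely reasons for a pod remaining in the “pending” state.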
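If you have many pods, reading every description gets tedious. As a rough sketch, you could export the pod list with kubectl get pods -o json and filter it in Python. The pending_pods helper and the sample data are hypothetical; the field names (status.phase and the PodScheduled condition) follow the standard pod object.

```python
def pending_pods(pod_list: dict) -> list[tuple[str, str, str]]:
    """Return (namespace, name, scheduler message) for each Pending pod.

    pod_list is the parsed output of `kubectl get pods -o json`.
    """
    results = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        message = ""
        # An unschedulable pod carries a PodScheduled condition set to "False".
        for cond in status.get("conditions", []):
            if cond.get("type") == "PodScheduled" and cond.get("status") == "False":
                message = cond.get("message", "")
        meta = pod.get("metadata", {})
        results.append((meta.get("namespace", ""), meta.get("name", ""), message))
    return results


if __name__ == "__main__":
    # Hand-written sample in the shape of `kubectl get pods -o json`.
    sample = {
        "items": [
            {"metadata": {"namespace": "default", "name": "web-0"},
             "status": {"phase": "Running"}},
            {"metadata": {"namespace": "default", "name": "web-1"},
             "status": {"phase": "Pending", "conditions": [
                 {"type": "PodScheduled", "status": "False",
                  "reason": "Unschedulable",
                  "message": "0/3 nodes are available: 3 Insufficient cpu."}]}},
        ]
    }
    for namespace, name, message in pending_pods(sample):
        print(f"{namespace}/{name}: {message}")
```

With real data, you would load the file with json.load instead of using the inline sample.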

Step 2: Scale Out or Scale Up

If you have reached resource limits, then you can increase capacity by scaling out or scaling up.

You scale out by adding more worker nodes to the cluster. You can do this in a variety of ways depending on which cloud infrastructure you are using. As a starting point, see the basic Kubernetes guide on how to add nodes to an existing cluster.

To scale up, you instead increase the memory or CPU of your existing nodes.
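Which option helps depends on the size of the pod relative to your nodes, because a pod has to fit on a single node. The sketch below illustrates that reasoning under simplified assumptions: it looks only at CPU and memory, ignores taints, affinity, and other scheduling constraints, and uses a hypothetical Node type with values you would fill in yourself (CPU in millicores, memory in bytes).

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical summary of one node (CPU in millicores, memory in bytes)."""
    alloc_cpu_m: int  # allocatable CPU
    alloc_mem: int    # allocatable memory
    used_cpu_m: int   # CPU already requested by pods on the node
    used_mem: int     # memory already requested by pods on the node


def scaling_advice(pod_cpu_m: int, pod_mem: int, nodes: list[Node]) -> str:
    """Suggest "fits", "scale out", or "scale up" for a single pod."""
    # Some node still has enough free capacity: resources are not the blocker.
    for n in nodes:
        if (n.alloc_cpu_m - n.used_cpu_m >= pod_cpu_m
                and n.alloc_mem - n.used_mem >= pod_mem):
            return "fits"
    # The pod would fit on an empty node of an existing size: add nodes.
    for n in nodes:
        if n.alloc_cpu_m >= pod_cpu_m and n.alloc_mem >= pod_mem:
            return "scale out"
    # No existing node size can hold the pod: larger nodes are needed.
    return "scale up"


if __name__ == "__main__":
    gib = 2**30
    nodes = [Node(alloc_cpu_m=4000, alloc_mem=8 * gib,
                  used_cpu_m=3800, used_mem=7 * gib)]
    print(scaling_advice(500, 1 * gib, nodes))   # node is nearly full
    print(scaling_advice(8000, 1 * gib, nodes))  # pod is larger than the node
```

In this sample, the first pod needs more CPU than the node has free but would fit on another node of the same size, while the second asks for more CPU than any node offers.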

Step 3: Reduce Your Resource Requests

If you don’t want to add capacity by scaling out or scaling up, another option is to reduce your existing resource requests. You can make this change by editing the following fields in your manifest YAML file.

  • spec.containers[].resources.requests.cpu
  • spec.containers[].resources.requests.memory
  • spec.containers[].resources.requests.hugepages-<size>

After you apply these changes, each pod reserves fewer resources when it is deployed, so the scheduler is more likely to find a node with room for it.
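To see how much a pod asks for before and after you edit it, you can add up the requests of its containers. This is a simplified sketch: cpu_millicores, memory_bytes, and total_requests are hypothetical helpers, only the most common quantity suffixes are handled, and init containers and hugepages are ignored.

```python
from decimal import Decimal

# Common memory suffixes only; Kubernetes accepts more than these.
MEMORY_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "500m" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "256Mi" or "1G" to bytes."""
    for suffix, multiplier in MEMORY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * multiplier)
    return int(Decimal(quantity))  # plain number of bytes


def total_requests(pod_spec: dict) -> tuple[int, int]:
    """Sum spec.containers[].resources.requests as (millicores, bytes)."""
    cpu = mem = 0
    for container in pod_spec.get("containers", []):
        requests = container.get("resources", {}).get("requests", {})
        cpu += cpu_millicores(str(requests.get("cpu", "0")))
        mem += memory_bytes(str(requests.get("memory", "0")))
    return cpu, mem


if __name__ == "__main__":
    # Hand-written sample of a pod's spec section.
    spec = {"containers": [
        {"name": "app", "resources": {"requests": {"cpu": "250m", "memory": "64Mi"}}},
        {"name": "sidecar", "resources": {"requests": {"cpu": "0.5", "memory": "1Gi"}}},
    ]}
    print(total_requests(spec))  # (750, 1140850688)
```

Comparing the totals against the free capacity on your nodes shows how far the requests need to come down for the pod to be scheduled.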

Another option with a similar effect is to remove unneeded deployments and resources to free up capacity. Cleaning up your resources regularly is good practice even when you aren’t running into errors like this, since it can reduce costs.
