
CPU requests and limits in Kubernetes

Daniele Polencic

In Kubernetes, what should I use as CPU requests and limits?

Popular answers include:

  • Always use limits!
  • NEVER use limits, only requests!
  • I don't use either; is it OK?

Let's dive into it.

In Kubernetes, you have two ways to specify how much CPU a pod can use:

  1. Requests usually describe the expected (average) consumption.
  2. Limits set the maximum amount of CPU the container is allowed to use.

The Kubernetes scheduler uses requests to decide which node in the cluster the pod should be placed on.

Since the scheduler doesn't know the consumption (the pod hasn't started yet), it needs a hint.
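
Both values are set per container in the pod spec. Here's a minimal sketch (the name, image and values are only illustrative):

  apiVersion: v1
  kind: Pod
  metadata:
    name: app                # illustrative name
  spec:
    containers:
      - name: app
        image: nginx         # illustrative image
        resources:
          requests:
            cpu: 100m        # hint used by the scheduler to place the pod
          limits:
            cpu: 200m        # hard cap enforced at runtime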

But it doesn't end there.

The Kubernetes scheduler uses requests to decide how to assign a pod to a node

CPU requests are also used to divide the CPU among your containers.

Let's have a look at an example:

  • A node has a single CPU.
  • Container A has requests equal to 0.1 vCPU.
  • Container B has requests equal to 0.2 vCPU.
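
Expressed as a manifest, the setup could look like this (a sketch with a single pod holding both containers; the same reasoning applies to containers in separate pods):

  apiVersion: v1
  kind: Pod
  metadata:
    name: example            # illustrative name
  spec:
    containers:
      - name: container-a
        image: busybox       # illustrative image
        resources:
          requests:
            cpu: 100m        # 0.1 vCPU
      - name: container-b
        image: busybox
        resources:
          requests:
            cpu: 200m        # 0.2 vCPU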

What happens when both containers try to use 100% of the available CPU?

CPU usage in two containers

Since the CPU request doesn't limit consumption, both containers will try to consume all of the available CPU.

However, since container B's request is double container A's, the CPU is shared in a 1:2 ratio: container A gets roughly 0.33 vCPU and container B roughly 0.66 vCPU (twice as much).

Both containers use all the available CPU, but they keep proportional shares

Requests are suitable for:

  • Setting a baseline (give me at least X amount of CPU).
  • Setting relationships between pods (pod A uses twice as much CPU as pod B).

But they do not help you set hard limits.

For that, you need CPU limits.

When you set a CPU limit, you define a period and quota.

Example:

  • period: 100000 microseconds (0.1s).
  • quota: 10000 microseconds (0.01s).

In other words, the container can only use the CPU for 0.01 seconds every 0.1 seconds.

That's 10% of a CPU, also abbreviated as "100m".
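
In a manifest, that looks like this (a sketch; the values in the comments assume the default 100 ms CFS period):

  apiVersion: v1
  kind: Pod
  metadata:
    name: limited-app        # illustrative name
  spec:
    containers:
      - name: app
        image: nginx         # illustrative image
        resources:
          limits:
            cpu: 100m        # enforced as a quota of 10000 microseconds per 100000 microsecond period,
                             # i.e. at most 0.01s of CPU time every 0.1s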

Quota and period in CPU limits

If your container has a hard limit and wants more CPU, it has to wait for the next period.

Your process is throttled.

A process being CPU throttled

So what should you use as CPU requests and limits in your Pods?

A simple (but not accurate) way is to calculate the smallest CPU unit as:

REQUEST = NODE_CORES * 1000 / MAX_NUM_PODS_PER_NODE

For a 1 vCPU node and a maximum of 10 Pods per node, that's a 1 * 1000 / 10 = 100m request.

Assign the smallest unit or a multiple of it to your containers.

Assigning CPU requests to Pods and containers

For example, if you don't know exactly how much CPU Pod A needs, but you know that Pod B uses twice as much, you could set:

  • Request A: 1 unit
  • Request B: 2 units

If the containers use 100% of the CPU, they share it according to their weights (1:2).
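
Using the 100m unit from the formula above, that could translate into something like this (pod names and images are placeholders):

  apiVersion: v1
  kind: Pod
  metadata:
    name: pod-a              # placeholder name
  spec:
    containers:
      - name: app
        image: app-a:latest  # placeholder image
        resources:
          requests:
            cpu: 100m        # 1 unit
  ---
  apiVersion: v1
  kind: Pod
  metadata:
    name: pod-b              # placeholder name
  spec:
    containers:
      - name: app
        image: app-b:latest  # placeholder image
        resources:
          requests:
            cpu: 200m        # 2 units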

Two pods competing for CPU resources

A better approach is to monitor the app and derive the average CPU utilization.

You can do this with your existing monitoring infrastructure, or use the Vertical Pod Autoscaler to monitor the app and recommend a request value.
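
If you go the VPA route, you can run it in recommendation-only mode so it reports values without changing your pods. A minimal sketch, assuming the VPA is installed in the cluster and targets a Deployment called my-app (a placeholder):

  apiVersion: autoscaling.k8s.io/v1
  kind: VerticalPodAutoscaler
  metadata:
    name: my-app-vpa           # placeholder name
  spec:
    targetRef:
      apiVersion: apps/v1
      kind: Deployment
      name: my-app             # placeholder Deployment
    updatePolicy:
      updateMode: "Off"        # only compute recommendations, never evict or update pods

You can then read the recommended CPU (and memory) requests with kubectl describe vpa my-app-vpa.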

Vertical Pod Autoscaler with Goldilocks

How should you set the limits?

  1. Your app might already have a "hard" limit (e.g. Node.js is single-threaded and uses at most 1 core, even if you assign 2).
  2. You could set: limit = 99th percentile + 30-50% headroom.

You should profile the app (or use the VPA) for a more detailed answer.
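
Putting requests and limits together with purely illustrative numbers (say an observed average of ~200m and a 99th percentile of ~400m), the spec could end up as:

  apiVersion: v1
  kind: Pod
  metadata:
    name: profiled-app        # placeholder name
  spec:
    containers:
      - name: app
        image: my-app:latest  # placeholder image
        resources:
          requests:
            cpu: 200m         # observed average utilization (illustrative)
          limits:
            cpu: 540m         # 99th percentile (400m) + ~35% headroom (illustrative)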

99th percentile for CPU

Should you always set the CPU request?

Absolutely, yes.

This is a standard good practice in Kubernetes and helps the scheduler allocate pods more efficiently.

Should you always set the CPU limit?

This is a bit more controversial, but, in general, I think so.

You can find a deeper dive here: https://dnastacio.medium.com/why-you-should-keep-using-cpu-limits-on-kubernetes-60c4e50dfc61


Top comments (1)

Alexandru Dejanu

Very interesting article. From my perspective, the aim is to find the actual load, so a feasible approach is to set requests=limits, run stress tests, and then readjust.