Originally developed by Google, Kubernetes was created as a way to manage application components known as containers. Sometimes referred to as K8s, Kubernetes is Greek for "pilot" — a fitting name for a system that directs, schedules, and regulates how each subsystem operates, and how subsystems communicate with each other. But what happens when the pilot fails to direct digital traffic, and how can the ship be prepared to avoid a crash?
Running a Kubernetes global health checklist can go a long way in preventing errors before they cause disruptions, and can optimize container performance according to current scalability needs. Here's why monitoring your Kubernetes cluster health should be a part of your DevOps strategy, and what you can do to keep your container management system afloat.
Consisting of a master node, at least one worker node, and all of the containers and pods inside, a cluster comprises an entire workload for a given app development team — or for the entire project. The ability to configure multiple clusters according to the needs of each department enables developers to optimize the resources that they invest into creating their apps.
For example, a machine learning application may require a graphics processing unit (GPU) to function, which would not be necessary for other operations like web service. Configuring a Kubernetes cluster to the needs of each department would enable developers to use only the resources they need for each project, and none that they don't. That means failure to customize the operation of each Kubernetes cluster can result in suboptimal configuration, which can hinder app development.
Following a Kubernetes global health checklist can help DevOps teams monitor their clusters' health, ensuring that each one runs at optimum capacity. Here are a few cluster events to watch for:
Each container in a pod can declare a minimum and a maximum amount of CPU and memory that it may consume. The minima are called requests. The scheduler uses requests to decide which node has room for a pod, and the kubelet takes them into account when it selects pods for eviction from a node under resource pressure. The maxima, called limits, are enforced at the container runtime level and prevent the container from using more than the stated amount. A container that tries to exceed its memory limit is terminated, and repeated terminations typically leave the pod in a CrashLoopBackOff state.
CPU is considered a compressible resource, so a container that reaches its CPU limit is only throttled. Memory is not compressible: a container that exceeds its memory limit is terminated, and one that uses more than its request may be evicted when the node runs low on memory. It is therefore important to assign appropriate requests and limits for both CPU and memory to each pod within a cluster. Otherwise, a container may be throttled or terminated unexpectedly.
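As a rough illustration of this check, the following Python sketch scans a pod spec for containers that lack a CPU or memory request or limit. It assumes the spec has already been loaded into a plain dictionary shaped like the `spec` section of a pod manifest. The function name `missing_resources` and the sample pod are hypothetical and not part of any Kubernetes client library.

```python
def missing_resources(pod_spec: dict) -> list:
    """Return "container: kind.resource" entries for every unset request or limit."""
    problems = []
    for container in pod_spec.get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources") or {}
        for kind in ("requests", "limits"):
            declared = resources.get(kind) or {}
            for resource in ("cpu", "memory"):
                if resource not in declared:
                    problems.append(f"{name}: {kind}.{resource}")
    return problems


# Hypothetical pod spec: "web" is fully specified, "sidecar" has no limits.
example_spec = {
    "containers": [
        {
            "name": "web",
            "resources": {
                "requests": {"cpu": "250m", "memory": "256Mi"},
                "limits": {"cpu": "500m", "memory": "512Mi"},
            },
        },
        {
            "name": "sidecar",
            "resources": {"requests": {"cpu": "50m", "memory": "64Mi"}},
        },
    ]
}
```

Calling `missing_resources(example_spec)` reports the two missing limits on the `sidecar` container, which is the kind of gap that leaves a container open to unbounded memory growth.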
Once you have established the request and limit ranges for both the CPU and memory use, it is important to identify how much is consumed by each node and pod. This can be done by evaluating three parameters for both CPU and memory use: percent usage, percent requested, and percent limits.
A low usage percentage means that you have allotted more CPU or memory than needed, and could save by scaling back your requests and limits. A high usage percentage means that you may be operating close to full capacity, and could struggle to scale or keep up with greater loads. If the percent limit is lower than the percent requested, then you may not have assigned a limit to all of your pods.
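The three percentages can be sketched in a few lines of Python. This example assumes each value is expressed relative to the node's allocatable amount for that resource, which is one common convention but not the only one. The `ResourceSnapshot` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ResourceSnapshot:
    """Totals for one resource (CPU cores or memory bytes) on a single node."""
    allocatable: float  # amount the node can hand out to pods (assumed baseline)
    usage: float        # amount currently consumed
    requested: float    # sum of container requests
    limits: float       # sum of container limits


def percentages(snapshot: ResourceSnapshot) -> dict:
    """Express usage, requests, and limits as a percentage of allocatable."""
    if snapshot.allocatable <= 0:
        raise ValueError("allocatable must be positive")
    return {
        "usage": 100 * snapshot.usage / snapshot.allocatable,
        "requested": 100 * snapshot.requested / snapshot.allocatable,
        "limits": 100 * snapshot.limits / snapshot.allocatable,
    }


def limits_may_be_missing(snapshot: ResourceSnapshot) -> bool:
    """True when summed limits fall below summed requests.

    Because a limit cannot be lower than its request, this hints that
    some containers have no limit set at all.
    """
    return snapshot.limits < snapshot.requested
```

For a node with 4 allocatable cores, 1 core in use, 2 cores requested, and 3 cores of limits, `percentages` yields 25, 50, and 75 percent respectively.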
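The kube_node_status_allocatable metric, exposed by kube-state-metrics, reports how much CPU, memory, and pod capacity each node has available for scheduling. Comparing it with current CPU and memory requests and usage trends helps developers estimate how many additional pods can be added. That way, developers will know how much room they have to scale.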
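A back-of-the-envelope version of that estimate might look like the sketch below. It assumes the allocatable and requested totals have already been collected for one node (for example from the metric above), that headroom is measured against requests rather than live usage, and that new pods all have the same request size. The function name and parameters are hypothetical.

```python
def additional_pods(
    *,
    allocatable_cpu_m: int,
    allocatable_mem_mib: int,
    requested_cpu_m: int,
    requested_mem_mib: int,
    pod_cpu_m: int,
    pod_mem_mib: int,
) -> int:
    """Estimate how many more identical pods fit on a node.

    CPU is in millicores and memory in MiB so the arithmetic stays in integers.
    """
    if pod_cpu_m <= 0 or pod_mem_mib <= 0:
        raise ValueError("pod requests must be positive")
    # Clamp at zero in case the node is already overcommitted.
    free_cpu = max(allocatable_cpu_m - requested_cpu_m, 0)
    free_mem = max(allocatable_mem_mib - requested_mem_mib, 0)
    # The scarcer resource decides how many pods fit.
    return min(free_cpu // pod_cpu_m, free_mem // pod_mem_mib)
```

This ignores the node's maximum pod count and any scheduling constraints, so treat the result as an upper bound rather than a guarantee.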
In addition to making sure that each node and pod is operating within the assigned computing limits, DevOps teams should also keep pods relatively evenly distributed across all nodes.
An uneven distribution can result in some nodes being overloaded and their containers possibly terminated, while the computing power available in other nodes goes unused. This can be due to node affinity, where a certain property like GPU possession or security features causes a disproportionate number of pods to be scheduled to the nodes that have it. Conversely, some node properties called taints repel pods that do not tolerate them, leaving those nodes with fewer pods than their capacity allows.
To get the most out of the computing power available, check your affinity settings to make sure no pods are disproportionately scheduled to certain nodes.
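One simple way to spot an uneven spread is to count pods per node and flag nodes that sit well above the average. The Python sketch below assumes you already have a mapping of pod names to node names from your own tooling. The function names and the 1.5x tolerance are hypothetical choices, not Kubernetes defaults.

```python
from collections import Counter


def pods_per_node(nodes: list, assignments: dict) -> dict:
    """Count pods on each known node, including nodes with no pods.

    `assignments` maps pod name -> node name. Pods assigned to nodes
    that are not listed in `nodes` are ignored.
    """
    counts = Counter(assignments.values())
    return {node: counts.get(node, 0) for node in nodes}


def crowded_nodes(counts: dict, tolerance: float = 1.5) -> list:
    """Return nodes whose pod count exceeds `tolerance` times the mean."""
    if not counts:
        return []
    mean = sum(counts.values()) / len(counts)
    return sorted(node for node, count in counts.items() if count > tolerance * mean)
```

Pod count is only a proxy for load, since pods differ in size. Any node flagged here is a prompt to review affinity rules and taints, not proof of a problem.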
The Kubernetes API server exposes three endpoints that can be used during a global health check. They are:
Healthz, which reports whether the API server is healthy, but this has been deprecated since v1.16 in favor of the two more specific endpoints below
Livez, which reports whether the API server is alive; the --livez-grace-period flag sets how long the server may take to start up before this check can fail
Readyz, which reports whether the API server is ready to accept requests
If a machine checks the healthz, livez, or readyz endpoint of the API server, it should rely on the HTTP status code: a status code of 200 indicates that the API server is healthy, live, or ready, depending on the endpoint called.
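A machine-side check along these lines can be sketched in Python using only the standard library. The example assumes the API server is reachable over plain HTTP without credentials, for instance through a local `kubectl proxy` (which listens on 127.0.0.1:8001 by default). A direct connection would also need TLS and authentication, which are left out here. The function names are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Callable

ENDPOINTS = ("/livez", "/readyz")


def http_status(url: str) -> int:
    """Return the HTTP status code for a GET request to `url`."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for non-2xx responses; the code is still useful.
        return error.code


def check_api_server(base_url: str, fetch: Callable[[str], int] = http_status) -> dict:
    """Map each endpoint to True when it answers with status 200."""
    return {path: fetch(base_url + path) == 200 for path in ENDPOINTS}
```

With a proxy running, `check_api_server("http://127.0.0.1:8001")` returns a dictionary with one boolean per endpoint. The `fetch` parameter exists so the logic can be exercised without a live cluster.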
When developers want to manually debug the status of the API server, they can run this command with the verbose parameter:
kubectl get --raw='/readyz?verbose'
The output then shows the full status details for the endpoint:
healthz check passed
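If you want to process the verbose output in a script, a small parser is usually enough. The Python sketch below assumes the verbose response lists one check per line, prefixed with `[+]` for a passing check and `[-]` for a failing one, followed by a summary line. The sample text in the usage note is illustrative rather than captured from a real cluster, and the function name is hypothetical.

```python
def parse_verbose(output: str) -> dict:
    """Map each check name to True (passed) or False (failed).

    Lines without a "[+]" or "[-]" prefix, such as the final summary
    line, are skipped.
    """
    checks = {}
    for raw_line in output.splitlines():
        line = raw_line.strip()
        if line.startswith("[+]"):
            passed = True
        elif line.startswith("[-]"):
            passed = False
        else:
            continue
        parts = line[3:].split()
        if parts:  # ignore a bare marker with no check name
            checks[parts[0]] = passed
    return checks
```

For example, passing the text `"[+]ping ok\n[-]etcd failed: reason withheld\nhealthz check passed"` yields a dictionary in which `ping` is true and `etcd` is false, which makes it easy to alert on the specific check that failed.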
For more information on this type of debugging, see the Kubernetes documentation on API server health endpoints.
Keeping your Kubernetes cluster at optimum performance will prevent you from wasting allotted computing power, and valuable business resources too. It will also improve scalability, and will enhance efficiency across the board. Integrate this Kubernetes global health checklist into your DevOps strategy, and improve your applications today.
Turning one-off health checks into a standard routine for your organization takes some time to think through. One simple way to get started is by creating a free Blink account. You’ll immediately gain access to automations that are designed to run checks just like this. Just connect your cluster and schedule checks. It’s that simple.