The Ops Community ⚙️

Javier Marasco
What is Kubernetes, how does it work, and why do we need it

A lot of people getting into Kubernetes start by googling "How to deploy an application to Kubernetes", which makes sense, but then you find thousands of articles (including the official documentation) explaining how to deploy an application. Suddenly you see "deployment", "replica set", "pods", "resource quotas", "secrets", everything starts getting confusing and complex, and you end up overwhelmed by the amount of information.

I believe the first step with any technology that is new to you is to understand why it exists, what problems it solves, and how it works (conceptually). This gives you a clear picture of the technology and ultimately lets you decide if it is the best fit for your particular case, so let's start with that.

A little history and background

Before Kubernetes, applications used to be deployed on virtual machines (and before that, on physical machines), but those machines needed libraries, dependencies, networking configuration, and so on. As you can imagine, managing all of that was very complex, and changes took a long time and required coordination between multiple teams. Then, in 2013, there was a presentation that changed everything: Solomon Hykes presented a new project his company (together with two other people) had been working on. It was Docker.

The key technology behind Docker was actually nothing new: it was a set of capabilities that had existed in the Linux kernel for a long time, now exposed at the application level in a more comprehensive way. With Docker, it was possible to pack your code with all its dependencies into a "container" that you could take to any other system where Docker was running, and it would behave the same. That was exactly what was needed!

Fast forward a few months/years: you have containers everywhere and everyone is happy with the approach, but the more containers you have, the more complicated they become to manage. So we started using "docker-compose", a way to group containers into "logical units" so they could be deployed together with some sense of integration between them. It quickly became obvious that there was a need for a way to manage large numbers of containers, and then "Docker Swarm" appeared, a tool to orchestrate the deployment of complex applications with multiple containers. This worked fine as an orchestrator, but in 2014 Google released Kubernetes as an alternative to Docker Swarm with more functionality. More features were added over time, and the community adopted Kubernetes as the de facto orchestrator. Kubernetes used Docker under the hood for quite a long time to execute the containers, just like Docker Swarm did, but later the Kubernetes community made it possible to plug in virtually any container engine, removing the need to have Docker as the only option.

But... how does it work?

Kubernetes at the very top level is very simple in concept: you have your image (your application code and dependencies packed into a single artifact made of multiple "layers") and you deploy it onto a set of machines running "something" that will take your image and make it run. This "something" takes care of checking that your image is running on a healthy machine, restarts it when something bad happens to it, kills and restarts it if it starts consuming too many resources, handles communication in and out between the world and your image, and so on.

To do this, Kubernetes has its own internal components:

  • API Server
  • Scheduler
  • Kubelet
  • Kube proxy
  • etcd

The API Server is the interface between Kubernetes and the rest of the universe. Every time you run a command using kubectl (the CLI used to manage any Kubernetes cluster), it makes a REST call to the API server of your cluster, passing the parameters and files you gave to kubectl as the payload of the request.
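To make that concrete, here is a small sketch of the kind of URL kubectl builds when you list pods. The path layout follows the real Kubernetes REST API convention; the server address is a made-up placeholder for your own cluster's endpoint.

```python
# Sketch: the REST endpoint behind "kubectl get pods -n default".
# "https://my-cluster:6443" is a placeholder API server address.

def pods_url(api_server: str, namespace: str) -> str:
    """Build the URL kubectl would call to list pods in a namespace."""
    return f"{api_server}/api/v1/namespaces/{namespace}/pods"

print(pods_url("https://my-cluster:6443", "default"))
# https://my-cluster:6443/api/v1/namespaces/default/pods
```

Every other client (dashboards, operators, the internal components themselves) talks to the cluster through this same REST API.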

The Scheduler is a process that takes the container (note I am not talking about an image anymore, more on this later) and works out which node in the cluster meets the requirements to run that container (resources, exclusions, affinity, etc.).
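A toy version of the filtering step the Scheduler performs might look like this. The node names and resource figures are made up for illustration; the real scheduler also scores the surviving nodes before picking one.

```python
# Toy scheduler filter: keep only nodes with enough free resources,
# then pick the first one that fits.

def pick_node(nodes, cpu_needed, mem_needed):
    """Return the name of the first node that fits the request, or None."""
    for node in nodes:
        if node["free_cpu"] >= cpu_needed and node["free_mem"] >= mem_needed:
            return node["name"]
    return None  # no node fits; the container would stay pending

nodes = [
    {"name": "worker-1", "free_cpu": 0.5, "free_mem": 256},
    {"name": "worker-2", "free_cpu": 2.0, "free_mem": 4096},
]
print(pick_node(nodes, cpu_needed=1.0, mem_needed=1024))  # worker-2
```

When no node passes the filter, the real scheduler leaves the pod in a Pending state until resources free up, which is exactly what the `None` branch stands in for here.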

The Kubelet is a process running on each "worker" node that does the actual work of making your container run. It performs all the tasks needed to ensure your code runs on the node: resource allocation, resource monitoring, process handling, etc. This is a key component, as it is also the one responsible for supporting different container engines.
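At its heart the kubelet runs a reconcile loop: compare what the API server says should be running on this node with what is actually running, and close the gap. A minimal sketch of that idea, with made-up container names:

```python
# Toy reconcile step in the spirit of the kubelet: diff desired state
# against actual state to decide what to start and what to stop.

def reconcile(desired: set, running: set):
    """Return (to_start, to_stop) for this node."""
    return desired - running, running - desired

to_start, to_stop = reconcile(
    desired={"web", "cache"},
    running={"cache", "old-job"},
)
print(to_start)  # {'web'}
print(to_stop)   # {'old-job'}
```

The real kubelet then asks the container engine to perform those starts and stops, which is why swapping the engine underneath does not change this loop.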

Kube proxy is a component that also runs on each worker node and takes care of all the networking work for our containers. One important point about kube proxy that confuses many people: it does not route traffic or sit in the middle of it at all. Kube proxy makes the networking configuration in the worker node that is needed for traffic to reach the correct container (adding and removing iptables rules, for example), which means it can crash and the applications running on your node will continue to work.
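You can picture the rules kube proxy installs as a lookup table from a service address to the pod addresses behind it. This is only a toy model with made-up IPs; in reality the table lives in iptables/IPVS and the kernel does the rewriting, which is exactly why kube proxy itself is not on the data path.

```python
import random

# Toy model of kube-proxy rules: service IP -> backend pod IPs.
rules = {"10.96.0.10": ["10.244.1.5", "10.244.2.7"]}

def route(dst_ip: str) -> str:
    """Rewrite a service IP to one backend pod, like a DNAT rule would."""
    backends = rules.get(dst_ip)
    return random.choice(backends) if backends else dst_ip

# Traffic to the service lands on one of its pods; other traffic passes through.
assert route("10.96.0.10") in rules["10.96.0.10"]
assert route("8.8.8.8") == "8.8.8.8"
```

Once the rules exist, packets keep flowing even if the process that wrote them is down, matching the behavior described above.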

So we have a lot of components in the control nodes and the worker nodes, containers running, network routes defined, and a lot of information about our infrastructure, but how does the API server "remember" all this? This is where etcd enters the scene. etcd is a highly available and scalable key/value data store (a lot of words, but no worries, it is not that complex) that holds the entire cluster state: it contains information about your cluster, and the API server is the one constantly updating it.
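A key/value store is simple to picture: keys map to serialized objects, and you can read everything under a prefix. This dict-based sketch echoes the `/registry/<resource>/<namespace>/<name>` key layout etcd uses for cluster state; the pod entries themselves are made up.

```python
# Toy key/value store echoing how cluster state is laid out in etcd.
state = {}

def put(key: str, value: dict):
    state[key] = value

def get_prefix(prefix: str) -> dict:
    """Range read over a key prefix, like etcd's prefix queries."""
    return {k: v for k, v in state.items() if k.startswith(prefix)}

put("/registry/pods/default/web", {"node": "worker-2", "phase": "Running"})
put("/registry/pods/default/cache", {"node": "worker-1", "phase": "Running"})
print(sorted(get_prefix("/registry/pods/default")))
```

The prefix read is the interesting part: it is how "give me all pods in this namespace" turns into a single range query against the store.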

Now you have a basic understanding of how Kubernetes manages to run a container, and with this knowledge you can decide if Kubernetes is the right tool for your application.
I tend not to recommend Kubernetes for small applications that are in the initial stages of being deployed, or when your application is already running on other infrastructure and migrating to Kubernetes would only bring you more work without much gain.

Just remember that Kubernetes is a tool, and like any other tool it serves a purpose, and that purpose should drive your decision to move to it or not. Adopting Kubernetes requires a lot of investigation, learning, and effort (translated into time), so be mindful and move to Kubernetes only if it makes sense for your use case.

I hope this first article of the series helped you understand the basics of Kubernetes and decide if it is the right choice for your next steps.

If you enjoyed this article, please consider following me to get alerted about every new article, and consider leaving a message with your ideas and/or feedback.

Thank you for reading!
