The Ops Community ⚙️

Daniele Polencic


How etcd works in Kubernetes

If you've ever interacted with a Kubernetes cluster in any way, chances are it was powered by etcd under the hood.

But even though etcd is at the heart of how Kubernetes works, it's rare to interact with it directly on a day-to-day basis.

In this article, you will explore how it works!

Architecturally speaking, the Kubernetes API server is a CRUD application that stores manifests and serves data.

Hence, it needs a database to store its persisted data, which is where etcd fits into the picture.

Kubernetes control plane

According to its website, etcd is:

  1. Strongly consistent.
  2. Distributed.
  3. A key-value store.

In addition, etcd has another feature that Kubernetes extensively uses: change notifications.

Etcd allows clients to subscribe to changes to a particular key or set of keys.
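With the real etcd, `etcdctl watch --prefix /registry/pods/` streams exactly these notifications, and the Kubernetes API server uses them to power its own watch endpoints. The idea can be sketched in a few lines of Python; this is a toy model of the subscribe-to-changes pattern, not etcd's actual client API:

```python
# Toy model of etcd-style "watch" semantics (illustrative only;
# this is not the real etcd client API).
from collections import defaultdict

class WatchableStore:
    def __init__(self):
        self._data = {}
        self._watchers = defaultdict(list)  # key prefix -> callbacks

    def watch(self, prefix, callback):
        """Subscribe to changes on any key starting with `prefix`."""
        self._watchers[prefix].append(callback)

    def put(self, key, value):
        """Store the value and notify every matching subscriber."""
        self._data[key] = value
        for prefix, callbacks in self._watchers.items():
            if key.startswith(prefix):
                for callback in callbacks:
                    callback(key, value)

events = []
store = WatchableStore()
store.watch("/registry/pods/", lambda k, v: events.append((k, v)))
store.put("/registry/pods/default/nginx", "pod manifest")
# events now holds [("/registry/pods/default/nginx", "pod manifest")]
```

The `/registry/...` key layout mirrors how Kubernetes actually organises objects inside etcd.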

Key features of etcd

The Raft algorithm is the secret behind etcd's balance of strong consistency and high availability.

Raft solves a particular problem: how can multiple processes decide on a single value for something?

Raft works by electing a leader and forcing all write requests to go to it.

The Raft algorithm

How does the Leader get elected, though?

First, all nodes start in the Follower state.

All nodes start in the follower state

If followers don't hear from a leader, they can become candidates and request votes from other nodes.

Followers can become candidates

Nodes reply with their vote.

The candidate with the majority of the votes becomes the Leader.

Changes are then replicated from the Leader to all other nodes; if the Leader ever goes offline, a new election is held, and a new leader is chosen.

A candidate becomes a leader
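The voting rule at the centre of the election is just a majority check. Here is a toy model of a single election round; real Raft also tracks terms and log freshness when granting votes:

```python
# Toy model of a Raft election round (illustrative only; real Raft
# also compares terms and log freshness before granting a vote).
def wins_election(cluster_size, votes_received):
    """A candidate becomes the Leader with a strict majority of votes."""
    majority = cluster_size // 2 + 1
    return votes_received >= majority

# A follower's election timeout fires: it becomes a candidate,
# votes for itself, and requests votes from the other four nodes.
votes = 1 + 2                      # its own vote plus two grants
print(wins_election(5, votes))     # 3 of 5 is a majority -> True
```

Because a majority is required, two candidates can never both win the same election round.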

What happens when you want to write a value in the database?

First, all write requests are redirected to the Leader.

The Leader records the request in its log but doesn't commit it yet.

All requests are forwarded to the Leader

Instead, the Leader replicates the value to the rest of the nodes (the followers).

The leader replicates the value to the followers

Finally, the Leader waits until a majority of the nodes have written the entry and then commits the value.

At that point, the state of the database contains the value.

Once the write succeeds, an acknowledgement is sent back to the client.

The value is written to disk
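The four steps above can be sketched as follows. This is a simplified, synchronous toy model; real etcd replicates asynchronously and tracks each follower's progress:

```python
# Sketch of the Raft write path described above (illustrative only;
# real etcd replicates asynchronously and tracks follower progress).

class Node:
    def __init__(self, online=True):
        self.online = online
        self.log = []        # appended entries (not yet committed)
        self.committed = []  # entries acknowledged by a majority

def write(leader, followers, entry):
    leader.log.append(entry)            # 1. the Leader appends, uncommitted
    acks = 1                            #    the Leader counts itself
    for follower in followers:          # 2. replicate to the followers
        if follower.online:
            follower.log.append(entry)
            acks += 1
    majority = (1 + len(followers)) // 2 + 1
    if acks >= majority:                # 3. commit once a majority has it
        leader.committed.append(entry)
        return "ok"                     # 4. acknowledge the client
    return "error"

# A 3-node cluster keeps accepting writes with one follower offline.
leader = Node()
followers = [Node(), Node(online=False)]
print(write(leader, followers, "/registry/pods/default/nginx"))  # ok
```

Note that the write succeeds even with one node down, because 2 out of 3 is still a majority.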

A new election is held if the cluster leader goes offline for any reason.

In practice, this means that etcd will remain available as long as a majority of its nodes are online.

How many nodes should an etcd cluster have to achieve "good enough" availability?

It depends.

| Cluster size | Majority (quorum) | Failure tolerance |
|--------------|-------------------|-------------------|
| 1            | 1                 | 0                 |
| 3            | 2                 | 1                 |
| 5            | 3                 | 2                 |
| 7            | 4                 | 3                 |

To help you answer that question, let me ask another question!

Why stop at three etcd nodes? Why not run a cluster with 9, 21, or more nodes?

Hint: check out the replication part.

A cluster with 9 nodes

The Leader has to wait for a quorum before the value is written to disk.

The more followers there are in the cluster, the longer it takes to reach a consensus.

In other words, you trade write speed for availability.

A cluster with 9 nodes takes more time to write values to disk
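The trade-off is easy to quantify: the quorum a Leader must wait for grows with the cluster, while the extra fault tolerance grows only half as fast. A quick calculation using the standard majority formula:

```python
# Quorum size and failure tolerance as a Raft cluster grows
# (standard majority formula, n // 2 + 1).
def quorum(n):
    """Minimum number of nodes that must acknowledge a write."""
    return n // 2 + 1

def tolerated_failures(n):
    """Nodes that can fail while the cluster stays available."""
    return n - quorum(n)

for n in (3, 5, 7, 9):
    print(f"{n} nodes: quorum {quorum(n)}, "
          f"tolerates {tolerated_failures(n)} failures")
```

Going from 3 to 9 nodes means waiting for 5 acknowledgements per write instead of 2, yet only raises the failure tolerance from 1 node to 4. This is why 3 or 5 nodes is the usual sweet spot.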

If you enjoyed this thread but want to know more on:

  • Change notifications.
  • Creating etcd clusters.
  • Replacing etcd with SQL-like databases using kine.

Check out this article.

