Axel Navarro

Posted on May 25, 2022 • Originally published at dev.to

Analyzing Docker layers with Dive

#docker #cicd #devops #kubernetes

Dive is a tool, written in Go, for exploring a Docker image and its layer contents, and for discovering ways to shrink the size of the image.

How do the layers work?

Let's start with the concept of a layer in Docker. We can say layers are like Git commits:

they have a parent layer,
are read-only,
and receive an ID calculated with a SHA-256 hash.
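To see these properties on a real image, we can ask Docker for the layer list. This is a minimal sketch, assuming Docker is installed and the image is already pulled; the function names layer_digests and content_id are made up for this example. The second function only illustrates the idea of content addressing: the same bytes always produce the same SHA-256 ID.

```shell
# Print the SHA-256 digests of the layers of a local image as a JSON array.
# Assumes the image exists locally, e.g. after `docker pull node:alpine`.
layer_digests() {
  docker image inspect --format '{{json .RootFS.Layers}}' "$1"
}

# Content addressing in miniature: read bytes from stdin and print an ID
# derived only from those bytes (hypothetical helper, not a Docker command).
content_id() {
  printf 'sha256:%s\n' "$(sha256sum | cut -d ' ' -f 1)"
}
```

Calling layer_digests node:alpine prints one sha256:... entry per layer, and piping the same content into content_id twice yields the same ID both times.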

Why is «read-only» important?

Because if you edit or remove a file in a later layer, the original file still exists in the earlier layer of your Docker image and still takes up space.
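A small simulation can make this concrete. The sketch below does not use Docker at all: it models two layers as plain directories, and the directory layout and the function name are assumptions made for this example. The .wh. prefix mimics the whiteout markers that OCI image layers use to record a deletion.

```shell
# Simulate two read-only layers and print the total bytes they occupy.
simulate_layers() {
  work=$(mktemp -d)
  mkdir "$work/layer1" "$work/layer2"

  # Layer 1 adds a 1 MiB temporary file.
  head -c 1048576 /dev/zero > "$work/layer1/cache.bin"

  # Layer 2 "removes" it: only an empty whiteout marker is recorded,
  # because layer 1 can no longer be changed.
  : > "$work/layer2/.wh.cache.bin"

  # The image ships every layer, so the deleted file still counts.
  total=$(find "$work" -type f -exec cat {} + | wc -c)
  rm -rf "$work"
  echo $((total))
}
```

Running simulate_layers prints 1048576: the two layers together still weigh 1 MiB, even though the final filesystem would no longer contain cache.bin.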

So, we must remove temporary or unneeded files in the same RUN instruction that creates them to keep our images small.

RUN apt-get update && \
  apt-get install -y apt-transport-https ca-certificates && \
  rm -rf /var/lib/apt/lists/*

The same rule applies when we build an app in a RUN instruction. We must remove the intermediate files and keep only the useful ones.

What is a Docker tag?

Docker tags, like focal in ubuntu:focal, are just pointers to an image, that is, to its top layer. Unlike Git tags, a Docker tag is expected to move: it can point to a different image when the image is updated.

💡 TIP: the image is not actually updated, because layers are read-only; the tag simply points to a new image. 🤯
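We can watch a tag move with two Docker commands. This is a minimal sketch, assuming Docker is installed and the tag has been pulled at least once; the function names image_id and tag_moved are hypothetical.

```shell
# Print the ID (sha256:...) of the image a tag currently points to locally.
image_id() {
  docker image inspect --format '{{.Id}}' "$1"
}

# Pull the tag again and report whether it now points to another image.
tag_moved() {
  before=$(image_id "$1")
  docker pull --quiet "$1" > /dev/null
  after=$(image_id "$1")
  if [ "$before" = "$after" ]; then
    echo "$1 still points to $before"
  else
    echo "$1 moved from $before to $after"
  fi
}
```

For example, tag_moved ubuntu:focal reports a move whenever a newer image has been published under that tag. The old image is not modified; it only loses its tag.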

What does Dive do?

With Dive we can inspect the layers of a Docker image and see how each one differs from its parent layer.

dive node:alpine

In the left panel, we can see the layers of the given image, the command that generated the selected layer (in purple 💜), and the ID and digest of that layer.

In the right panel, we can see the diff of the layer with a color reference:

🟢 green: the new files.
🟡 yellow: the edited files.
🟥 red: the deleted files.

Did you see that /tmp folder in the layer? 🧐 I wonder whether the /tmp folder, with its v8-compile-cache, could be left out of the layer, which would reduce the size of the node:alpine image by 2.3MiB. 🤔

There is a PR to remove the v8-compile-cache folder, but the cache «is used to speed up a little Yarn loading time from 153ms to 113ms». Is it worth it?

What is the efficiency score?

Dive tries to help us by indicating the efficiency of our images. Edited or removed files reduce the efficiency score because their original versions still exist in the image but are of no use to the container.

The Count column indicates how many times the same file is committed into the image.

The efficiency score as a linter

You can run this in your CI pipeline to ensure you're keeping wasted space to a minimum:

CI=true dive node:alpine
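In CI mode, Dive exits with a non-zero status when the image breaks its rules, so it can fail a pipeline step. The sketch below wraps that in a small function. The thresholds are only illustrative, the function name lint_image is made up, and the two flags are taken from Dive's README, so check dive --help in case your version differs.

```shell
# Fail when the image is less efficient than we are willing to accept.
# Thresholds are example values: at least 95% efficiency and
# at most 10% of wasted space.
lint_image() {
  if CI=true dive "$1" --lowestEfficiency=0.95 --highestUserWastedPercent=0.10; then
    echo "PASS $1"
  else
    echo "FAIL $1"
    return 1
  fi
}
```

A pipeline step can then call lint_image node:alpine and rely on the exit status to stop the build.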

How is the efficiency score calculated?

The score only takes into account the edited or removed files that still consume space in the image.
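One way to picture the score is as the ratio between the bytes the image really needs and the bytes it stores. The sketch below is a simplified model, not Dive's actual code, and the real implementation may differ in its details. It reads one "path size" line per appearance of a file in a layer (a removed file appears with size 0), keeps the smallest size seen for each path, and divides by the total. The input format and the function name are assumptions.

```shell
# Read "path size" lines (one per appearance of a file in a layer) and print
# a score between 0 and 1. Assumes paths contain no whitespace.
efficiency_score() {
  awk '
    {
      size = $2 + 0
      total += size                       # every stored copy counts
      if (!($1 in min) || size < min[$1]) # smallest copy per path
        min[$1] = size
    }
    END {
      if (total == 0) { print "1.0000"; exit }  # nothing stored, nothing wasted
      for (p in min) needed += min[p]
      printf "%.4f\n", needed / total
    }
  '
}
```

For example, feeding it the three lines "/app/data.bin 100", "/app/data.bin 100" and "/etc/config 50" prints 0.6000: the image stores 250 bytes but only needs 150, because data.bin was committed twice.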

Maybe we could apply additional validation rules that Dive doesn't have today:

Check for content in the /tmp folder.
Check for apt, yum or pacman cache files.
Check for a filename pattern, e.g. make sure no *.tar.gz files are left inside the image.
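Until Dive offers rules like these, a small script can cover the three checks above. This is a rough sketch, assuming Docker and tar are available and that the image defines a default command (otherwise docker create needs one); the function names and the exact path patterns are my own choices. Note that docker export only shows the final filesystem, so it will not catch a file that was added in one layer and deleted in a later one; the efficiency score is still needed for that.

```shell
# List every path in the final filesystem of an image (no leading slash).
image_paths() {
  cid=$(docker create "$1") || return 1
  docker export "$cid" | tar -tf -
  docker rm "$cid" > /dev/null
}

# Read paths on stdin and print the ones that break a rule:
# anything under /tmp, apt/yum/pacman caches, or leftover *.tar.gz files.
find_offenders() {
  grep -E '^(tmp/.+|var/cache/apt/archives/.+\.deb|var/lib/apt/lists/.+|var/cache/yum/.+|var/cache/pacman/pkg/.+|.*\.tar\.gz)$'
}

# Print the offending paths and fail if there are any.
lint_paths() {
  offenders=$(image_paths "$1" | find_offenders || true)
  if [ -n "$offenders" ]; then
    printf 'Unwanted files in %s:\n%s\n' "$1" "$offenders"
    return 1
  fi
}
```

Calling lint_paths node:alpine in a pipeline step fails the build when any of those paths is present in the image.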

More Dive

Thanks to Alex Goodman for this awesome tool! If you 💛 it, leave a ⭐ at https://github.com/wagoodman/dive.

The Ops Community ⚙️