The Ops Community ⚙️

Arseny Zinchenko
Arseny Zinchenko

Posted on • Originally published at rtfm.co.ua on

Kubernetes: containers, and the “lost” SIGTERM signals

We have an API service with Gunicorn in Kubernetes that periodically returns 502, 503, 504 errors.

I started debugging it, and found a weird thing: there were no messages in the logs about the received SIGTERM, so I first went to deal with Kubernetes - why doesn't it send it?

The Issue

So, here’s what it looks like.

We have a Kubernetes Pod:

$ kk get pod
NAME READY STATUS RESTARTS AGE
fastapi-app-89d8c77bc-8qwl7 1/1 Running 0 38m
Enter fullscreen mode Exit fullscreen mode

Read its logs:

$ ktail fastapi-app-59554cddc5-lgj42
==> Attached to container [fastapi-app-59554cddc5-lgj42:fastapi-app]
Enter fullscreen mode Exit fullscreen mode

Kill it:

$ kk delete pod -l app=fastapi-app
pod "fastapi-app-6cb6b46c4b-pffs2" deleted
Enter fullscreen mode Exit fullscreen mode

And what do we see in his logs? Nothing!

...
fastapi-app-6cb6b46c4b-9wqpf:fastapi-app [2024-06-22 11:13:27 +0000] [9] [INFO] Application startup complete.
==> Container left (terminated) [fastapi-app-6cb6b46c4b-pffs2:fastapi-app]
==> New container [fastapi-app-6cb6b46c4b-9qtvb:fastapi-app]
==> New container [fastapi-app-6cb6b46c4b-9qtvb:fastapi-app]
fastapi-app-6cb6b46c4b-9qtvb:fastapi-app [2024-06-22 11:14:15 +0000] [8] [INFO] Starting gunicorn 22.0.0
...
fastapi-app-6cb6b46c4b-9qtvb:fastapi-app [2024-06-22 11:14:16 +0000] [9] [INFO] Application startup complete.
Enter fullscreen mode Exit fullscreen mode

Here:

  1. the service has started  —  “Application startup complete
  2. the pod die  —  “Container left
  3. a new Pod is started  —  “New container” and “Starting gunicorn

And here is how it should look like in a normal case:

...
fastapi-app-59554cddc5-v7xq9:fastapi-app [2024-06-22 11:09:54 +0000] [8] [INFO] Waiting for application shutdown.
fastapi-app-59554cddc5-v7xq9:fastapi-app [2024-06-22 11:09:54 +0000] [8] [INFO] Application shutdown complete.
fastapi-app-59554cddc5-v7xq9:fastapi-app [2024-06-22 11:09:54 +0000] [8] [INFO] Finished server process [8]
fastapi-app-59554cddc5-v7xq9:fastapi-app [2024-06-22 11:09:54 +0000] [1] [ERROR] Worker (pid:8) was sent SIGTERM!
fastapi-app-59554cddc5-v7xq9:fastapi-app [2024-06-22 11:09:54 +0000] [1] [INFO] Shutting down: Master
==> Container left (terminated) [fastapi-app-59554cddc5-v7xq9:fastapi-app]
Enter fullscreen mode Exit fullscreen mode

That is, here Gunicorn receives a SIGTERM, and correctly finishes its work.

What the hell?

Let’s look into it.

Kubernetes and the Pod termination process

How does the process of stopping a Pod works?

I wrote more about it in Kubernetes: NGINX/PHP-FPM graceful shutdown — getting rid of 502 errors, here is a very short summary:

  1. we run kubectl delete pod
  2. the kubelet on the corresponding WorkerNode receives a command from the API server to stop the Pod
  3. the kubelet sends the SIGTERM signal to the process with PID 1 in the container of the Pod, that is, the first process that was started when the container was created
  4. if the container did not stop within the terminationGracePeriodSeconds - then the SIGKILL is sent

That is, our Gunicorn process should receive the SIGTERM, write it to the log, and start stopping its worker.

Instead, it receives nothing and just dies.

Why?

PID 1 and SIGTERM in containers

Let’s check what do we have in processes of the container of this Pod:

root@fastapi-app-6cb6b46c4b-9qtvb:/app# ps aux

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

root 1 0.0 0.0 2576 948 ? Ss 11:14 0:00 /bin/sh -c gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app

root 8 0.0 1.3 31360 27192 ? S 11:14 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app

root 9 0.2 2.4 287668 49208 ? Sl 11:14 0:04 /usr/local/bin/python /usr/local/bin/gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app
Enter fullscreen mode Exit fullscreen mode

And we see that here is PID 1 is the /bin/sh process, which runs Gunicorn through -c.

Now let’s run strace in the Pod to see what signals it receives:

root@fastapi-app-6cb6b46c4b-9pd7r:/app# strace -p 1
strace: Process 1 attached
wait4(-1,
Enter fullscreen mode Exit fullscreen mode

Run kubectl delete pod - but add time to measure the time it takes to execute the command:

$ time kk delete pod fastapi-app-6cb6b46c4b-9pd7r
pod "fastapi-app-6cb6b46c4b-9pd7r" deleted

real 0m32.222s
Enter fullscreen mode Exit fullscreen mode

32 seconds…

What’s in the strace?

root@fastapi-app-6cb6b46c4b-9pd7r:/app# strace -p 1
strace: Process 1 attached
wait4(-1, 0x7ffe7a390a3c, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=0, si_uid=0} ---
wait4(-1, <unfinished ...>) = ?
command terminated with exit code 137
Enter fullscreen mode Exit fullscreen mode

So, what was going on here?

  1. The kubelet sent a SIGTERM signal to the process with PID 1 - SIGTERM {si_signo=SIGTERM} - and the PID 1 would have to pass this signal to its children, stop them, and then terminate itself
  2. but the process did not stopped  —  and kubelet waited the default 30 seconds to complete the process' work correctly - see Pod phase
  3. then kubelet killed the container, and the process ended with "terminated with exit code 137"

Usually, the 137 exit code is about OutOfMemory Killer, when a process is killed with SIGKILL, but in our case there was no OOMKill - just a SIGKILL sent because the processes in the Pod did not terminated on time.

Well, where did our SIGTERM go then?

Let’s run the signals directly from the container  —  first kill -s 15, the SIGTERM, then kill -s 9, the SIGKILL:

root@fastapi-app-6cb6b46c4b-r9fnq:/app# kill -s 15 1
root@fastapi-app-6cb6b46c4b-r9fnq:/app# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 2576 920 ? Ss 12:02 0:00 /bin/sh -c gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app
root 7 0.0 1.4 31852 27644 ? S 12:02 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app
...
root@fastapi-app-6cb6b46c4b-r9fnq:/app# kill -s 9 1

root@fastapi-app-6cb6b46c4b-r9fnq:/app# ps aux

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

root 1 0.0 0.0 2576 920 ? Ss 12:02 0:00 /bin/sh -c gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app

root 7 0.0 1.4 31852 27644 ? S 12:02 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app
Enter fullscreen mode Exit fullscreen mode

What? How? Why?

Why is SIGTERM ignored? And even more so SIGKILL, which should be a "non-ignored signal" - see man signal:

The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.

Linux kill(), and The PID 1

Because the PID 1 in Linux is a special process, because it is the first process to be launched by the system, and it must be protected from an “accidental kill”.

If we look at man kill, it is explicitly stated there, and it also talks about signal handlers to the process:

The only signals that can be sent to process ID 1, the init process, are those for which init has explicitly installed signal handlers . This is done to ensure the system is not brought down accidentally

You can check what handlers our process has — that is, what signals our process can intercept and process — from the file /proc/1/status:

root@fastapi-app-6cb6b46c4b-r9fnq:/app# cat /proc/1/status | grep SigCgt
SigCgt: 0000000000010002
Enter fullscreen mode Exit fullscreen mode

The SigCgt signals are the signals that the process can intercept and process itself. The rest will be either ignored or processed with the SIG_DFL handler, and the SIG_DFL handler ignores signals for PID 1, which does not have its own handler.

Let’s ask ChatGPT what exactly these signals are:

(you can actually translate it yourself if you are interested — see, for example How can I check what signals a process is listening to?, or How to read bitmask for the signals)

So here’s what do we have:

  • a process /bin/sh with PID 1
  • the PID 1 is a special process
  • checking PID 1 shows us that it “recognizes” the only SIGHUP and SIGCHLD signals
  • and both SIGTERM and SIGKILL will be ignored by it

But then how does the container stop?

Docker stop and Linux signals

The process of stopping a container in Docker (or Containerd) is no different than in Kubernetes, because in fact kubelet just passes commands to the container runtime. In AWS Kubernetes, this is now containerd.

But for the sake of simplicity, let’s do it locally with Docker.

So, we start the container from the same Docker image that we tested in Kubernetes:

$ docker run --name test-app 492***148.dkr.ecr.us-east-1.amazonaws.com/fastapi-app-test:entry-2     
[2024-06-22 14:15:03 +0000] [7] [INFO] Starting gunicorn 22.0.0
[2024-06-22 14:15:03 +0000] [7] [INFO] Listening at: http://0.0.0.0:80 (7)
[2024-06-22 14:15:03 +0000] [7] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2024-06-22 14:15:03 +0000] [8] [INFO] Booting worker with pid: 8
[2024-06-22 14:15:03 +0000] [8] [INFO] Started server process [8]
[2024-06-22 14:15:03 +0000] [8] [INFO] Waiting for application startup.
[2024-06-22 14:15:03 +0000] [8] [INFO] Application startup complete.
Enter fullscreen mode Exit fullscreen mode

Try to stop it by sending SIGKILL to the PID 1 - nothing has changed, it ignores the signal:

$ docker exec -ti test-app sh -c "kill -9 1"
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
99bae6d55be2 492***148.dkr.ecr.us-east-1.amazonaws.com/fastapi-app-test:entry-2 "/bin/sh -c 'gunicor…" About a minute ago Up About a minute test-app
Enter fullscreen mode Exit fullscreen mode

Try to stop it with docker stop and look at the time again:

$ time docker stop test-app
test-app

real 0m10.234s
Enter fullscreen mode Exit fullscreen mode

And the status of the container:

$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
cab29916f6ba 492***148.dkr.ecr.us-east-1.amazonaws.com/fastapi-app-test:entry-2 "/bin/sh -c 'gunicor…" About a minute ago Exited (137) 52 seconds ago
Enter fullscreen mode Exit fullscreen mode

The same code 137, that is, the container stopped with SIGKILL, and the time taken to stop the container was 10 seconds.

But what if the signal is sent to PID 1, which ignores it?

I didn’t find it in the documentation for the docker kill, but we can kill container processes in two ways:

  1. kill all child processes in the container itself — and then the parent (PID 1) will die itself
  2. kill the entire group of processes on the host through their SID (Session ID) — which again will lead to PID 1 ignoring the signal, but dying itself, because all its children have died

Let’s take another look at the processes in the container:

root@cddcaa561e1d:/app# ps aux

USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND

root 1 0.0 0.0 2576 1408 ? Ss 15:58 0:00 /bin/sh -c gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app

root 7 0.1 0.0 31356 26388 ? S 15:58 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app

root 8 0.5 0.1 59628 47452 ? S 15:58 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app

root@cddcaa561e1d:/app# pstree -a

sh -c gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app
  └─gunicorn /usr/local/bin/gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app
      └─gunicorn /usr/local/bin/gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app
Enter fullscreen mode Exit fullscreen mode

We can’t kill the PID 1 because it ignores us — but we can kill the PID 7!

Then it will kill PID 8 after itself, because it is its child process, and when the PID 1 finds out that there are no more of its children, it will die itself, and the container will stop:

root@cddcaa561e1d:/app# kill 7
Enter fullscreen mode Exit fullscreen mode

And container logs:

...
[2024-06-22 16:02:54 +0000] [7] [INFO] Handling signal: term
[2024-06-22 16:02:54 +0000] [8] [INFO] Shutting down
[2024-06-22 16:02:54 +0000] [8] [INFO] Error while closing socket [Errno 9] Bad file descriptor
[2024-06-22 16:02:54 +0000] [8] [INFO] Waiting for application shutdown.
[2024-06-22 16:02:54 +0000] [8] [INFO] Application shutdown complete.
[2024-06-22 16:02:54 +0000] [8] [INFO] Finished server process [8]
[2024-06-22 16:02:54 +0000] [7] [ERROR] Worker (pid:8) was sent SIGTERM!
[2024-06-22 16:02:54 +0000] [7] [INFO] Shutting down: Master
Enter fullscreen mode Exit fullscreen mode

But since Pods/containers die with the exit code 137, they were killed with SIGKILL, because when Docker or other container runtime cannot stop a process with PID 1 with SIGKILL, it sends SIGKILL to all processes in the container.

That is:

  1. first, SIGTEM is sent to the PID 1
  2. after 10 seconds SIGKILL is sent to the PID 1
  3. if this did not help, then SIGKILL is sent to all processes in the container

For example, you can do this by passing the Session ID (SID) to the kill command.

Find the main process of the container:

$ docker inspect --format '{{ .State.Pid }}' test-app
629353
Enter fullscreen mode Exit fullscreen mode

Run ps j -A:

$ ps j -A      
   PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
 ...
 629333 629353 629353 629353 ? -1 Ss 0 0:00 /bin/sh -c gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app
 629353 629374 629353 629353 ? -1 S 0 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app
 629374 629375 629353 629353 ? -1 S 0 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app
Enter fullscreen mode Exit fullscreen mode

We see our SID — 629353.

And kill the whole group:

$ sudo kill -9 -- -629353
Enter fullscreen mode Exit fullscreen mode

Okay.

This all is very good and very interesting.

But is it possible to do without these crutches?

The “Right way” to launch processes in a container

Finally, let’s take a look at our Dockerfile:

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENTRYPOINT gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app
Enter fullscreen mode Exit fullscreen mode

See the Docker — Shell and exec form documentation:

INSTRUCTION “executable”,”param1",”param2"

INSTRUCTION command param1 param2 (shell form)

So, in our case, the shell form was used — and as a result, we have the /bin/sh as the PID 1, which calls Gunicorn through -c.

If we rewrite it in the exec form:

FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
ENTRYPOINT ["gunicorn", "-w", "1", "-k", "uvicorn.workers.UvicornWorker", "--bind", "0.0.0.0:80", "app:app"]
Enter fullscreen mode Exit fullscreen mode

And will run a container from this image, we will have only Gunicorn processes:

root@e6087d52350d:/app# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.6 0.0 31852 27104 ? Ss 16:13 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app
root 7 2.4 0.1 59636 47556 ? S 16:13 0:00 /usr/local/bin/python /usr/local/bin/gunicorn -w 1 -k uvicorn.workers.UvicornWorker --bind 0.0.0.0:80 app:app
Enter fullscreen mode Exit fullscreen mode

Which already can handle SIGTERM signals:

root@e6087d52350d:/app# cat /proc/1/status | grep SigCgt
SigCgt: 0000000008314a07
Enter fullscreen mode Exit fullscreen mode

And now, if we send the SIGTERM to the PID 1, the container will finish its work normally:

root@e6087d52350d:/app# kill 1
Enter fullscreen mode Exit fullscreen mode

And logs:

...
[2024-06-22 16:17:20 +0000] [1] [INFO] Handling signal: term
[2024-06-22 16:17:20 +0000] [7] [INFO] Shutting down
[2024-06-22 16:17:20 +0000] [7] [INFO] Error while closing socket [Errno 9] Bad file descriptor
[2024-06-22 16:17:20 +0000] [7] [INFO] Waiting for application shutdown.
[2024-06-22 16:17:20 +0000] [7] [INFO] Application shutdown complete.
[2024-06-22 16:17:20 +0000] [7] [INFO] Finished server process [7]
[2024-06-22 16:17:20 +0000] [1] [ERROR] Worker (pid:7) was sent SIGTERM!
[2024-06-22 16:17:21 +0000] [1] [INFO] Shutting down: Master
Enter fullscreen mode Exit fullscreen mode

And now Kubernetes Pods will stop normally — and quickly, because they won’t wait for a grace period.

Useful links

Originally published at RTFM: Linux, DevOps, and system administration.


Top comments (0)