Gracefully Shutting Down Applications In Docker Containers

What do we mean by a graceful shutdown? Péter Márton describes it well, and since I could not have written it better myself, I'll quote him:

We can speak of the graceful shutdown of our application when all of the resources it used, and all of the traffic and/or data processing it handled, are closed and released properly. That means that no database connection remains open and no ongoing request fails because we stopped our application. - Péter Márton

Cleaning up after yourself and announcing your impending departure is widely considered good practice, not just personal preference. When our application holds resources, such as open files, database connections, or background processes, we should release them before exiting. That cleanup, together with finishing or handing off whatever work is still in flight, is what constitutes a graceful shutdown. Many programming languages and frameworks have hooks for listening to signals, allowing you to handle a shutdown, whether it's expected or not. We'll explore them later.

We're going to dive into that subject, exploring several complementary topics that together should help improve your (Docker) application's ability to shut down gracefully.

  • The case for graceful shutdown
  • How to run processes in Docker
  • Process management
  • Signals management

The case for graceful shutdown

We're in an age where many applications run in Docker containers, across a multitude of clusters and (potentially) different orchestrators. Such deployment strategies bring a myriad of concerns that should be tackled, logging, monitoring, and tracing among them. One significant way we defend ourselves against the perils of the distributed nature of these clusters is to make our applications more resilient. This article does not tackle resiliency in depth, though; it focuses on one specific piece of the puzzle.

That piece is this: there is no guarantee your application is always up and running, and sooner or later it will be told to stop by its orchestrator. This can happen for a variety of reasons, for example, your application's health check fails, it consumed more resources than allowed, or the cluster needs to reschedule it elsewhere. The concern we tackle here is how your application responds when that happens.

Responding well means closing connections, finishing or rejecting in-flight requests, and releasing resources before exiting, so that a routine stop does not turn into failed requests or corrupted state. Not only does that increase the reliability of your application, it also increases the reliability of the cluster it lives in. You cannot always know in advance where your application will run; you might not even be the one putting it in a Docker container. The rules of graceful shutdown are not specific to Docker either, they are long-standing Linux best practices. So make sure your application knows how to quit!

Since containers are a commonly used deployment mechanism, and Docker is (still) the most commonly used container engine, we'll explore how to run processes inside it.

How to run processes in Docker

There are many ways to run a process in Docker, and the way you start your process determines which process receives signals, which in turn determines whether your application ever gets the chance to shut down gracefully. To keep things easy to understand and predictable, this article deals with processes started by commands in a Dockerfile.

There are several ways to run a command in a Dockerfile.

These are as follows.

  • RUN: runs a command during the docker build phase; it plays no part in how the container runs or stops and is listed here only for completeness
  • CMD: runs a command (or supplies default arguments) when the container gets started
  • ENTRYPOINT: configures the executable that runs when the container starts

A Dockerfile should have at least one ENTRYPOINT or CMD to be runnable. They can do similar things, and they can also be used in collaboration, as shown below.
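For instance, ENTRYPOINT and CMD can be combined so that ENTRYPOINT defines the executable and CMD supplies default arguments that users can override at run time. A minimal illustration (the image tag combined-form is just a placeholder):

FROM ubuntu:18.04
ENTRYPOINT ["top"]
CMD ["-b"]

Build it with docker image build --tag combined-form . and then:

docker run --rm combined-form            # runs top -b
docker run --rm combined-form -b -n 1    # arguments after the image replace CMD: runs top -b -n 1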

You can write these commands in either a shell form or an exec form. For more information, you should check out Docker's docs on ENTRYPOINT vs. CMD.

In summary, the shell form wraps your command in a shell: Docker starts /bin/sh -c, which becomes PID 1, and your command runs as a child process of that shell.

The exec form, on the other hand, runs your executable directly, so it becomes PID 1 itself.

Why does this matter for graceful shutdown? Signals such as SIGTERM are delivered to PID 1, and a shell does not forward them to its children, so with the shell form your application may never hear that it is supposed to stop.

Let's see what that looks like, borrowing the Docker docs example referred to earlier.

Docker Shell form example

To see the shell form in action, create the following Dockerfile.

FROM ubuntu:18.04
ENTRYPOINT top -b

Then build and run it.

docker image build --tag shell-form .
docker run --name shell-form --rm shell-form

This should yield output similar to the following (the exact numbers will differ):

top - 16:34:56 up 1 day,  5:15,  0 users,  load average: 0.00, 0.00, 0.00
Tasks:   2 total,   1 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.3 sy,  0.0 ni, 99.2 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2046932 total,   541984 free,   302668 used,  1202280 buff/cache
KiB Swap:  1048572 total,  1042292 free,     6280 used.  1579380 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20   0    4624    760    696 S   0.0  0.0   0:00.05 sh
    6 root      20   0   36480   2928   2580 R   0.0  0.1   0:00.01 top

As you can see, both sh and top are running: sh is PID 1 and top is its child.

That means signals sent to the container, with ctrl+c for example, reach the sh process but are not forwarded to top.

To kill this container, open a second terminal and execute the following command.

docker rm -f shell-form
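As an aside, a regular docker stop would not have fared much better: it sends SIGTERM to PID 1 (the sh process), nothing acts on it, and Docker falls back to SIGKILL after its default 10-second grace period. If you want to see that for yourself, time the stop instead of force-removing the container (a small sketch, assuming the container is still running under the name shell-form):

time docker stop shell-form

The roughly ten seconds you will see is Docker waiting in vain for the process to exit on its own.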

As you can imagine, this is usually not what you want. A process that never receives the stop signal cannot close its connections or finish its work, and Docker eventually gives up and kills it, which is the very opposite of a graceful shutdown. So as a general rule, you should avoid the shell form. On to the exec form we go!

Docker exec form example

The exec form is written as an array of parameters: ENTRYPOINT ["top", "-b"]

To continue in the same line of examples, we will create a Dockerfile, build and run it.

FROM ubuntu:18.04
ENTRYPOINT ["top", "-b"]

Then build and run it:

docker image build --tag exec-form .
docker run --name exec-form --rm exec-form

This should yield output similar to the following; note that top itself now runs as PID 1 and therefore receives signals directly:

top - 18:12:30 up 1 day,  6:53,  0 users,  load average: 0.00, 0.00, 0.00
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.4 us,  0.3 sy,  0.0 ni, 99.2 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  2046932 total,   535896 free,   307196 used,  1203840 buff/cache
KiB Swap:  1048572 total,  1042292 free,     6280 used.  1574880 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    1 root      20   0   36480   2940   2584 R   0.0  0.1   0:00.03 top

Docker exec form with parameters

A caveat of the exec form is that it does not perform variable interpolation; environment variables in the array are passed along literally rather than expanded.

You can try the following:

FROM ubuntu:18.04
ENV PARAM="-b"
ENTRYPOINT ["top", "${PARAM}"]

Then build and run it:

docker image build --tag exec-param .
docker run --name exec-form --rm exec-param

This should yield the following:

/bin/sh: 1: [top: not found

This is where Docker lets you mix the two styles. You can write the ENTRYPOINT as a shell command, so the shell performs the interpolation, and prefix it with, you guessed it, exec. The shell then replaces itself with the command, which ends up running as PID 1, just as with the exec form.

FROM ubuntu:18.04
ENV PARAM="-b"
ENTRYPOINT exec "top" "${PARAM}"

Then build and run it:

docker image build --tag exec-param .
docker run --name exec-form --rm exec-param

This returns the exact same output as if we had run ENTRYPOINT ["top", "-b"].

Now you can also override the parameter by using the environment variable flag.

docker image build --tag exec-param .
docker run --name exec-form --rm -e PARAM="help" exec-param

Resulting in top's help string.

The special case of Alpine

One of the main best practices for Dockerfiles is to make them as small as possible. The easiest way to do this is to start with a minimal base image. This is where Alpine Linux comes in. We will revisit our shell form example, but replace ubuntu with alpine.

Create the following Dockerfile.

FROM alpine:3.8
ENTRYPOINT top -b

Then build and run it.

docker image build --tag alpine-shell-form .
docker run --name alpine-shell-form --rm alpine-shell-form

It will result in the following output.

Mem: 1509068K used, 537864K free, 640K shrd, 126756K buff, 1012436K cached
CPU:   0% usr   0% sys   0% nic 100% idle   0% io   0% irq   0% sirq
Load average: 0.00 0.00 0.00 2/404 5
  PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
    1     0 root     R     1516   0%   0   0% top -b

Aside from top's output looking a bit different, there is only one process, and it runs as PID 1, even though we used the shell form. That is because Alpine's default shell (BusyBox ash) replaces itself with the command when it is the only one on the line.

In other words, Alpine Linux helps us avoid the problem of the shell form altogether!

Process management

Now we know how to create a Dockerfile that ensures our process runs as PID 1, so that it actually receives signals. But running as PID 1 brings responsibilities of its own: someone has to respond to those signals and reap any child processes.

We'll get into signal handling next, but first, let us explore how we can manage our process. As you're used to by now, there are multiple solutions at our disposal.

We can broadly categorize them like this:

  • The process manages itself and its children
  • We let Docker manage the process and its children
  • We use a process manager to do the work for us

Process manages itself

Great! If this is the case, it saves you the trouble of relying on extra dependencies. Unfortunately, not all processes are designed to run as PID 1: they may not install the necessary signal handlers, and they may not reap orphaned children, leaving zombie processes behind.

In those cases, you still have to invest some time and effort to get a solution in place.
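If you are unsure whether your process suffers from this, you can peek inside a running container and look for zombies (a sketch; it assumes ps is available in the image, and <container-name> is a placeholder for whatever you named your container):

docker exec <container-name> ps -ef
# entries marked <defunct> are zombie processes that nobody reaped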

Docker manages PID1

Docker has a built-in feature for this: it can start a lightweight init process as PID 1 that manages your process for you.

So if you're running your images with Docker itself, either directly or via Compose or Swarm, you're fine. You can use the init flag in your run command or in your compose file.

Please note that the examples below require a certain minimum version of Docker and of the compose file format, hence the version keys shown.

Docker Run

docker run --rm -ti --init caladreas/dui

Docker Compose

version: '2.2'
services:
    web:
        image: caladreas/java-docker-signal-demo:no-tini
        init: true

Docker Swarm

version: '3.7'
services:
    web:
        image: caladreas/java-docker-signal-demo:no-tini
        init: true

Relying on Docker does create a dependency on how your container runs: it only behaves correctly under Docker-related technologies (run, Compose, Swarm) and only if the proper versions are available.

That creates a different experience for users running your application somewhere else or unable to meet the version requirements. So another solution is to bake a process manager into your image and guarantee its behavior.

Depend on a process manager

One of our goals for Docker images is to keep them small, so we should look for a lightweight process manager. It does not have to manage a whole machine's worth of processes, just one and perhaps some children.

Here we would like to introduce you to Tini, a lightweight init and process manager designed for exactly this purpose. It is very successful and widely adopted in the Docker world; so successful, in fact, that the previously mentioned init flags in Docker are implemented by baking Tini into Docker itself.

Debian example

For brevity, the build process is excluded (the COPY --from=build line refers to that omitted build stage), and to keep the image size down, we use Debian slim instead of the default Debian image.

FROM debian:stable-slim
ENV TINI_VERSION v0.18.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini
ENTRYPOINT ["/tini", "-vv","-g", "--", "/usr/bin/dui/bin/dui","-XX:+UseCGroupMemoryLimitForHeap", "-XX:+UnlockExperimentalVMOptions"]
COPY --from=build /usr/bin/dui-image/ /usr/bin/dui

Alpine example

Alpine Linux works wonders for Docker images, and to make our lives easier, Tini is available as an Alpine package, so you can install it very easily if you want.

FROM alpine
RUN apk add --no-cache tini
ENTRYPOINT ["/sbin/tini", "-vv","-g","-s", "--"]
CMD ["top -b"]

Signals management

Now that our process runs as PID 1 (or under a proper init) and can receive signals, we have to look at how we deal with those signals. There are three parts to this:

  • Handle signals: we should make sure our process can deal with the signals it receives
  • Receive the right signals: we might have to alter the signals we receive from our orchestrators
  • Signals and Docker orchestrators: we have to help our orchestrators know when to deliver these signals

For more details on the subject of Signals and Docker, please read this excellent blog from Grigorii Chudnov.

Handle signals

How you handle process signals depends on your application, programming language, and framework.

For Java and Go(lang), we dive into this further, exploring some of the options we have, including some of the most used frameworks.
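To give a flavor of what this looks like in Java, here is a minimal sketch along the lines of the demo image used in this article's examples: a JVM shutdown hook reacts to SIGTERM/SIGINT and interrupts the worker so it can stop early. The class name and structure are illustrative, not the demo's actual source.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SignalDemo {

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // Worker simulating ongoing work that we want to stop cleanly.
        executor.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                System.out.println("HelloWorld!");
                try {
                    TimeUnit.SECONDS.sleep(5);
                } catch (InterruptedException e) {
                    System.out.println("We're told to stop early...");
                    Thread.currentThread().interrupt(); // restore the flag so the loop exits
                }
            }
        });

        // The JVM runs shutdown hooks when it receives SIGTERM or SIGINT;
        // this is our chance to stop workers and release resources.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            System.out.println("Shutdown hook called!");
            executor.shutdownNow(); // interrupts the worker thread
        }));
    }
}

This is also why the exec form and a proper PID 1 matter: if the SIGTERM never reaches the JVM, the shutdown hook never runs.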

Receive the right signals

Sometimes your language or framework of choice doesn't handle signals all that well. It might be very rigid in what it does with specific signals, removing your ability to do the right thing. Of course, not all languages or frameworks were designed with Docker containers or microservices in mind, and some are yet to catch up with this more dynamic environment.

Luckily, Docker and Kubernetes allow you to specify which signal to send to your process.

Docker run

docker run --rm -ti --init --stop-signal=SIGINT \
   caladreas/java-docker-signal-demo

Docker compose/swarm

Docker's compose file format allows you to specify a stop signal. This is the signal sent when the container is stopped in a normal fashion, normal meaning docker stop or Docker itself determining it should stop the container.

If you forcefully remove the container, for example with docker rm -f, Docker kills the process directly instead, so don't do that.

version: '2.2'
services:
    web:
        image: caladreas/java-docker-signal-demo
        stop_signal: SIGINT
        stop_grace_period: 15s

If you run this with docker-compose up and then stop the container from a second terminal (for example with docker-compose stop), you will see something like this.

web_1  | HelloWorld!
web_1  | Shutdown hook called!
web_1  | We're told to stop early...
web_1  | java.lang.InterruptedException: sleep interrupted
web_1  | 	at java.base/java.lang.Thread.sleep(Native Method)
web_1  | 	at joostvdg.demo.signal@1.0/com.github.joostvdg.demo.signal.HelloWorld.printHelloWorld(Unknown Source)
web_1  | 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
web_1  | 	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
web_1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
web_1  | 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
web_1  | 	at java.base/java.lang.Thread.run(Unknown Source)
web_1  | [DEBUG tini (1)] Passing signal: 'Interrupt'
web_1  | [DEBUG tini (1)] Received SIGCHLD
web_1  | [DEBUG tini (1)] Reaped child with pid: '7'
web_1  | [INFO  tini (1)] Main child exited with signal (with signal 'Interrupt')

Kubernetes

In Kubernetes we can make use of Container Lifecycle Hooks to manage how our container should be stopped. We could, for example, send a SIGINT (interrupt) to tell our application to stop.

apiVersion: apps/v1
kind: Deployment
metadata:
    name: java-signal-demo
    namespace: default
    labels:
        app: java-signal-demo
spec:
    replicas: 1
    selector:
        matchLabels:
            app: java-signal-demo
    template:
        metadata:
            labels:
                app: java-signal-demo
        spec:
            containers:
            - name: main
              image: caladreas/java-docker-signal-demo
              lifecycle:
                  preStop:
                      exec:
                          command: ["killall", "java" , "-INT"]
            terminationGracePeriodSeconds: 60

When you save this as deployment.yml, then create and delete it - kubectl apply -f deployment.yml / kubectl delete -f deployment.yml - you will see the same behavior in the container's logs.
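To watch that happen, you can follow the container's logs from a second terminal while you delete the deployment (a small sketch; it assumes your kubectl context points at the right cluster):

kubectl logs --namespace default --follow deployment/java-signal-demo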

 

Signals and Docker orchestrators

Now that we can respond to signals and receive the correct ones, there's one last thing to take care of. We have to make sure our orchestrator of choice sends these signals for the right reasons: quickly telling us there's something wrong with our running process and that it should shut down, which, of course, we'll do gracefully!

As health, readiness, and liveness checks are a topic of their own, we'll keep it short, giving some basic examples and pointing you to where you can further investigate how to use them to your advantage.

Docker

You can configure a health check either in your Dockerfile (the HEALTHCHECK instruction) or in your docker-compose.yml for Compose or Swarm.

Considering that only Docker honors the health check defined in your Dockerfile (Kubernetes, for example, ignores it), it is strongly recommended to build health check endpoints into your application itself and document how they can be used.
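As an illustration, a Dockerfile health check might look like the snippet below. It is only a sketch: it assumes your application exposes an HTTP health endpoint at /health on port 8080 and that curl is available inside the image.

HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
    CMD curl --fail http://localhost:8080/health || exit 1

The same check can be expressed in docker-compose.yml under the healthcheck key.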

Kubernetes

In Kubernetes we have the concept of Container Probes. These allow you to configure whether your container is ready to receive traffic (readinessProbe) and whether it is still working as expected (livenessProbe).
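For the demo deployment above, probes could look something like the fragment below. Again, this is a sketch: the /health endpoint and port 8080 are assumptions about the application, not something the demo image is known to expose.

containers:
- name: main
  image: caladreas/java-docker-signal-demo
  readinessProbe:
      httpGet:
          path: /health
          port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
  livenessProbe:
      httpGet:
          path: /health
          port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

A failing livenessProbe makes Kubernetes restart the container, stopping it first, which is exactly the moment all of the graceful shutdown work above pays off.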