A personal blog by Anas Mazioudi (@disklosr)

Lesser known Docker tips for advanced users

— 15 min read

I've lately been tasked with migrating a classic .Net business web API to .NET Core. One of the goals of this migration was to host the API inside a Linux system using Docker containers. It was quite a long and enjoyable journey. Now that it's all done, I thought I'd share some interesting Docker bits I learned along the way.

Don't run your containers as root

I chose to include this one first because it's the most important of all the tips I'm sharing here. By default, Docker runs containers as root user, i.e. the active user inside the container under which your app starts is the privileged user root.

From the kernel's point of view, processes running inside containers are no different from those running directly on the host (there are of course some differences, but they are irrelevant to this argument). You don't have to take my word for it; try this for yourself:

$ docker run --rm -d alpine sleep 10                                     
2fc04b91eebfb2f4c92a14530fa5c75c611bbedf6fe4695b4d13d3d2bfac72ec

$ ps -aux | grep sleep
root 8431 3.0 0.0 1552 4 ? Ss 16:53 0:00 sleep 10
anas 8480 0.0 0.0 6288 2304 pts/1 S+ 16:53 0:00 grep sleep

The command ps -aux reports the process we ran inside the container, which shows that the kernel sees it like any other non-containerized process. Notice the user the process runs as? It's root. And don't trick yourself into thinking it's a special root user reserved for containers that has nothing to do with the host's root user. Unless you have enabled user namespace remapping, they are effectively the same!

If a container process running as root gets compromised and the attackers break out of the container, they automatically have full access to your Docker host without any need for privilege escalation.

I will not go into details on how to mitigate this because it requires a post of its own, but here's one way you can do it.
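One common mitigation, sketched here as an assumption rather than the approach the author linked to, is to override the container user at run time with the --user option of docker run. The helper name run_as_current_user is hypothetical; it passes the caller's numeric UID and GID so the containerized process no longer runs as root.

```shell
# Hypothetical helper: run a container as the invoking (non-root) user.
# --user takes a UID[:GID] pair; numeric IDs work even if the image has
# no matching entry in /etc/passwd.
run_as_current_user() {
  docker run --rm --user "$(id -u):$(id -g)" "$@"
}
```

Calling run_as_current_user alpine id should then report your own UID instead of uid=0. To bake the same choice into an image, the Dockerfile USER instruction plays the equivalent role.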

The takeaway here is that you should treat your container processes as you would any other process: limit privileges to the bare minimum and think thrice before running as root.


Make use of capabilities

Speaking of keeping privileges to the bare minimum, capabilities are a powerful feature that lets you do just that. They come in handy when you have good reasons to run your container as root but still want to limit its privileges.

Capabilities are actually a Linux feature that can be configured via Docker when creating containers. They were introduced in version 2.2 of the Linux kernel. Here's a succinct definition straight from the man pages:

Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled.

To put it differently, capabilities are a fine-grained set of privileges that describe what a given process is permitted to do. For instance, CAP_CHOWN allows a process to make arbitrary changes to file UIDs and GIDs, while CAP_NET_BIND_SERVICE allows a process to bind a socket to privileged ports (port numbers below 1024).

Recent Linux versions have about 40 of these and you can check them using this command:

$ capsh --print

Current: =
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,
cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,
cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,
cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,
cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,
cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,
cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,
cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read

Processes running inside Docker containers have a default set of capabilities switched on. These are well documented here. The reason why you can bind to port 80 inside containers is thanks to CAP_NET_BIND_SERVICE which is granted by default. And the reason why you can't change the hostname of a container (try it for yourself) is because of the missing capability CAP_SYS_ADMIN.

You control the capabilities of your Docker containers with the --cap-add and --cap-drop options of docker run; check out the Docker docs for more details.
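As a minimal sketch of those two options: capability names are passed without the CAP_ prefix, and ALL stands for every capability. The helper names below are hypothetical and some-image is a placeholder, not a real image.

```shell
# Hypothetical helpers wrapping the capability options of `docker run`.

# Drop a single capability from Docker's default set.
# Usage: run_without_cap NET_BIND_SERVICE some-image [command...]
run_without_cap() {
  cap=$1; shift
  docker run --rm --cap-drop="$cap" "$@"
}

# Drop everything, then grant back a single capability.
# Usage: run_with_only_cap CHOWN some-image [command...]
run_with_only_cap() {
  cap=$1; shift
  docker run --rm --cap-drop=ALL --cap-add="$cap" "$@"
}
```

Starting from --cap-drop=ALL and adding back only what the app needs tends to be easier to reason about than subtracting from the defaults.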

I'm sure you can see why CAP_NET_BIND_SERVICE might not be necessary for your containers. This goes to show that you should carefully pick your container capabilities, giving only what's necessary and not relying on the default configuration of Docker.

If you're already running your containers as unprivileged users, then you can safely just drop all capabilities.

$ docker run --cap-drop=ALL some-image

This way, if an attacker manages to do a privilege escalation attack inside your container to get a root shell, it would be pointless because they won't be able to do anything worthwhile.

It's surprising how this feature is underutilized seeing how easy it is to use and how greatly it can harden the security of your container.


Know the implications of giving access to the Docker socket

This is another security tip, but I promise it's the last one.

The Docker socket is a Unix socket file used for communicating with the Docker daemon. Anyone who has access to it can talk freely to the Docker daemon and ask it to do whatever they want. Many Docker tools require access to this socket, Watchtower for instance. It's a great piece of software that can automagically update your containers to the latest versions. It's very simple to use: you just run this one-liner and you're good to go:

$ docker run -d                                  \
-v /var/run/docker.sock:/var/run/docker.sock \
containrrr/watchtower

The catch is that this tool needs access to the Docker socket to inspect your running containers and update them. Since the tool itself runs inside a container, you give it access by mounting the socket as a Docker volume. Many Docker tools use this technique but very few warn you about its implications: giving access to the Docker socket is equivalent to giving root access.

How so, you might ask? It's quite easy. If you give me access to this socket, I can simply tell the Docker daemon to create a container and mount the host's root folder as a volume inside it. And while we're at it, why not ask for full privileges too?

$ docker run -it --privileged  \
-v /:/hostroot \
ubuntu \
/bin/bash

Docker will happily create such a container for you because, by default, there's no authorization check at the daemon level: whoever can reach the socket can do anything. Hopefully, you'll now think twice before giving access to the Docker socket.

Now let's move to the fun stuff.


Don't mount SSH keys inside your containers

Sometimes you'd like to SSH to another host from inside your container but the server denies password authentication and your only option is to use your SSH key. Your first reflex is to mount your key inside your container and call it a day!

While this works, you should be ashamed of yourself because your secret key is now shared with some random container. I'm guilty of this but I don't have to be ashamed of it anymore because there is a better option: SSH agent forwarding.

If you use SSH, chances are that you are already familiar with SSH agent forwarding. But if you don't know how it works behind the scenes, allow me to explain: when you start ssh-agent, it creates a Unix socket that apps can use to request operations with your secret keys, then exports the location of the socket in an environment variable (usually SSH_AUTH_SOCK).

Do you know what else you can do with sockets? You can mount them inside containers! Like this:

$ docker run                         \
-v $SSH_AUTH_SOCK:/ssh-agent \
--env SSH_AUTH_SOCK=/ssh-agent \
alpine \
ssh-add -l

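Simple as that! Now your container has access to ssh-agent and you can use your keys without mounting them. Note that the image needs an SSH client installed for ssh-add to be available (the stock alpine image doesn't ship one), and that this won't work on Windows for obvious reasons.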


Label your Docker images

Labels are an interesting feature of Docker images that can be used for many purposes, the main one being to describe them. Given a Docker image, one should be able to quickly tell what it is about, who the author is, what the product's version is and from what revision it was built.

A good practice is to namespace your labels, and there are standards you can follow for naming them. The first was Label Schema, which has been deprecated in favor of the OCI Image Spec. The latter has a broader scope than just labels and remains backward compatible with Label Schema.

The annotations section of the OCI Image Spec describes the naming convention for namespacing your labels (reverse DNS notation), along with some predefined annotation keys. If you follow this spec, make sure you respect its rules. Below are some of its predefined labels:

LABEL org.opencontainers.image.created='2019-12-01T00:00:00Z'
LABEL org.opencontainers.image.authors='author@example.org'
LABEL org.opencontainers.image.url='https://github.com/disklosr/example'
LABEL org.opencontainers.image.documentation='https://github.com/disklosr/example/docs'
LABEL org.opencontainers.image.version='0.0.1'
LABEL org.opencontainers.image.vendor='disklosr'
LABEL org.opencontainers.image.title='Some title'
LABEL org.opencontainers.image.description='Some description here'

For dynamic labels that change every time an image is built, like version, creation date and commit id, you can make use of Docker build time variables to pass in your custom values during the build.
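A minimal sketch of that approach, assuming a Dockerfile that declares matching ARG instructions (shown as comments). The helper name build_labeled, the argument names VERSION, REVISION and CREATED, and the tag are all hypothetical.

```shell
# Assumed Dockerfile fragment (not from the original post):
#   ARG VERSION
#   ARG REVISION
#   ARG CREATED
#   LABEL org.opencontainers.image.version=$VERSION \
#         org.opencontainers.image.revision=$REVISION \
#         org.opencontainers.image.created=$CREATED

# Hypothetical helper: build an image, passing the dynamic label values
# as build-time variables. Usage: build_labeled TAG VERSION REVISION
build_labeled() {
  tag=$1 version=$2 revision=$3
  created=$(date -u +%Y-%m-%dT%H:%M:%SZ)   # RFC 3339 timestamp in UTC
  docker build \
    --build-arg VERSION="$version" \
    --build-arg REVISION="$revision" \
    --build-arg CREATED="$created" \
    -t "$tag" .
}
```

A typical call might look like build_labeled example:0.0.1 0.0.1 "$(git rev-parse --short HEAD)", run from the directory containing the Dockerfile.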

It's also worth noting that each LABEL instruction adds a new image layer, so changing a label invalidates the build cache for every instruction after it. You can safely put static labels that rarely change at the start of your Dockerfile. As for dynamic labels, place them as late as possible so you don't lose the benefit of the build cache.

Finally, I recommend this read for more details about Docker labels.


Speed up your integration testing

I don't know about you but several years ago, when SSDs were still not a thing, I was fascinated by RAM disks and the insane speed you get for I/O-intensive operations like code compilation, game loading, and video editing, to name a few. But these benefits came at a cost: RAM is volatile, so any change you make in your RAM disk needs to be persisted back to a real disk or else it will be lost forever.

Thankfully, all hope is not lost. Volatility can be a desirable property, especially when dealing with integration testing. If you're using a Dockerized database for your integration tests, then I have good news for you: Docker allows you to create mounts of type tmpfs, which behave much like RAM disks. Data written to a tmpfs mount is never persisted and only exists while the container is running.

Here's how you would run a MySQL database with all data stored in memory:

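# tmpfs mounts take no source directory. The official mysql image also
# needs a root password setting to start; an empty one is fine for tests.
$ docker run -d --rm        \
--name tmpfs-mysql \
--tmpfs /var/lib/mysql \
-e MYSQL_ALLOW_EMPTY_PASSWORD=yes \
mysql:latest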

Not only will this make your integration tests run faster since you're not persisting anything to disks, but it will also automatically take care of cleaning up all temporary test data created by the tests when your database container stops.


Leverage Docker architecture

Docker operates in a classic client/server architecture. The server part is the Docker daemon, which serves a REST API exposed by default via the Unix socket /var/run/docker.sock. The client part is Docker-cli. Its job is simply to translate command line instructions into valid HTTP requests for the daemon to process.

This might be old news for you, but there are two interesting ways you can leverage this architecture.

The first is that you don't have to use Docker-cli to interact with Docker. You can do everything using curl. This well-known Swiss Army knife of a tool supports sending HTTP requests to Unix sockets, and that's all you need. For instance, to retrieve all containers based on the alpine image, you do the following:

$ curl -g --unix-socket /var/run/docker.sock \
'http://localhost/containers/json?all=true&filters={"ancestor":["alpine"]}'

[{
"Id":"8d2c76638f06896d737aa635987627fb09d3acdfa7fe0074bc224117ec856604",
"Names":["/amazing_goodall"],
"Image":"alpine",
"ImageID":"sha256:b7b28af77ffec6054d13378df4fdf02725830086c7444d9c278af25312aa39b9",
"Command":"id",
"Created":1565189887,
"Ports":[],
"Labels":{},
"State":"created",
"Status":"Created",
"HostConfig":{"NetworkMode":"default"},
"Mounts":[]
}]

The Docker REST API is well documented on the official docs website. It allows more advanced scenarios and filters than are possible with Docker-cli.

Now, of course, Docker-cli is easier to use for most Docker operations. But when this default client isn't enough for your needs, issuing HTTP requests to the Docker API and processing the JSON output with jq can be a more powerful way to get exactly the data you want.
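As a rough sketch of that combination, the helper below (a hypothetical name) queries the same /containers/json endpoint and keeps only the container names. It assumes curl and jq are installed and that the socket sits at its default path; -G with --data-urlencode lets curl build and encode the query string.

```shell
# Hypothetical helper: print the names of all containers (running or not)
# created from a given image, by querying the Docker API directly.
# Usage: containers_from_image alpine
containers_from_image() {
  image=$1
  curl -s -G --unix-socket /var/run/docker.sock \
    --data-urlencode "all=true" \
    --data-urlencode "filters={\"ancestor\":[\"$image\"]}" \
    http://localhost/containers/json |
    jq -r '.[].Names[]'   # one name per line, e.g. /amazing_goodall
}
```

Swapping the jq filter is all it takes to pull out other fields, such as '.[].State' or '.[].Id'.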

The second implication of this client-server architecture is that you can easily administer the same Docker daemon from anywhere you want. This is very useful on Windows if you want to access your Docker daemon from inside WSL! A less common use case is when you're self-hosting some Docker apps on your home server and want to administer them from your personal laptop. It's easy to do; here's how:

  1. Enable the Docker daemon remote API over TCP (I've already touched on the implications of this in an earlier point, so make sure you know what you're doing).
  2. Install the right Docker-cli for your platform (a single binary, you need to unzip build archives to get only Docker-cli binary)
  3. Export the location of the Docker daemon address as an environment variable like this: DOCKER_HOST=tcp://{host}:{port}

Now all your Docker-cli commands will be automatically forwarded to the TCP exposed Docker daemon and everything will transparently work just like when you're interacting with a local Docker installation. And guess what? You can use curl in this case too.
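The last step can be sketched as a small wrapper. The name remote_docker is hypothetical, and port 2375 is only the conventional port for an unencrypted daemon; use whatever port you configured in step 1.

```shell
# Hypothetical helper: run one Docker-cli command against a remote daemon.
# DOCKER_HOST is set only for this single invocation, so your local
# daemon stays the default for everything else.
# Usage: remote_docker HOST [docker arguments...]
remote_docker() {
  host=$1; shift
  DOCKER_HOST="tcp://$host:2375" docker "$@"
}
```

For example, remote_docker homeserver.local ps -a would list the containers of a machine reachable under that (made-up) host name, and the curl equivalent would target http://homeserver.local:2375/containers/json.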


Use Docker built-in HealthCheck

Docker can also help you monitor the health of your containers. You can instruct it to run a specific command at set intervals and report back the health of your container based on its result.

Let's take an example of a web API. A basic monitoring technique is to continuously check if your web app is responding normally to requests and be notified otherwise. Assuming a web app with a ping endpoint, here's how you can do this in a Dockerfile:

HEALTHCHECK --interval=1m \
--timeout=5s \
--retries=3 \
CMD curl -f http://localhost/ping || exit 1

Adding a health check instruction to a Dockerfile adds three possible health statuses to your container: starting, healthy and unhealthy.

This instructs Docker to run the curl command every minute and to expect it to finish within 5 seconds. If it does, the command exits with code 0, which Docker interprets as healthy. If the API doesn't respond within the time limit, or if the request returns an error status code, the exit code is 1. Once the check has failed 3 times in a row, Docker reports the container as unhealthy.

If your container takes a while to start, add the --start-period option to HEALTHCHECK: failures during that period don't count towards the retry limit. Don't use exit code 2 for this, since Docker reserves it.

To better benefit from this feature, make sure your web apps have a dedicated healthcheck endpoint that reports back the internal health of the system. The response should include information about any critical component of your system, like database connections or available disk space.

One reader suggested that it might not be a good idea for an application to report the status of its database or any other external component because it can cause cascading failures in your system. Make sure components only report their own health (not that of others), and be aware of the dangers of misusing these checks.

Knowing the health of your containers enables you to better respond to system defects. Since containers are and should be designed to be ephemeral, you can, for example, instruct your system to automatically and gracefully stop unhealthy containers before replacing them with new healthy ones.
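To act on that information you first need to read it. Here is a minimal sketch using Docker-cli; the helper names are hypothetical and the container must have been started from an image that defines a health check.

```shell
# Hypothetical helpers for reading the health state Docker tracks.

# Print the health status (starting, healthy or unhealthy) of one container.
# Usage: container_health CONTAINER
container_health() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Print the names of all containers currently reported as unhealthy.
unhealthy_containers() {
  docker ps --filter health=unhealthy --format '{{.Names}}'
}
```

A cron job or supervisor script could loop over the output of unhealthy_containers and restart or replace each one.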


I hope you've learned something new from this list. If you have any remarks, additions or similar tips to share please don't hesitate to contribute right below in the comments section.
