Containers are these fancy new thingies (technical term) that are made to make our lives as developers and operators easier. Or aren't they? This article shows what a container really is and why cgroups and the Linux kernel are an essential part of it.
Docker, Podman and more
Docker, Podman and Kubernetes have made packaging and shipping software really easy. In addition, container technologies are adding some security and management layers to our deployments. The whole technology behind containers is so convenient that even Flatpak uses it to provide sandboxing and permission control for the packaged software.
But why is this the case? What is provided by Linux that makes containers so convenient, and what is happening behind the scenes?
Under the hood, most container software uses two major technologies: namespaces and cgroups. In this article, I want to demonstrate how these work and how you can make them work for you, too.
If you start a container, you are basically creating a new namespace, which holds some data and executes a binary. Sounds weird? Let's see this in an actual example.
To have a baseline, let's first count the processes on the host.

    # List all processes and count them
    $ ps auxww | wc -l
    335

In my case, roughly 335 processes are running on my workstation (the count includes the header line and ps itself). This is pretty typical for a desktop with a couple of applications open.
Next, let's create a new namespace, where we want to execute bash. This can be done with the unshare command.

    # Create a new PID namespace and run bash in it
    $ sudo unshare --fork --pid --mount-proc /usr/bin/bash
Let's also count the processes here:
    # List all processes
    $ ps auxww
    USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    root           1  0.7  0.0 226896  8296 pts/0    S    23:41   0:00 /usr/bin/bash
    root          37  0.0  0.0 225880  4052 pts/0    R+   23:41   0:00 ps auxww
As you can see, even without the wc -l command, we can easily count the processes: only bash and ps itself are visible. This is the power of namespaces.
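If you don't want to use sudo for this, one way to check the isolation is to compare the identifier of the PID namespace inside and outside. The sketch below assumes that unprivileged user namespaces are enabled on your system, which is why it adds --user --map-root-user (these flags are not part of the command above). compare_pid_ns is just a name I made up for this example.

```shell
#!/bin/sh
# Sketch: compare the PID namespace of the host shell with the one
# created by unshare. compare_pid_ns is a hypothetical helper name.
compare_pid_ns() {
    # Probe first, because some systems and containers forbid this.
    if ! unshare --user --map-root-user --fork --pid true 2>/dev/null; then
        echo "unprivileged namespaces are not available here"
        return 0
    fi

    # Every namespace has an identifier, visible as a symlink below /proc.
    echo "outside: $(readlink /proc/self/ns/pid)"
    echo "inside:  $(unshare --user --map-root-user --fork --pid readlink /proc/self/ns/pid)"
}

compare_pid_ns
```

If the two identifiers differ, the command really ran in a PID namespace of its own.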
Above, we started a PID namespace. But, the Linux kernel offers way more namespaces. Let's have a look at these, too.
- Process ID
The process ID (PID) namespace creates a new branch of the process tree. Processes started in this namespace cannot see or signal processes above the point where the branch was created.
- Mount
Processes running in a separate mount (MNT) namespace get their own set of mount points, so they can be given a filesystem tree that hides the files outside of it. This is somewhat similar to the chroot command, but mounts made inside the namespace stay invisible to the host by default.
- Network
In a network (NET) namespace, you can limit the access to network devices and network features. A new network namespace starts with only a loopback device, so you still need to create the other network devices outside the namespace and hand them over.
- User
The user namespace maps virtual UIDs and GIDs. This allows you to have root privileges inside the namespace, but not outside. Even a regular user can create a user namespace in which they are privileged.
- UTS
The UTS namespace controls hostname and domain information, and allows processes to think they're running on differently named machines.
- Inter Process Communication
The IPC namespace controls which processes can talk to each other through System V IPC objects and POSIX message queues.
- Control Group
The cgroup namespace is somewhat special: it only virtualizes the view of the cgroup hierarchy. The control groups (cgroups) themselves come with a dedicated set of tools to control resources like CPU, memory, disk I/O, network traffic, etc.
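To get a feeling for control groups without changing anything, you can ask the kernel which cgroup your shell currently belongs to. This is a read-only sketch. The second part assumes the unified cgroup v2 hierarchy is mounted at /sys/fs/cgroup, which is common on current distributions but not guaranteed. show_cgroups is a made-up helper name.

```shell
#!/bin/sh
# Show the cgroup membership of the current shell (read-only).
# show_cgroups is a hypothetical helper name.
show_cgroups() {
    echo "cgroup membership of PID $$:"
    cat "/proc/$$/cgroup" 2>/dev/null

    # Assumption: cgroup v2 is mounted at /sys/fs/cgroup.
    # If so, this file lists the resource controllers available at the root.
    if [ -r /sys/fs/cgroup/cgroup.controllers ]; then
        echo "available controllers: $(cat /sys/fs/cgroup/cgroup.controllers)"
    else
        echo "no cgroup v2 hierarchy found at /sys/fs/cgroup"
    fi
}

show_cgroups
```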
As you might guess already, combining the above allows you to create an environment where you only see some processes, can act as a different user, have access to another filesystem, etc.
This is exactly what containers are about. Docker or Podman are (basically) a set of tools that combine these namespaces to create slices of your system.
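To illustrate the combination, here is a minimal sketch of a hand-made "container" that joins a user, a PID and a mount namespace in one unshare call. It assumes unprivileged user namespaces are permitted on your system and falls back to a message if they are not. mini_container is a name I made up, and this is of course far from what Docker or Podman really do.

```shell
#!/bin/sh
# Sketch of a "container" built by hand: new user, PID and mount namespaces.
# mini_container is a hypothetical helper name.
mini_container() {
    # Probe first; unprivileged namespaces may be disabled or blocked.
    if ! unshare --user --map-root-user --fork --pid --mount-proc true 2>/dev/null; then
        echo "unprivileged namespaces are not available here"
        return 0
    fi

    # --mount-proc also creates a new mount namespace for the fresh /proc.
    # Inside, the shell reports its own view of user ID and process ID.
    unshare --user --map-root-user --fork --pid --mount-proc \
        sh -c 'echo "uid=$(id -u) pid=$$"'
}

mini_container
```

Where this is permitted, the inner shell sees itself as root and as the first process, although it was started by a regular user on a busy system.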
Working with Namespaces
After learning about namespaces, we should give it a shot and play a bit with them. Most of the time, you don't need to do this, but for me this is quite interesting stuff.
First, I want to list all namespaces. Let's see what my workstation does by running lsns.

    $ lsns
            NS TYPE   NPROCS   PID USER    COMMAND
    4026531834 time      141  1884 dschier /usr/lib/systemd/systemd --user
    4026531835 cgroup    141  1884 dschier /usr/lib/systemd/systemd --user
    4026531836 pid       117  1884 dschier /usr/lib/systemd/systemd --user
    4026531837 user      114  1884 dschier /usr/lib/systemd/systemd --user
    4026531838 uts       141  1884 dschier /usr/lib/systemd/systemd --user
    4026531839 ipc       141  1884 dschier /usr/lib/systemd/systemd --user
    4026531840 net       141  1884 dschier /usr/lib/systemd/systemd --user
    4026531841 mnt       114  1884 dschier /usr/lib/systemd/systemd --user
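lsns gathers its data from the /proc filesystem. The NS column is the inode number of the namespace, and the same numbers show up as symlink targets below /proc/PID/ns. A minimal sketch to print them for the current shell without lsns could look like this (list_ns is a made-up helper name):

```shell
#!/bin/sh
# Print the namespaces of a process straight from /proc.
# list_ns is a hypothetical helper name; it defaults to the current shell.
list_ns() {
    pid=${1:-$$}
    for link in /proc/"$pid"/ns/*; do
        # Each entry is a symlink with a target such as "pid:[4026531836]".
        printf '%-18s %s\n' "${link##*/}" "$(readlink "$link")"
    done
}

list_ns
```

Entries ending in _for_children describe the namespace that newly created child processes will be placed in.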
Seems like systemd has already created some namespaces for me. What if I create another one? This can be done with the unshare command. To check from inside and outside the namespace, it is a good idea to have two terminal sessions open.

    # Create new namespace (terminal 1)
    $ unshare --fork --pid --user --mount-proc /usr/bin/bash

    # Check existing namespaces (terminal 1)
    $ lsns
            NS TYPE   NPROCS PID USER   COMMAND
    4026531834 time        2   1 nobody /usr/bin/bash
    4026531835 cgroup      2   1 nobody /usr/bin/bash
    4026531838 uts         2   1 nobody /usr/bin/bash
    4026531839 ipc         2   1 nobody /usr/bin/bash
    4026531840 net         2   1 nobody /usr/bin/bash
    4026532752 user        2   1 nobody /usr/bin/bash
    4026532756 mnt         2   1 nobody /usr/bin/bash
    4026532757 pid         2   1 nobody /usr/bin/bash

    # Check existing namespaces (terminal 2)
    4026532752 user        2 69002 dschier unshare --fork --pid --user --mount-proc /usr/bin/bash
    4026532756 mnt         2 69002 dschier unshare --fork --pid --user --mount-proc /usr/bin/bash
    4026532757 pid         1 69003 dschier └─/usr/bin/bash
This way, you can create all kinds of namespaces. But you can also enter namespaces that already exist.
You just need to know the PID of a process running in the namespace you want to enter. In our case above, this is 69002.

    # Enter namespace
    $ nsenter -t 69002 --user --preserve-credentials
This allows you to debug a process in this namespace, but also to check what is accessible from within it. For now, this should be sufficient to introduce namespaces.
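If you want to try the whole round trip in one go, the following sketch creates a user namespace in the background and enters it with nsenter. It assumes unprivileged user namespaces are permitted, and it uses --map-root-user so that the user ID seen from inside differs visibly from the one outside. enter_demo is a made-up helper name, and the sleep process only serves as a placeholder that keeps the namespace alive.

```shell
#!/bin/sh
# Round trip: create a user namespace in the background, then enter it.
# enter_demo is a hypothetical helper name.
enter_demo() {
    # Probe first; unprivileged user namespaces may be disabled or blocked.
    if ! unshare --user --map-root-user true 2>/dev/null; then
        echo "user namespaces are not available here"
        return 0
    fi

    # Without --fork, unshare replaces itself with sleep,
    # so $! is the PID of a process inside the new namespace.
    unshare --user --map-root-user sleep 30 >/dev/null 2>&1 &
    target=$!

    # Wait (at most about 5 seconds) until the target has left our user
    # namespace and its UID mapping has been written.
    mine=$(readlink /proc/self/ns/user)
    tries=0
    while [ "$tries" -lt 50 ]; do
        theirs=$(readlink "/proc/$target/ns/user" 2>/dev/null || true)
        if [ -n "$theirs" ] && [ "$theirs" != "$mine" ] \
            && grep -q . "/proc/$target/uid_map" 2>/dev/null; then
            break
        fi
        tries=$((tries + 1))
        sleep 0.1
    done

    # Enter the namespace and print the user ID as seen from inside.
    nsenter -t "$target" --user --preserve-credentials id -u

    # Clean up the background process.
    kill "$target" 2>/dev/null || true
    wait "$target" 2>/dev/null || true
    return 0
}

enter_demo
```

The waiting loop is there because nsenter would otherwise race against unshare, which needs a moment to set up the namespace and its UID mapping.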
As you might guess, using namespaces can be interesting and powerful. In fact, they are so powerful that Docker, Podman, Kubernetes and other container-based technologies like Flatpak make use of them.
Were you aware of namespaces? How have you used them? Please let me know if you want to learn more about this topic.