Linux namespaces and the rise of containers

Process isolation is not new to Unix. It has existed in different forms and flavors across Unix systems. Here is a quick history:

  • 1979 - The chroot system call was added, providing a basic form of process isolation by changing a process's root directory. chroot was added to BSD in 1982.
  • 2000 - FreeBSD introduced jails, an early container technology.
  • 2001 - Linux-VServer - introduced operating system level virtualization on Linux.
  • 2004 - Oracle Solaris Zones and OpenVZ also started providing similar isolation features.
  • 2006 - Process containers, later renamed control groups (cgroups) - group processes within the Linux kernel for resource management.
  • 2008 - LXC (Linux Containers) - provided tooling to take advantage of the cgroups and namespace functionality in the Linux kernel.
  • 2013 - Docker - process isolation using kernel cgroups and namespaces, combined with tools to build and retrieve named images. Docker initially used LXC but later switched to libcontainer/runc.

Process isolation has also been key to operating system security, because it protects one process from other processes. On servers that run multiple applications, it is very important that services are isolated from each other for security and stability reasons. If one of your services is compromised, the attacker might be able to use that service to attack other services and maybe even the complete server.

So how can we avoid that? Services such as AWS, Heroku, and Google App Engine run programs (code) uploaded by their users on their own servers, and Linux namespaces and process isolation help them stay safe against malicious programs or attacks.

Next we are going to discuss the different types of Linux namespaces and how they have given rise to container-based architectures.

Understanding how Linux namespaces provide process isolation

With the Linux namespaces feature, it’s possible to independently modify the process tree, network interfaces, mount points, inter-process communication resources, etc. for child processes.

The following namespaces help the kernel achieve isolation:

  • PID (isolates process IDs)
  • Network (isolates network devices, stacks, ports, etc.)
  • Mount (isolates mount points)
  • IPC (isolates System V IPC, POSIX message queues)
  • UTS (isolates hostname and NIS domain name)
  • User (isolates user and group IDs)
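To make the list above more concrete, here is a minimal Python sketch that prints which namespaces a process belongs to. It assumes a Linux host with /proc mounted, where each entry under /proc/<pid>/ns is a symlink identifying one namespace. The helper name list_namespaces is made up for this illustration, and the entries you see depend on your kernel version.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} read from /proc/<pid>/ns."""
    ns_dir = f"/proc/{pid}/ns"
    namespaces = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        # Not Linux, /proc not mounted, or the pid does not exist.
        return namespaces
    for name in entries:
        try:
            # Each entry is a symlink whose target looks like "pid:[4026531836]".
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            # Skip entries we are not allowed to read.
            continue
    return namespaces


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(f"{name:10} {ident}")
```

Two processes that show the same identifier for an entry share that namespace. A process inside a container would normally show different identifiers from a process on the host.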

The screenshot below shows how we can create a child namespace that has different resources and configuration than the global (root) namespace.

containers_namespace_view

PID Namespace

Historically, the Linux kernel has supported a single process tree. Whenever a Linux system boots up, it starts the root process with process ID (PID) 1, which in turn starts other child processes. A process that has sufficient privileges and satisfies certain conditions can inspect other processes and even kill them.

With PID namespace isolation, processes in the child namespace do not have any way of knowing about their parent namespace. However, processes in the parent namespace can see processes in the child namespace. In other words, a process running inside a PID namespace can only see and communicate with processes in the same namespace (and in any namespaces nested below it).

With the introduction of PID namespaces, a single process can now have multiple PIDs associated with it, one for each namespace it falls under. For example, in the screenshot below, the process with PID 6 in the parent namespace is also PID 1 in another namespace, Container1.

pid-namespace
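One way to observe the "multiple PIDs per process" idea is the NSpid field in /proc/<pid>/status (available since Linux 4.1), which lists a process's PID in each nested PID namespace, outermost first. The sketch below parses that field. The sample text is made up to mirror the PID 6 / PID 1 example, and the function names are chosen for this illustration.

```python
def parse_nspid(status_text):
    """Return a process's PIDs, outermost namespace first, from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            # The fields after the label are one PID per nested PID namespace.
            return [int(field) for field in line.split()[1:]]
    return []


def read_nspid(pid="self"):
    """Read NSpid for a live process; returns [] if the file is unavailable."""
    try:
        with open(f"/proc/{pid}/status") as status_file:
            return parse_nspid(status_file.read())
    except OSError:
        return []


# Hypothetical status excerpt: PID 6 on the host, PID 1 inside Container1.
SAMPLE_STATUS = "Name:\tinit\nPid:\t6\nNSpid:\t6\t1\n"

if __name__ == "__main__":
    print(parse_nspid(SAMPLE_STATUS))  # [6, 1]
    print(read_nspid())
```

Reading the status file from inside the child namespace would show only the inner PID, since the parent namespace is not visible from there.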

NET namespace

A network namespace allows each process to see an entirely different set of network interfaces. Every new Linux process runs in a particular network namespace. By default this is inherited from its parent process, but a process with sufficient privileges can switch itself into a different namespace.

Whenever a new network namespace is created, it logically has its own network stack with:

  • a routing table,
  • a set of iptables (firewall) rules (for both IPv4 and IPv6),
  • network devices.

net-comparison-namespace
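To see which network devices the current namespace exposes, one option is to read /proc/net/dev, which reflects the network namespace of the process reading it. The following is a small sketch under that assumption. The sample text and the helper names are made up for illustration.

```python
def parse_interfaces(net_dev_text):
    """Return interface names from text in the /proc/net/dev format."""
    names = []
    # The first two lines are column headers; later lines look like "name: counters".
    for line in net_dev_text.splitlines()[2:]:
        name, sep, _counters = line.partition(":")
        if sep:
            names.append(name.strip())
    return names


def visible_interfaces():
    """List interfaces visible in this process's network namespace ([] if unavailable)."""
    try:
        with open("/proc/net/dev") as net_dev:
            return parse_interfaces(net_dev.read())
    except OSError:
        return []


# Hypothetical sample in the /proc/net/dev layout.
SAMPLE_NET_DEV = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0:    5000      50    0    0    0     0          0         0     4000      40    0    0    0     0       0          0
"""

if __name__ == "__main__":
    print(parse_interfaces(SAMPLE_NET_DEV))  # ['lo', 'eth0']
    print(visible_interfaces())
```

Run inside a freshly created network namespace, visible_interfaces() would typically report only the loopback device until other interfaces are moved or created there.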

The interesting part is setting up how data flows from the outside network to the child network namespaces and vice versa. This is where “virtual network interfaces” that span multiple namespaces come in. With them, it is possible to create Ethernet bridges and even route packets between the namespaces.

We also need to set up a “routing process” in the global network namespace to receive traffic from the physical interface and route it through the appropriate virtual interfaces to the correct child network namespaces (container1 and container2 are the child network namespaces in the screenshot below). Tools like Docker do all of this heavy lifting for us.

net-namespace
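As a rough illustration of the virtual interface setup described above, the sketch below builds the iproute2 (ip) commands that connect one child network namespace to the host with a veth pair. It only prints the commands as a dry run, because executing them needs root. The namespace name, interface names, and addresses are made-up examples, and bridging or routing to the physical interface is left out.

```python
import shlex


def veth_plan(namespace, host_if, ns_if, host_cidr, ns_cidr):
    """Return ip commands that link a child network namespace to the host via a veth pair."""
    in_ns = ["ip", "netns", "exec", namespace]
    return [
        # Create the named child network namespace.
        ["ip", "netns", "add", namespace],
        # Create a veth pair: two virtual interfaces joined like a cable.
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", ns_if],
        # Move one end of the pair into the child namespace.
        ["ip", "link", "set", ns_if, "netns", namespace],
        # Address and enable the host end.
        ["ip", "addr", "add", host_cidr, "dev", host_if],
        ["ip", "link", "set", host_if, "up"],
        # Address and enable the end inside the namespace, plus loopback.
        in_ns + ["ip", "addr", "add", ns_cidr, "dev", ns_if],
        in_ns + ["ip", "link", "set", ns_if, "up"],
        in_ns + ["ip", "link", "set", "lo", "up"],
    ]


if __name__ == "__main__":
    # Dry run only: print the commands instead of executing them.
    for cmd in veth_plan("container1", "veth-host", "veth-c1", "10.0.0.1/24", "10.0.0.2/24"):
        print(shlex.join(cmd))
```

Keeping the plan as data makes it easy to review before anything touches the host. Someone who wants to apply it could pass each command to subprocess.run as root.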

MNT namespace

Creating a separate mount namespace allows each isolated process to have its own view of the system’s mount point structure and to create process-specific mount points.

Linux maintains a data structure for all the mount points of the system. It includes information such as which disk partitions are mounted, where they are mounted, their access control info, etc. With the Linux namespaces feature, a child process can mount or unmount whatever mount points it needs to, and, provided mount propagation between the namespaces is private, the change will affect neither its parent’s namespace nor any other mount namespace in the entire system.

net-namespace
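The per-namespace mount table can be inspected through /proc/<pid>/mounts, which lists the mount points visible to that process. Below is a minimal parsing sketch. The sample lines and helper names are invented for illustration, and escaped characters in paths (such as \040 for a space) are left undecoded.

```python
from collections import namedtuple

Mount = namedtuple("Mount", "source target fstype options")


def parse_mounts(mounts_text):
    """Parse text in the /proc/<pid>/mounts format into Mount records."""
    mounts = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # Skip blank or malformed lines.
        source, target, fstype, options = fields[:4]
        mounts.append(Mount(source, target, fstype, options.split(",")))
    return mounts


def read_mounts(pid="self"):
    """Return the mounts visible to a process ([] if the file is unavailable)."""
    try:
        with open(f"/proc/{pid}/mounts") as mounts_file:
            return parse_mounts(mounts_file.read())
    except OSError:
        return []


# Hypothetical sample: a proc mount and a root file system.
SAMPLE_MOUNTS = (
    "proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0\n"
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
)

if __name__ == "__main__":
    for mount in parse_mounts(SAMPLE_MOUNTS):
        print(mount.target, mount.fstype)
```

Comparing read_mounts() for a host process and for a process in another mount namespace is one way to see that each has its own view.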

UTS namespace

The UTS (UNIX Time-sharing System) namespace allows each namespace to have its own hostname and NIS domain name.

In the context of containers, it allows each container to have its own hostname and NIS domain name. This can be useful for initialization and configuration scripts that take actions based on these names. The UTS namespace also allows containers to have their own FQDN.

uts-namespace

IPC namespace

The IPC namespace isolates inter-process communication resources such as shared memory, semaphores, and message queues.

USER namespace

The user namespace isolates user and group IDs. It provides the following features:

  • Map UIDs/GIDs outside the container to UIDs/GIDs inside the container.
  • Permit non-root users to launch containers.
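The UID mapping mentioned above is exposed by the kernel in /proc/<pid>/uid_map (and gid_map for groups), where each line holds three numbers: the first ID inside the namespace, the first ID outside it, and the length of the range. The sketch below translates an inside ID to its outside ID under that format. The mapping values are a made-up example and the function names are chosen for this illustration.

```python
def parse_id_map(map_text):
    """Parse uid_map/gid_map text into (inside_start, outside_start, length) tuples."""
    ranges = []
    for line in map_text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            inside, outside, length = (int(field) for field in fields)
            ranges.append((inside, outside, length))
    return ranges


def to_outside_id(inside_id, ranges):
    """Translate an ID seen inside the namespace to the ID seen outside, or None if unmapped."""
    for inside, outside, length in ranges:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None


# Hypothetical mapping: IDs 0..65535 inside map to 100000..165535 outside.
SAMPLE_UID_MAP = "         0     100000      65536\n"

if __name__ == "__main__":
    ranges = parse_id_map(SAMPLE_UID_MAP)
    print(to_outside_id(0, ranges))     # 100000
    print(to_outside_id(1000, ranges))  # 101000
```

With a mapping like this, a process that appears as root (UID 0) inside the container is an ordinary unprivileged user on the host, which is what makes it possible for non-root users to launch containers.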

Virtual machines and the rise of containers

Cloud infrastructure providers like AWS and Rackspace make it possible to create virtual machines in minutes. End users get the ability to scale up or down, guaranteed computational resources, security isolation, and API access for provisioning it all, without any of the overhead of managing physical servers in their own data centers.

So, how do virtual machines (VMs) ensure process isolation and security? Each VM runs as a separate operating system to get resource and security isolation. VMs leverage hardware/CPU-based facilities for isolating their access to memory and appear as a handful of hypervisor processes on the host system. They obtain access to resources from the host over virtualized devices (such as network cards) and network protocols.

However, creating virtual machines comes with its own cost. The boot time of VMs is generally measured in minutes because they have to boot an entire guest OS, which is not suitable for every use case. They require a full OS and system image, such as EC2’s AMIs. The hypervisor runs a boot process for the VM, often even emulating a BIOS.

The screenshot below shows how the hypervisor coordinates with the host operating system to run different VMs with different guest operating systems.

virtual machines

Although virtual machines are good for many use cases, they are a bit heavyweight for running applications in the cloud. Containers have resource isolation and allocation benefits similar to those of virtual machines, but because of their architecture they are much more portable and efficient.

That’s where companies like Docker started using namespaces and other kernel features to provide lightweight virtualization. Container boot time is measured in seconds, which is much faster than VMs.

What are containers?

A container wraps a software application in its own file system that contains everything the application needs to run: code, runtime, system tools, system libraries, anything you can install on a server. This guarantees that it will always run the same, regardless of the environment it is running in.

containers

Containers share the host kernel with other containers. They’re also not tied to any specific infrastructure: Docker containers run on any computer, on any infrastructure, and in any cloud where Docker is supported.

Docker use cases

Some of the areas where we can use Docker are as follows:

  • Continuous Integration: Enables developers to develop, test, and deploy applications more quickly and within any environment. Through integrations with tools like Jenkins, Docker allows developers to collaborate with each other to build code and test its readiness for shipping.
  • Infrastructure optimization: Docker containers contain only what’s necessary to build, ship and run applications. Unlike virtualization technology (VMs), there is no guest OS or hypervisor necessary for containers. This allows enterprises to reduce the amount of storage and eliminate hypervisor licensing costs within their organizations.

It also increases developer efficiency: you can create a local development environment using Docker so that the same stack runs in all your environments, which saves a lot of time compared to setting up the application manually in each environment.

Docker features

Although Docker is going through rapid development and the tools around it are still maturing, some of the features Docker provides are as follows:

  • Portable deployment across machines: Docker images can run on any platform that supports Docker.
  • Create containers from build files: We can specify application dependencies as instructions in a Dockerfile, and Docker can read this file and set up those dependencies. For example, suppose your application is a Java web application that needs Java 8 and Tomcat 8. You can specify these dependencies in the Dockerfile, and Docker will install Java 8, Tomcat 8, etc. in your container so that your application can be deployed there.
  • Tools: Docker provides a set of tools, such as a CLI and REST APIs, to interact with it. Many vendors are building tools around it.

Given the lightweight nature of containers, I expect that many new use cases will develop and that containers will revolutionize the industry the same way virtual machines and virtualization did a while ago.

Version History


Date          Description
2016-05-08    Initial Version