To start containerizing your applications, you need a "container runtime". But what are the major options out there?

Since Docker was the first to popularize containers, it seems fair to start there. Five years ago, Solomon Hykes helped found a business, Docker, which sought to make containers easy to use. With the release of Docker 1.0 in June 2014, the buzz became a roar. And, over the years, it's only got louder.

Today, Docker, and the open-source project behind it, now named Moby, is bigger than ever. According to Docker, over 3.5 million applications have been placed in containers using Docker technology, and over 80 billion containerized applications have been downloaded.

If you don't know what containers are yet, please read my article on that first; it will help you get familiar with the topic.

Okay, but what's Docker?

Docker is a computer program that performs operating-system-level virtualization, also known as “containerization”.

Docker, an open-source technology, isn't just the darling of Linux powers such as Red Hat and Canonical. Proprietary software companies such as Oracle and Microsoft have also embraced Docker. Organizations that have adopted it report a 300 percent improvement in time to market while reducing operational costs by 50 percent. On top of that, they report a 1,300 percent increase in developer productivity and 72 percent faster issue resolution.

In addition, Docker containers are easy to deploy in the cloud. As Ben Lloyd Pearson wrote on Opensource.com:

Docker has been designed in a way that it can be incorporated into most DevOps applications, including Puppet, Chef, Vagrant, and Ansible, or it can be used on its own to manage development environments.

Specifically, for CI/CD, Docker makes it possible to set up local development environments that are exactly like a live server; run multiple development environments from the same host, each with unique software, operating systems, and configurations; test projects on new or different servers; and let anyone work on the same project with the exact same settings, regardless of the local host environment. This enables developers to run the test suites that are vital to CI/CD and quickly see whether a newly made change works properly.
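As a rough sketch of what such a reproducible environment can look like, here is a minimal docker-compose.yml that pins an application and its database to fixed versions so every developer (and the CI server) runs the same stack. The service names, images, port, and password below are placeholders, not taken from any particular project:

```yaml
# docker-compose.yml -- minimal sketch of a reproducible dev/test environment
# (service names, images, port, and password are placeholders)
version: "3.8"
services:
  app:
    build: .                      # build the app image from the project's Dockerfile
    ports:
      - "8000:8000"               # same port layout as the live server
    depends_on:
      - db                        # start the database before the app
  db:
    image: postgres:15            # pinned database version, identical for everyone
    environment:
      POSTGRES_PASSWORD: example  # throwaway credential for local use only
```

Running `docker compose up` then brings up the same environment on any machine that has Docker installed.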

Originally, Docker used LXC, but its isolation layers were incomplete, so Docker wrote libcontainer, which eventually became runc. Container popularity exploded, and Docker became the de facto standard for deploying containers. When Kubernetes came out in 2014, it naturally used Docker, as Docker was the only runtime available at the time.

Wait, Kubernetes?

Kubernetes is a powerful open-source system, initially developed by Google, for managing containerized applications in a clustered environment. It is a platform designed to completely manage the life cycle of containerized applications and services using methods that provide predictability, scalability, and high availability.

To understand how Kubernetes is able to provide these capabilities, it is helpful to get a sense of how it is designed and organized at a high level. Kubernetes can be visualized as a system built in layers, with each higher layer abstracting the complexity found in the lower levels.

At its base, Kubernetes brings together individual physical or virtual machines into a cluster using a shared network to communicate between each server. This cluster is the physical platform where all Kubernetes components, capabilities, and workloads are configured.

The machines in the cluster are each given a role within the Kubernetes ecosystem. One server (or a small group in highly available deployments) functions as the master server. This server acts as a gateway and brain for the cluster by exposing an API for users and clients, health checking other servers, deciding how best to split up and assign work (known as "scheduling"), and orchestrating communication between other components. The master server acts as the primary point of contact with the cluster and is responsible for most of the centralized logic Kubernetes provides.

The other machines in the cluster are designated as nodes: servers responsible for accepting and running workloads using local and external resources. To help with isolation, management, and flexibility, Kubernetes runs applications and services in containers, so each node needs to be equipped with a container runtime (like Docker or rkt). The node receives work instructions from the master server and creates or destroys containers accordingly, adjusting networking rules to route and forward traffic appropriately.

Each Kubernetes node runs an agent process called a kubelet that’s responsible for managing the state of the node: starting, stopping, and maintaining application containers based on instructions from the control plane. A kubelet receives all of its information from the Kubernetes API server. The basic scheduling unit is called a Pod, which consists of one or more containers that are guaranteed to be co-located on the same host machine and can share resources. Each Pod is assigned a unique IP address within the cluster, allowing the application to use ports without conflict. You describe the desired state of the containers in a Pod through a YAML or JSON object called a PodSpec. These objects are passed to the kubelet through the API server.
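To make the PodSpec idea concrete, here is a minimal Pod manifest; the names and image are placeholders rather than anything from a real cluster:

```yaml
# pod.yaml -- a minimal Pod manifest (names and image are placeholders)
apiVersion: v1
kind: Pod
metadata:
  name: hello-web
spec:
  containers:
  - name: web
    image: nginx:1.25        # container image the kubelet pulls and runs
    ports:
    - containerPort: 80      # port the container listens on inside the Pod
```

Submitting it with `kubectl apply -f pod.yaml` hands it to the API server, which passes it on to the kubelet of whichever node the Pod is scheduled to.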

Isn't there something called Docker Swarm?

Even though Docker has fully embraced Kubernetes as the container orchestration engine of choice, the company still offers Swarm, its own fully integrated container orchestration tool. Slightly less extensible and complex than Kubernetes, it’s a good choice for Docker enthusiasts who want an easier and faster path to container deployments. In fact, Docker bundles both Swarm and Kubernetes in its enterprise edition in hopes of making them complementary tools.

The main architecture components of Swarm include:

Swarm. Like a cluster in Kubernetes, a swarm is a set of nodes with at least one manager node and several worker nodes that can be virtual or physical machines.

Service. A service is the set of tasks that manager or worker nodes must perform on the swarm, as defined by a swarm administrator. A service defines which container images the swarm should use and which commands the swarm will run in each container. A service in this context is analogous to a microservice; for example, it’s where you’d define configuration parameters for an nginx web server running in your swarm. You also define parameters for replicas in the service definition (see the stack file sketch after this list).

Manager node. When you deploy an application into a swarm, the manager node provides several functions: it delivers work (in the form of tasks) to worker nodes, and it manages the state of the swarm it belongs to. Manager nodes can run the same services worker nodes do, but you can also configure them to run only manager-related services.

Worker nodes. These nodes run the tasks distributed by the manager node in the swarm. Each worker node runs an agent that reports back to the manager node about the state of the tasks assigned to it, so the manager node can keep track of the services and tasks running in the swarm.

Task. Tasks are Docker containers that execute the commands you defined in the service. Manager nodes assign tasks to worker nodes, and once assigned, a task cannot be moved to another worker. If a task in a replicated service fails, the manager assigns a new instance of that task to another available node in the swarm.
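As promised above, here is a sketch of how a replicated service can be declared in a stack file (Compose format) and handed to a swarm with `docker stack deploy`. The stack name, service name, image, and port are placeholders:

```yaml
# stack.yml -- deploy with: docker stack deploy -c stack.yml mystack
version: "3.8"
services:
  web:
    image: nginx:1.25       # image every task of this service runs
    ports:
      - "8080:80"           # published through the swarm's routing mesh
    deploy:
      replicas: 3           # the managers keep three tasks of this service running
```

If one of the three tasks dies, a manager node schedules a replacement on an available worker, exactly as described above.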

Are there any other technologies for doing some fancy container-like stuff?

Yes! Cloud Foundry, for example, is an open-source cloud application platform backed by companies like Cisco, IBM, and SAP. It consists of several subsystems:

  • BOSH creates and deploys virtual machines (VMs) on top of a physical computing infrastructure, and deploys and runs Cloud Foundry on top of this cloud. To configure the deployment, BOSH follows a manifest document.
  • The CF Cloud Controller runs the apps and other processes on the cloud’s VMs, balancing demand and managing app lifecycles.
  • The router routes incoming traffic from the world to the VMs that are running the apps that the traffic demands, usually working with a customer-provided load balancer.
Cloud Foundry as Platform as a Service (PaaS)

Cloud Foundry designates two types of VMs: the component VMs that constitute the platform’s infrastructure, and the host VMs that host apps for the outside world. Within CF, the Diego system distributes the hosted app load over all of the host VMs, and keeps it running and balanced through demand surges, outages, or other changes. Diego accomplishes this through an auction algorithm.

To meet demand, multiple host VMs run duplicate instances of the same app. This means that apps must be portable. Cloud Foundry distributes app source code to VMs with everything the VMs need to compile and run the apps locally. This includes the OS stack that the app runs on, and a buildpack containing all languages, libraries, and services that the app uses. Before sending an app to a VM, the Cloud Controller stages it for delivery by combining stack, buildpack, and source code into a droplet that the VM can unpack, compile, and run. For simple, standalone apps with no dynamic pointers, the droplet can contain a pre-compiled executable instead of source code, language, and libraries.
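To give a feel for how an app is described to the platform, a Cloud Foundry application manifest is itself a small YAML file. The sketch below assumes a Node.js app; the name, memory size, instance count, and buildpack are placeholder values:

```yaml
# manifest.yml -- pushed with the cf CLI (values are placeholders)
applications:
- name: hello-app
  memory: 256M            # memory limit for each app instance
  instances: 2            # number of instances Diego keeps running across host VMs
  buildpacks:
  - nodejs_buildpack      # buildpack combined with the source code during staging
```

During `cf push`, the Cloud Controller stages the app with this buildpack and produces the droplet that the host VMs unpack and run.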

Let's talk about Container Orchestration

Container orchestration is all about managing the lifecycles of containers, especially in large, dynamic environments. Software teams use container orchestration to control and automate many tasks:

  • Provisioning and deployment of containers
  • Redundancy and availability of containers
  • Scaling up or removing containers to spread application load evenly across host infrastructure
  • Movement of containers from one host to another if there is a shortage of resources in a host, or if a host dies
  • Allocation of resources between containers
  • Exposure of services running in a container to the outside world
  • Load balancing and service discovery between containers
  • Health monitoring of containers and hosts
  • Configuration of an application in relation to the containers running it

How does container orchestration work?

When you use a container orchestration tool, like Kubernetes or Docker Swarm, you typically describe the configuration of your application in a YAML or JSON file, depending on the orchestration tool. These configuration files (for example, docker-compose.yml) are where you tell the orchestration tool where to gather container images (for example, from Docker Hub), how to establish networking between containers, how to mount storage volumes, and where to store logs for that container. Typically, teams branch and version-control these configuration files so they can deploy the same applications across different development and testing environments before deploying them to production clusters.
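As an illustration of the pieces just mentioned (image source, networking, volumes, and logging), a minimal docker-compose.yml could look like this; the service, network, and volume names as well as the image are placeholders:

```yaml
# docker-compose.yml -- minimal sketch covering image, network, volume, and logging
# (service, network, and volume names and the image are placeholders)
version: "3.8"
services:
  web:
    image: nginx:1.25                       # image to pull, e.g. from Docker Hub
    networks:
      - frontend                            # network the container is attached to
    volumes:
      - web-data:/usr/share/nginx/html      # named volume mounted into the container
    logging:
      driver: json-file                     # how and where container logs are stored
networks:
  frontend:
volumes:
  web-data:
```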

Containers are deployed onto hosts, usually in replicated groups. When it’s time to deploy a new container into a cluster, the container orchestration tool schedules the deployment and looks for the most appropriate host to place the container based on predefined constraints (for example, CPU or memory availability). You can even place containers according to labels or metadata, or according to their proximity in relation to other hosts—all kinds of constraints can be used.
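In Kubernetes terms, such constraints can be written straight into the Pod manifest. The sketch below assumes worker nodes carrying a disktype label; the label, its value, and the resource figures are placeholders:

```yaml
# pod-with-constraints.yaml -- placement constraints sketch (label and values are placeholders)
apiVersion: v1
kind: Pod
metadata:
  name: constrained-web
spec:
  nodeSelector:
    disktype: ssd              # only schedule onto nodes labelled disktype=ssd
  containers:
  - name: web
    image: nginx:1.25
    resources:
      requests:
        cpu: "250m"            # the scheduler only picks hosts with this much CPU free
        memory: "128Mi"        # ...and this much memory available
```

Docker Swarm offers a similar mechanism through placement constraints in a service's deploy section.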

Once the container is running on the host, the orchestration tool manages its lifecycle according to the specifications you laid out in the container’s definition file (for example, its Dockerfile).

The beauty of container orchestration tools is that you can use them in any environment in which you can run containers. And containers are supported in just about any kind of environment these days, from traditional on-premise servers to public cloud instances running in Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure. Additionally, most container orchestration tools are built with Docker containers in mind.

In some of the next articles we will be covering Rancher and some more cool container stuff, so stay tuned!