Dockerizing Anything and Everything

You have probably heard about containers and virtualization in your CS classes back in college, and now you hear the ‘cool’ devs at your job talk about Docker (or Vagrant) at the coffee machine. Docker is hotter than hot because it makes it possible to get far more apps running on the same old servers and it also makes it very easy to package and ship programs.

First things first.

What is Docker and why you need it:

To put it simply, Docker is a form of OS-level virtualization that is faster and lighter than traditional VMs because containers share the host kernel instead of booting a full guest OS. Where a standard VM can take a few minutes to start, a Docker container launches in a few seconds, and you can easily run dozens of containers on a single PC that could otherwise handle no more than five VMs. Whether it’s a Dev, Staging or Production environment, and whatever the host OS, Docker behaves the same everywhere. This means you won’t have to chase bugs that, say, appear on the LIVE server but cannot be reproduced locally.

Why this article:

Most guides focus on a specific technology (Java or Python) and a specific OS (usually Ubuntu). From my own experience learning Docker, I can assure you that you can create your own images from scratch and ship them to customers. Docker is meant for putting your entire environment inside a container so you can sleep peacefully at night, knowing that your codebase will work as expected on the client’s end.

This post is not a “Getting started with Docker” guide; it is aimed more at devs/DevOps engineers who are already somewhat familiar with containerization.

Let’s roll.

Everything we want in our containers (think VMs) should be clearly mentioned in your Dockerfile. This file is responsible for generating your images, which will then be used to spin up containers.
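To give you a feel for where we are headed, here is a minimal sketch of the kind of Dockerfile the following steps build up to (the package name, file paths and port are placeholders, not part of any particular project):

# Step 1: base image
FROM ubuntu:16.04
# Step 2: OS packages (placeholder package; pick what your app actually needs)
RUN apt-get update && apt-get install -y openjdk-8-jre && rm -rf /var/lib/apt/lists/*
# Step 3: your application files
COPY app.jar /opt/app/app.jar
# Step 4: the port your app listens on
EXPOSE 8080
# Step 5: run as a non-root user
RUN useradd -m dev
USER dev
# Step 7: what the container runs on start
ENTRYPOINT ["java", "-jar", "/opt/app/app.jar"]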


Step 1: Choose a base image

This is your base OS, or rather a stripped-down version of it. Docker (the company) maintains a repository called Docker Hub from which you can easily download ready-made Docker images shared by other developers or by the organizations themselves.

To get a bare-bones OS and start building your image from there, you probably want something like this. Most open source projects use the latest Ubuntu LTS as their base OS image.

FROM ubuntu:16.04

There are also specific base images for specific tech stacks:

For Java: https://hub.docker.com/_/openjdk/

For Python: https://hub.docker.com/_/python/

To use these base images, you can simply write FROM openjdk or FROM python depending on your requirements. Essentially, this should be the first line of your Dockerfile if you want a pre-built image specifically designed to run Java apps or similarly, Python apps.
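In practice it is worth pinning a specific tag instead of relying on the default latest, so builds stay reproducible. The tags below are just examples; check the Docker Hub pages above for the ones that match your stack:

# For a Java app:
FROM openjdk:8-jdk

# Or, for a Python app:
FROM python:3.6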

The above-linked images use Ubuntu as their base OS image. But Ubuntu is large, and even the stripped-down base image can be well over 200 MB. A best practice from Docker 101: make your images as light as possible. Developers are racing to create the thinnest, most usable image possible, and you should too. So consider using Alpine images as your base:

https://hub.docker.com/_/alpine/

They provide a much smaller base image (as small as 5 MB).

Note: apt-get commands will not work on those images. Alpine uses its own package manager, apk, and its own package repository.
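On Alpine, installing a package therefore looks something like this (curl is just a placeholder package):

RUN apk add --no-cache curl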

In my case, I had to run my apps on CentOS. But Docker Hub doesn’t provide you with an official JDK image built on top of CentOS. So, I pulled the official CentOS image from Docker Hub and …
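Purely as an illustration, a Dockerfile along those lines could start like this (the OpenJDK 8 package from the CentOS repositories is just an example choice, not necessarily the JDK your app needs):

FROM centos:7
RUN yum -y update && yum -y install java-1.8.0-openjdk-devel && yum clean all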

Step 2: Installing the necessary packages and updates

Once you have your base OS image, it’s time to run the standard updates you would run on any freshly installed VM.

Assuming we use Ubuntu as our base OS image, run apt-get update and apt-get install <package-name> on the same line, i.e. in a single RUN instruction. This is not just a common practice; you need to do it, because each RUN produces an intermediate image (layer) that Docker caches. If “apt-get update” sits in its own layer, that cached layer can be reused on later builds without refreshing the package lists, and the install that follows may work from stale package information.

RUN apt-get update && apt-get install -y <package-names, separated by whitespace>

Adjust the package manager commands accordingly: yum for CentOS/Red Hat, or apk for Alpine Linux.

Double-check that you are installing ONLY what you really need. I have seen people install vim or nano and other development tools inside their images.

Step 3: Adding your custom files

To make the custom files your application requires (e.g. configs, scripts, tarballs, etc.) available in your container, you have to use ADD or COPY.

Although ADD and COPY are functionally similar, generally speaking, COPY is preferred. That’s because it’s more transparent than ADD. COPY only supports the basic copying of local files into the container, while ADD has some features (like local-only tar extraction and remote URL support) that are not immediately obvious. Consequently, the best use for ADD is local tar file auto-extraction into the image, as in ADD rootfs.tar.xz /, which will extract the rootfs tar file and place its contents in the root of your container.
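For example, copying a config file and a startup script into the image might look like this (the paths are placeholders):

COPY config/app.conf /etc/myapp/app.conf
COPY scripts/start.sh /usr/local/bin/start.sh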

Step 4: Expose ports

This is pretty self-explanatory. You define and expose the ports where your application is running. Instead of using privileged ports like 80, use a non-privileged port (e.g. 8080) and map it during the container execution.

EXPOSE 8080

Try not to force your container to run as root just because you want it to expose a privileged low port, which brings us to the next step.
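The mapping itself happens at run time with the -p <host-port>:<container-port> flag. For example, to serve the app on the host’s port 80 while the process inside listens on 8080:

docker run -p 80:8080 <image-name>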

Step 5: Defining Users for your container

Avoid running your container as root as much as possible.

Specify a separate user with specific access and define them as follows:

USER dev
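Note that dev here is just an example user name, and the user has to exist in the image before the USER instruction takes effect, so create it in an earlier layer:

# Create the user before switching to it (on Alpine, use: adduser -D dev)
RUN useradd -m dev
USER dev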

Step 6: Persisting data

Container data is meant to be ephemeral, available only while the container is alive and running. To ensure that data persists across containers and container restarts (and failures), you need to save it either on a mounted volume or via a bind mount. This is especially important if you are trying to containerize your databases.

I used GlusterFS (similar to NFS) when dockerizing my Cassandra instance.
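Whichever storage backs it, the mount itself is declared when you start the container. For illustration (the volume name and host path are placeholders; /var/lib/cassandra is the data directory used by the official Cassandra image):

# Named volume managed by Docker
docker run -v cassandra-data:/var/lib/cassandra <image-name>

# Bind mount of a host (or GlusterFS-backed) directory
docker run -v /mnt/gluster/cassandra:/var/lib/cassandra <image-name>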

You can also bootstrap your data with your container. To do this, copy the data into a folder in the image from your Dockerfile. This ensures that whenever a new container is created from your image, the data is already present and ready for use.

Step 7: Defining entrypoint

Create a docker-entrypoint.sh script where you can hook in things like configuration via environment variables. This is standard practice, and you can see how other devs leverage it by going through the entrypoints of open source projects like Elasticsearch and MongoDB:

https://github.com/elastic/elasticsearch-docker/tree/master/build/elasticsearch/bin

https://github.com/docker-library/mongo/tree/master/4.0
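As a minimal sketch of the idea (the JAVA_OPTS variable and the java command are illustrative placeholders, not something your app necessarily uses):

#!/bin/sh
set -e

# Example hook: fall back to a default heap size if none was provided
# (JAVA_OPTS is just an illustrative variable)
: "${JAVA_OPTS:=-Xmx512m}"
export JAVA_OPTS

# Hand control to the main process (the CMD) so it runs as PID 1
# and receives signals properly
exec "$@"

And wire it up in the Dockerfile:

COPY docker-entrypoint.sh /usr/local/bin/docker-entrypoint.sh
RUN chmod +x /usr/local/bin/docker-entrypoint.sh
ENTRYPOINT ["docker-entrypoint.sh"]
CMD ["java", "-jar", "/opt/app/app.jar"]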

Step 8: Building your images

Once you have your Dockerfile and other files (scripts and custom source code) ready, run a docker build command to build your image.

docker build -t <my-custom-image>:<image-tag> .

(The trailing dot is the build context, i.e. the directory containing your Dockerfile and the files it references.)

Next, you can run docker images to list all the images you have on your machine.

Step 9: Spinning up a container from your image

Now, for the final act of the night.

docker run <image-name>

and that’s it. Your container is now up and running from your custom image. (Add -it to the run command if you want an interactive shell inside the container, or -d to run it in the background.)
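Putting the earlier steps together, a more complete run command might look something like this (the container name, ports and volume are the same placeholders used above):

docker run -d \
    --name myapp \
    -p 80:8080 \
    -v myapp-data:/var/lib/myapp \
    <my-custom-image>:<image-tag>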
