Docker From the Ground Up: Building Images

Docker containers are on the rise as a best practice for deploying and managing cloud-native distributed systems. Containers are instances of Docker images. It turns out that there is a lot to know and understand about images.

In this two-part tutorial, I'm covering Docker images in depth. In part one, I discussed the basic principles, design considerations, and how to inspect image internals. In this part, I cover building your own images, troubleshooting, and working with image repositories.

When you come out on the other side, you'll have a solid understanding of what Docker images are exactly and how to utilize them effectively in your own applications and systems.

Building Images

There are two ways to build images. You can modify an existing container and then commit it as a new image, or you can write a Dockerfile and build an image from it. We'll go over both and explain the pros and cons.

Manual Builds

With manual builds, you treat your container like a regular computer. You install packages, you write files, and when it's all said and done, you commit it and end up with a new image that you use as a template to create many more identical containers or even base other images on.

Let's start with the alpine image, which is a very small and spartan image based on Alpine Linux. We can run it in interactive mode to get into a shell. Our goal is to add a file called "yeah" that contains the text "it works!" to the root directory and then create a new image from it called "yeah-alpine".

Here we go. Nice, we're already in the root dir. Let's see what's there.

> docker run -it alpine /bin/sh
/ # ls
bin dev etc home lib linuxrc media mnt proc root run sbin srv sys tmp usr var

What editor is available? No vim, no nano?

/ # vim
/bin/sh: vim: not found
/ # nano
/bin/sh: nano: not found

Oh, well. We just want to create a file:

/ # echo "it works!" > yeah
/ # cat yeah
it works!

I exited from the interactive shell, and I can see the container named "vibrant_spence" with docker ps --all. The --all flag is important because the container is not running anymore.

> docker ps --all
CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES
c8faeb05de5f alpine "/bin/sh" 6 minutes ago Exited vibrant_spence

Here, I create a new image from the "vibrant_spence" container. I added the commit message "mine, mine, mine" for good measure.

> docker commit -m "mine, mine, mine" vibrant_spence yeah-alpine
sha256:e3c98cd21f4d85a1428...e220da99995fd8bf6b49aa

Let's check it out. Yep, there is a new image, and in its history you can see a new layer with the "mine, mine, mine" comment.

> docker images
REPOSITORY       TAG    IMAGE ID      SIZE
yeah-alpine      latest e3c98cd21f4d  4.8 MB
python           latest 775dae9b960e  687 MB
d4w/nsenter      latest 9e4f13a0901e  83.8 kB
ubuntu-with-ssh  latest 87391dca396d  221 MB
ubuntu           latest bd3d4369aebc  127 MB
hello-world      latest c54a2cc56cbb  1.85 kB
alpine           latest 4e38e38c8ce0  4.8 MB
nsqio/nsq        latest 2a82c70fe5e3  70.7 MB

> docker history yeah-alpine
IMAGE        CREATED         SIZE   COMMENT
e3c98cd21f4d 40 seconds ago  66 B   mine, mine, mine
4e38e38c8ce0 7 months ago    4.8 MB

Now for the real test. Let's delete the container and create a new container from the image. The expected result is that the yeah file will be present in the new container.

> docker rm vibrant_spence
vibrant_spence

> docker run -it yeah-alpine /bin/sh
/ # cat yeah
it works!
/ #

What can I say? Yeah, it works!
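
The interactive steps above can also be scripted. Here is a minimal sketch, assuming a container name I picked for the example ("yeah-builder" is hypothetical) instead of the random name Docker generated:

```shell
# Sketch: reproduce the manual build without an interactive shell.
# "yeah-builder" is a hypothetical container name chosen for this example.
build_yeah_alpine() {
    # Run a container that writes the file and then exits
    docker run --name yeah-builder alpine /bin/sh -c 'echo "it works!" > /yeah' || return 1
    # Commit the stopped container as a new image
    docker commit -m "mine, mine, mine" yeah-builder yeah-alpine || return 1
    # Remove the container; the committed image keeps the file
    docker rm yeah-builder
}
```

After calling build_yeah_alpine, running docker run --rm yeah-alpine cat /yeah should show the file's content. It is still a manual build, so the accountability problem discussed next applies here too.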

Using a Dockerfile

Creating images out of modified containers is cool, but there is no accountability. It's hard to keep track of the changes and know what the specific modifications were. The disciplined way to create images is to build them using a Dockerfile.

The Dockerfile is a text file that resembles a shell script, but it uses its own small set of instructions. Every instruction that modifies the file system creates a new layer. In part one, we discussed the importance of dividing your image into layers properly. The Dockerfile is a big topic in and of itself.

Here, I'll just demonstrate a couple of commands to create another image, oh-yeah-alpine, based on a Dockerfile. In addition to creating the infamous yeah file, let's also install vim. The Alpine Linux distribution uses a package management system called apk. Here is the Dockerfile:

FROM alpine

# Copy the "yeah" file from the host
COPY yeah /yeah

# Update and install vim using apk
RUN apk update && apk add vim

CMD cat /yeah

The base image is alpine. It copies the yeah file from the same host directory where the Dockerfile is (the build context path). Then, it runs apk update and installs vim. Finally, it sets the command that is executed when the container runs. In this case, it will print to the screen the content of the yeah file.

OK. Now that we know what we're getting into, let's build this thing. The -t option sets the repository. I didn't specify a tag, so it will be the default "latest".

> docker build -t oh-yeah-alpine .
Sending build context to Docker daemon 3.072 kB
Step 1/4 : FROM alpine
 ---> 4e38e38c8ce0
Step 2/4 : COPY yeah /yeah
 ---> 1b2a228cc2a5
Removing intermediate container a6221f725845
Step 3/4 : RUN apk update && apk add vim
 ---> Running in e2c0524bd792
fetch https://dl-cdn.alpinelinux.org/.../APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org.../x86_64/APKINDEX.tar.gz
v3.4.6-60-gc61f5bf [http://dl-cdn.alpinelinux.org/alpine/v3.4/main]
v3.4.6-33-g38ef2d2 [http://dl-cdn.alpinelinux.org/.../v3.4/community]
OK: 5977 distinct packages available
(1/5) Installing lua5.2-libs (5.2.4-r2)
(2/5) Installing ncurses-terminfo-base (6.0-r7)
(3/5) Installing ncurses-terminfo (6.0-r7)
(4/5) Installing ncurses-libs (6.0-r7)
(5/5) Installing vim (7.4.1831-r2)
Executing busybox-1.24.2-r9.trigger
OK: 37 MiB in 16 packages
 ---> 7fa4cba6d14f
Removing intermediate container e2c0524bd792
Step 4/4 : CMD cat /yeah
 ---> Running in 351b4f1c1eb1
 ---> e124405f28f4
Removing intermediate container 351b4f1c1eb1
Successfully built e124405f28f4

Looks good. Let's verify the image was created:

> docker images | grep oh-yeah

oh-yeah-alpine latest e124405f28f4 About a minute ago 30.5 MB

Note how installing vim and its dependencies bloated the size of the image from the 4.8MB of the base alpine image to a massive 30.5MB!
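
One common way to trim a little of that overhead is to avoid keeping the apk package index in the layer. The following is a sketch, not a measured result: it writes a variant of the Dockerfile into a separate directory ("oh-yeah-slim" is a hypothetical name) that uses apk's --no-cache option instead of a separate apk update.

```shell
# Sketch: write a variant Dockerfile that does not store the apk index.
# The directory name "oh-yeah-slim" is hypothetical.
mkdir -p oh-yeah-slim
echo 'it works!' > oh-yeah-slim/yeah

cat > oh-yeah-slim/Dockerfile <<'EOF'
FROM alpine
COPY yeah /yeah
# --no-cache fetches the package index on the fly and does not keep it in the layer
RUN apk add --no-cache vim
CMD cat /yeah
EOF
```

You can then build it with docker build -t oh-yeah-alpine:slim oh-yeah-slim (the "slim" tag is also just an example) and compare the two sizes with docker images. Most of the extra size comes from vim and its dependencies, so don't expect a dramatic difference.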

It's all very nice. But does it work?

> docker run oh-yeah-alpine
it works!

Oh yeah, it works!

In case you're still suspicious, let's go into the container and examine the yeah file with our freshly installed vim.

> docker run -it oh-yeah-alpine /bin/sh
/ # vim yeah

it works!
~
~
.
.
.
~
"yeah" 1L, 10C

The Build Context and the .dockerignore file

I didn't tell you, but originally when I tried to build the oh-yeah-alpine image, it just hung for several minutes. The issue was that I just put the Dockerfile in my home directory. When Docker builds an image, it first packs the whole directory where the Dockerfile is (including sub-directories) and makes it available for COPY commands in the Dockerfile.

Docker is not trying to be smart and analyze your COPY commands. It just packs the whole thing. Note that the build context will not end up in your image, but an unnecessarily large build context will slow down your build command.

In this case, I simply copied Dockerfile and yeah into a sub-directory and ran the docker build command in that sub-directory. But sometimes you have a complicated directory tree from which you want to copy specific sub-directories and files and ignore others. Enter the .dockerignore file.

This file lets you control exactly what goes into the build context. My favorite trick is to first exclude everything and then start including the bits and pieces I need. For example, in this case I could create the following .dockerignore file and keep Dockerfile and yeah in my home directory:

# Exclude EVERYTHING first
*

# Now selectively include stuff
!yeah

There is no need to include the Dockerfile itself or the .dockerignore file in the build context.

Copying vs. Mounting

Copying files into the image is sometimes what you need, but in other cases you may want your containers to be more dynamic and work with files on the host. This is where volumes and mounts come into play.

Mounting host directories is a different ball game. The data is owned by the host and not by the container. The data can be modified when the container is stopped. The same container can be started with different host directories mounted.
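
To make this concrete, here is a minimal sketch of a bind mount using the -v option of docker run. The host directory "yeah-data" and the mount point "/data" are hypothetical names chosen for the example.

```shell
# Sketch: mount a host directory instead of copying files into the image.
# "yeah-data" (host) and "/data" (container) are hypothetical paths.
mkdir -p yeah-data
echo 'it works!' > yeah-data/yeah

show_mounted_yeah() {
    # -v HOST_PATH:CONTAINER_PATH bind-mounts the host directory into the container;
    # --rm removes the container once the command exits
    docker run --rm -v "$(pwd)/yeah-data:/data" alpine cat /data/yeah
}
```

Calling show_mounted_yeah prints the file from the host. If you edit yeah-data/yeah on the host and call it again, the container sees the new content without any rebuild, because the file never became part of the image.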

Tagging Images

Tagging images is very important if you develop a microservice-based system and you generate a lot of images that must sometimes be associated with each other. You can add as many tags as you want to an image.

You've already seen the default "latest" tag. Sometimes, it makes sense to add other tags, like "tested", "release-1.4", or the git commit that corresponds to the image.

You can tag an image during a build or later. Here's how to add a tag to an existing image. Note that while it's called a tag, you can also assign a new repository.

> docker tag oh-yeah-alpine oh-yeah-alpine:cool-tag
> docker tag oh-yeah-alpine oh-yeah-alpine-2

> docker images | grep oh-yeah
oh-yeah-alpine-2 latest    e124405f28f4 30.5 MB
oh-yeah-alpine   cool-tag  e124405f28f4 30.5 MB
oh-yeah-alpine   latest    e124405f28f4 30.5 MB
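
Tagging during a build uses the same -t option we used earlier, and it can be given more than once. Here is a sketch that tags the image with the current git commit, assuming the build context is a git checkout (the function name is hypothetical):

```shell
# Sketch: tag an image at build time with the current git commit.
# Assumes the current directory is a git checkout containing the Dockerfile.
build_with_commit_tag() {
    # Short hash of the commit that is currently checked out
    commit=$(git rev-parse --short HEAD) || return 1
    # -t can be repeated to apply several tags in one build
    docker build -t "oh-yeah-alpine:$commit" -t oh-yeah-alpine:latest .
}
```

With this approach, every image can be traced back to the exact source revision it was built from, while "latest" keeps pointing at the most recent build.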

You can also untag by removing an image by its tag name. This is a little scary because if you remove the last tag by accident, you lose the image. But if you build images from a Dockerfile, you can just rebuild the image.

> docker rmi oh-yeah-alpine-2
Untagged: oh-yeah-alpine-2:latest

> docker rmi oh-yeah-alpine:cool-tag
Untagged: oh-yeah-alpine:cool-tag

If I try to remove the last remaining tagged image, I get an error because it is used by a container.

> docker rmi oh-yeah-alpine

Error response from daemon: conflict: unable to remove repository
reference "oh-yeah-alpine" (must force) - 
container a1443a7ca9d2 is using its referenced image e124405f28f4

But if I remove the container first (with docker rm a1443a7ca9d2), the image can be removed:

> docker rmi oh-yeah-alpine
Untagged: oh-yeah-alpine:latest
Deleted: sha256:e124405f28f48e...441d774d9413139e22386c4820df
Deleted: sha256:7fa4cba6d14fdf...d8940e6c50d30a157483de06fc59
Deleted: sha256:283d461dadfa6c...dbff864c6557af23bc5aff9d66de
Deleted: sha256:1b2a228cc2a5b4...23c80a41a41da4ff92fcac95101e
Deleted: sha256:fe5fe2290c63a0...8af394bb4bf15841661f71c71e9a

> docker images | grep oh-yeah

Yep. It's gone. But don't worry. We can rebuild it:

> docker build -t oh-yeah-alpine .

> docker images | grep oh-yeah
oh-yeah-alpine latest 1e831ce8afe1 1 minutes ago 30.5 MB

Yay, it's back. Dockerfile for the win!

Working With Image Registries

Images are very similar in some respects to git repositories. They are also built from an ordered set of commits. You can think of two images that use the same base image as branches (although there is no merging or rebasing in Docker). An image registry is the equivalent of a central git hosting service like GitHub. Can you guess the name of the official Docker image registry? That's right, Docker Hub.

Pulling Images

When you run an image, if it doesn't exist locally, Docker will try to pull it from an image registry. By default it goes to Docker Hub. To use a different registry, you prefix the image name with the registry's host name and follow the registry's instructions, which typically involve logging in with your credentials for that registry. Docker keeps track of your registry logins in your ~/.docker/config.json file.

Let's delete the "hello-world" image and pull it again using the docker pull command.

> docker images | grep hello-world
hello-world latest c54a2cc56cbb 7 months ago 1.85 kB 

> docker rmi hello-world
hello-world

It's gone. Let's pull now.

> docker pull hello-world
Using default tag: latest
latest: Pulling from library/hello-world
78445dd45222: Pull complete
Digest: sha256:c5515758d4c5e1e...07e6f927b07d05f6d12a1ac8d7
Status: Downloaded newer image for hello-world:latest

> docker images | grep hello-world
hello-world latest 48b5124b2768 2 weeks ago 1.84 kB

The latest hello-world was replaced with a newer version.

Pushing Images

Pushing images is a little more involved. First, you need to create an account on Docker Hub (or another registry). Next, you log in. Then you need to tag the image you want to push according to your account name ("g1g1" in my case).

> docker login -u g1g1 -p <password>
Login Succeeded

> docker tag hello-world g1g1/hello-world

> docker images | grep hello

g1g1/hello-world latest 48b5124b2768 2 weeks ago 1.84 kB
hello-world      latest 48b5124b2768 2 weeks ago 1.84 kB

Now, I can push the g1g1/hello-world tagged image.

> docker push g1g1/hello-world
The push refers to a repository [docker.io/g1g1/hello-world]
98c944e98de8: Mounted from library/hello-world
latest: digest: sha256:c5515758d4c5e...f6d12a1ac8d7 size: 524
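
Passing the password with -p leaves it in your shell history and makes it visible in the process list. The docker login command also accepts the password on standard input via --password-stdin. Here is a sketch of the same login, tag, and push sequence, assuming the password is available in an environment variable (DOCKER_PASSWORD is a hypothetical name):

```shell
# Sketch: log in without putting the password on the command line.
# DOCKER_PASSWORD is a hypothetical environment variable holding the password.
login_and_push() {
    # --password-stdin makes docker login read the password from standard input
    printf '%s' "$DOCKER_PASSWORD" | docker login -u g1g1 --password-stdin || return 1
    # Tag the image under the account name, then push it
    docker tag hello-world g1g1/hello-world || return 1
    docker push g1g1/hello-world
}
```

This form is also convenient in build scripts, where the password typically comes from a secret store rather than from someone typing it.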

Conclusion

Docker images are the templates for your containers. They are designed to be efficient and offer maximum reuse by using a layered file system.

Docker provides a lot of tools for listing, inspecting, building, and tagging images. You can pull and push images to image registries like Docker Hub to easily manage and share your images.