How to Not Be the Engineer Running 3.5GB Docker Images

David Mckay
May 9, 2016 | 7 min read

Let’s cut to the chase: you’re adopting a microservice architecture, and you’re planning to use Docker. There’s a reason it is so en vogue – it solves lots and lots of problems and has zero negative effect on our projects, right? As with every tool, technology, or paradigm thrust upon us as we scrappily try to maintain our sanity while jumping from shiny to shiny, we need to learn the gotchas.

To do this, I like to start with a simple question: How might this new shiny bite me on the ass, and what can I do to avoid having teeth marks on my rear? I want to tackle a problem I have seen repeatedly during my consultations with teams/organizations adopting Docker.

Behemoth Docker Images

$ docker images
REPOSITORY              TAG      IMAGE ID       CREATED              SIZE
awesome-micro-service   latest   61562a134d38   About a minute ago   3.5 GB

Woah! Look at the size of that image. Awesome microservice is 3.5GB! So much for micro.

What on earth is a Docker image anyways?

To understand why our images are big, we need to understand what images are in the first place.

A Docker image is the output of a docker build. The build process runs each of the instructions within a Dockerfile. Each instruction executed creates a layer. Layers encapsulate the file system changes that the instruction has caused. A Docker image is a collection of layers. Let’s look closer so we can describe a Docker image in more detail.

Example:

Assume we’re going to bring Docker into our PHP workflow. In order to run our PHP application, we need a Debian-based system with PHP installed. We’ll need to describe the environment required to run our application within a Docker container.

# Dockerfile
FROM debian:jessie
RUN echo "Building ..."
RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y php5-cli

Super simple. Super declarative. Though completely useless until we build it. The build process takes a Dockerfile and a context and produces a Docker image.

The context is the directory that is sent to the Docker daemon to satisfy any file requirements, such as ADD or COPY instructions.

# docker build -t <name:tag> -f <Dockerfile> <context>
# If the Dockerfile is within the root of our context, we can omit the -f
$ docker build -t my-docker-php:latest -f Dockerfile .
...
$ docker images
REPOSITORY      TAG      IMAGE ID       CREATED              SIZE
my-docker-php   latest   61562a134d38   About a minute ago   163.5 MB

So what’s actually going on? What’s inside my Docker image?

It’s a file system. When you run apt-get install vim, all you’re telling the computer to do is put some files on your hard drive. The Docker image encapsulates that and keeps track of all new / modified / deleted files.

These file system changes are tracked in layers. Each layer encapsulates the file system changes made by one instruction in your Dockerfile.

Docker provides a command, docker history, to visualize the layers of our Docker images. As you’ll see in the output below:

  1. We have no control over the size of our base image, other than changing base image.
    This is the “<missing>” layer at the bottom of the list.
  2. Some instructions cost us nothing. Examples include CMD, USER, WORKDIR, etc.

$ docker history my-docker-php
IMAGE          CREATED          CREATED BY                                      SIZE       COMMENT
b4e7e4004eeb   4 seconds ago    /bin/sh -c #(nop) CMD ["vim"]                   0 B
d2a8ad35f9f4   4 seconds ago    /bin/sh -c echo                                 0 B
6fc559885751   36 minutes ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   38.37 MB
f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB

Note: If your command makes no changes to the file system (like our RUN echo “Building …”), a layer is still created. It just has a zero-byte size.

So in order to keep our images micro, we need to keep the output of our layers to a minimum.
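Because an image is just its layers stacked up, the layer sizes should add up to roughly the image size. The helper below is a minimal sketch of that arithmetic: it totals one byte count per line from standard input. The function name sum_layer_bytes is made up for this example, and the docker history flags in the comment assume a reasonably recent Docker CLI.

```shell
#!/bin/sh
# Sum one byte count per line from stdin and print the total.
# sum_layer_bytes is a hypothetical helper name, not a Docker command.
sum_layer_bytes() {
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Possible usage against a real image (needs a running Docker daemon):
#   docker history --human=false --format '{{.Size}}' my-docker-php | sum_layer_bytes
```

Fed the rounded layer sizes from the history output above (38.37 MB plus 125.1 MB, with the zero-byte layers contributing nothing), the total lands close to the 163.5 MB that docker images reports.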

Gotchas

1. File Ownership & Permissions

Never, and I mean it, never change the ownership or permissions of a file inside a Dockerfile unless you absolutely NEED to. When you need to, try to modify as few files as possible.

Although comparisons can be made, Docker isn’t like Git. It doesn’t record what changed inside a file, only which files were affected. When an instruction touches a file, even just its ownership or permissions, Docker stores a complete new copy of that file in the new layer. This can cause your image to double in size if you’re modifying particularly large files, or worse, every file!

Example:

# Dockerfile
FROM debian:jessie
ADD large_file /var/www/large_file
RUN chown www-data /var/www/large_file
RUN chmod 756 /var/www/large_file

$ docker build -t gotcha-1 .
...
$ docker images gotcha-1
REPOSITORY   TAG      IMAGE ID       CREATED              SIZE
gotcha-1     latest   49b4a4ea228a   About a minute ago   3.346 GB
$ docker history gotcha-1
IMAGE          CREATED          CREATED BY                                      SIZE       COMMENT
49b4a4ea228a   36 seconds ago   /bin/sh -c chmod 756 /var/www/large_file        1.074 GB
09d77316932b   2 minutes ago    /bin/sh -c chown www-data /var/www/large_file   1.074 GB
7adb7c72c3ef   2 minutes ago    /bin/sh -c #(nop) ADD file:a86f6dedfb4ba54972   1.074 GB
f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB

Tip: If you’re having problems with permissions inside your container, modify them using your entrypoint script, or modify the user id to reflect what you need. Do not modify the files.

Example:

Changing the user ID of www-data to match yours. Tweak as necessary:

RUN usermod -u 1000 www-data

Or run your container with an entrypoint script:

$ cat my-script
#!/bin/bash
# Fix ownership when the container starts, not in an image layer
chown -R www-data /var/www/
apache2
$ docker run --entrypoint=/bin/my-script my-docker-php
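On newer Docker versions there is a third option worth knowing about: setting the owner at copy time, so the file is only ever written into one layer. The fragment below is a sketch rather than a drop-in fix. It assumes Docker 17.09 or later (where COPY and ADD accept --chown) and reuses the large_file and www-data names from the gotcha example.

```dockerfile
# Dockerfile
FROM debian:jessie
# Set the owner while copying, so no separate chown layer is created.
# Assumes Docker 17.09+; www-data already exists in the Debian base image.
COPY --chown=www-data:www-data large_file /var/www/large_file
```

Builds that use BuildKit also accept a --chmod flag on COPY, which covers the permissions half of the problem in the same way.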

2. Clean up after untidy commands

Sometimes commands leave a trail of garbage behind them and couldn’t care less about the size of your images. We accept this on our desktops and preach “cache” and “performance”. Inside our images, it’s just pure filth.

Example:

# Dockerfile
FROM debian:jessie
RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y vim

$ docker build -t debian .
...
$ docker history debian
IMAGE          CREATED          CREATED BY                                      SIZE       COMMENT
ae5a25410c0d   10 seconds ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   28.68 MB
aaf5660234d3   21 minutes ago   /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   9.694 MB
f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB

As you can see from the output above, our apt-get update costs us about 10MB and our apt-get install costs us about 30MB. Obviously these are trivial examples, but in larger builds this space will accumulate!

First, let's examine what each command is doing to our image. To do this, start an interactive Docker container and open a bash shell inside it:

$ docker run -ti --rm --name live debian:jessie bash

You’ll be live inside the innards of a Debian container and at a bash prompt. Next, let’s get a second terminal window open and inspect the container:

$ docker diff live
$

No output. That’s good, because we’ve not done anything yet.

docker diff allows us to see what’s changed inside our container. So let’s run our first command:

Note: “$ ” is my local prompt and “root@4552beab7001:/#” is inside the container.

root@4552beab7001:/# apt-get update

$ docker diff live
C /var
C /var/lib
C /var/lib/apt
C /var/lib/apt/lists
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/security.debian.org_dists_jessie_updates_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie-updates_main_binary-amd64_Packages.gz
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_Release
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie_Release.gpg
A /var/lib/apt/lists/httpredir.debian.org_debian_dists_jessie-updates_InRelease
A /var/lib/apt/lists/lock
A /var/lib/apt/lists/security.debian.org_dists_jessie_updates_InRelease

Oooh, we’ve just discovered where our 10MB is going. Let’s fix it by tweaking our Dockerfile to delete our apt cache after installing vim. Your initial thought may be to tweak as:

# Dockerfile
FROM debian:jessie
RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get install -y vim
RUN rm -rf /var/lib/apt

Unfortunately, this will only add another layer and not affect the previous layers. So although we’re deleting files, the previous layers still contain them, and the image stays the same size. The common trick is to chain our commands at the shell level. This way, the files don’t exist when the RUN is finished, and they never end up in any layer.

# Dockerfile
FROM debian:jessie
RUN DEBIAN_FRONTEND=noninteractive apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y vim \
    && rm -rf /var/lib/apt

$ docker history debian
IMAGE          CREATED          CREATED BY                                      SIZE       COMMENT
be6afc32bd37   5 seconds ago    /bin/sh -c DEBIAN_FRONTEND=noninteractive apt   28.68 MB
f50f9524513f   8 weeks ago      /bin/sh -c #(nop) CMD ["/bin/bash"]             0 B
<missing>      8 weeks ago      /bin/sh -c #(nop) ADD file:b5391cb13172fb513d   125.1 MB

Much better 🙂 You can repeat that process for every RUN inside your Dockerfile and really cut the fat out of your image.
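A common variation on the same chained RUN is sketched below. It makes two assumptions that may not hold for your image: that you can live without the packages apt merely recommends (--no-install-recommends), and that you would rather clear only the downloaded package lists than remove the whole /var/lib/apt directory.

```dockerfile
# Dockerfile
FROM debian:jessie
# One RUN, one layer: update, install and clean up together.
# --no-install-recommends skips optional extras (assumes you do not need them).
RUN apt-get update \
    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends vim \
    && rm -rf /var/lib/apt/lists/*
```

Run docker history against the result to check what it actually saves for your packages; the numbers depend heavily on what you install.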

Tips

Tip #1.

Create and maintain your own base images, preferably on Alpine! Alpine Linux (http://alpinelinux.org/) is tiny (Under 5MB!) and has a really strong package manager. If you can, use it and keep your base images lean.

Why is creating / maintaining your own base image ideal? Most “official” images are quite bloated and try to be as general as possible. You know what you need. It’s like compiling your own kernel, only not as dangerous 😀
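As a rough idea of what an Alpine-based starting point could look like, here is the vim example from earlier redone with apk. Treat it as a sketch: the image tag is left unpinned for brevity, and package names on Alpine often differ from their Debian equivalents.

```dockerfile
# Dockerfile
FROM alpine
# --no-cache fetches the package index on the fly and does not keep it in the
# layer, so there is nothing to clean up afterwards.
RUN apk add --no-cache vim
```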

Tip #2.

ONBUILD. Use it. When crafting base images, ONBUILD gives you a great way to reuse this image for both development and production. ONBUILD tells Docker that when the image is used as a base, we should perform some extra instructions, such as the following, which puts our code into the container for a production build.

ONBUILD ADD . /var/www

As this only runs when being used as a base, our docker-compose.yml, used for development, can instead mount a volume into the container, getting our code changes into the container without a rebuild 🙂

services:
  application:
    image: my-base
    volumes:
      - .:/var/www
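The production side of the same setup can then be very small. The sketch below assumes the base image is tagged my-base, as in the compose file, and that it contains the ONBUILD ADD . /var/www instruction shown earlier.

```dockerfile
# Dockerfile (production build, kept next to the application code)
FROM my-base
# No ADD needed here: the ONBUILD trigger recorded in my-base runs right
# after FROM and copies the build context into /var/www.
```

Building this with docker build bakes the code into the image, while the development compose file keeps using my-base directly with a mounted volume.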

Tip #3.

Be careful using community images. They disappear. Often. Fork and maintain your own if it’s mission critical. You’re also putting your trust in the maintainer to protect your attack surface, but that’s a security issue and another post for next time.
