This new trendy thing called Docker
Hey all, I've recently been doing some development work inside Docker containers and thought I'd reflect on my experience with it. Docker, as you are probably aware, is a set of software tools for handling application containers, which you can think of as being somewhat like the older traditional BSD jails, or a chroot on steroids.
It's getting a lot of attention because it makes deploying containers quite simple, with support for automated builds and caching built in. Using a Dockerfile allows you to build and then run a container from version control without needing to store an entire root filesystem in your version control repo.
Where it really seems to shine is in eliminating the classic "works on my machine" class of bugs: if your code runs inside a Docker container on your machine, it should run the same in a Docker container on other machines.
Why I used to hate Docker - slow builds
I used to hate Docker and considered it an overengineered replacement for stuff we already have: BSD jails, chroots, LXC, Solaris zones and good old fashioned paravirtualized VMs (think Xen).
The problem, as I saw it, was that the isolation provided by Docker only really works when you rebuild your images and restart containers after changing your code, and this build process is slow if your Dockerfile is written badly. You can work around this by only putting your application's dependencies into a standard image and then bind mounting your application's code into the container, but doing that loses the isolation that makes Docker so useful in the first place.
In practice, I would often install application dependencies on my host OS directly and then run code from the git checkout to eliminate the slow build problem. Doing this brings back "works on my machine" bugs, and it means you lose all the advantages of Docker.
Fixing the slow build issue - how to make everything fly
When developing new code it's important to have fast build times, and you should also be able to start and stop any container you use quickly. Doing this well means making proper use of Docker's caches so that rebuilds don't take forever. There are a few basic ways to do that:
- First option - use tiered base images
The idea here is that you set up one base image that installs basics such as your programming language of choice (I'm a Python guy myself) and stuff like nginx, then you build another base image on top of this one to add app-specific dependencies, and finally you add another that sets up your actual application.
Doing this means you only rebuild the base images when you need to. Docker doesn't even need to look up the cache for each layer in the base images; it simply imports the latest version of the base image.
- Second option - organise your Dockerfile properly so docker's cache can handle it
With this approach, you don't use base images beyond stuff like the phusion base system (a Docker-optimised image of Ubuntu). Instead, you structure your Dockerfile to install all the dependencies in a single image.
To make it run quickly, you must structure the Dockerfile so that the stuff that changes least often is earlier in the file. You should also be careful when it comes to ENV and ARG instructions: a value that changes on every build will keep invalidating the cache for the layers that follow it.
Ultimately, each instruction in your Dockerfile forms a layer which Docker can cache. When an instruction changes, or the files it copies into the image change, that instruction and every instruction after it must be rerun on a rebuild.
Anything specific to your app that changes often in the code/build/run cycle should be at the bottom, while things that change less often and take longer should be towards the top. Depending on app specifics, the actual ordering may vary, but I hope to release a template demonstrating this approach soon.
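To make the ordering rule concrete, here's a toy model in Python. It is not Docker's real cache algorithm, just the "first miss invalidates everything below it" behaviour described above, and the step names are made up for illustration.

```python
# Toy model of Docker's layer cache: once one instruction misses the cache,
# every instruction after it is rebuilt too. Not Docker's real algorithm.

def rebuilt_steps(steps, changed):
    """Return the steps that must be rerun when the steps in `changed` differ
    from the previous build. `steps` is the Dockerfile order, top to bottom."""
    for index, step in enumerate(steps):
        if step in changed:
            # First cache miss: this step and everything below it reruns.
            return steps[index:]
    return []  # nothing changed, so the whole build comes from cache


# Hypothetical step names for a small Python web app.
slow_order = ["install system packages", "copy app code", "install requirements"]
fast_order = ["install system packages", "install requirements", "copy app code"]

# Editing application code is the most common change during development.
print(rebuilt_steps(slow_order, {"copy app code"}))
print(rebuilt_steps(fast_order, {"copy app code"}))
```

With the slow ordering, a code edit forces the requirements install to rerun as well; with the fast ordering, only the final copy step is rebuilt.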
- Third option - put dependencies in the Dockerfile and mount app code from disk
This is essentially treating Docker containers like VMs and is considered bad practice, but to make things really fly it can make sense. Put simply, you build your Docker image once and only once, and then run it with your application code bind mounted.
Inside the container you need a way to automatically reload the code from disk when it changes. This can be as simple as having a script run your application in a loop and then using "docker exec" to kill it, ensuring the next invocation loads the newest code.
While this should be VERY fast compared to rebuilding your image, it should be reserved for development work only, and you should run tests in a fresh container before distributing your code to others (whether in-house at a company, or simply pushing to GitHub for your free and open-source software).
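The "run your application in a loop" script could look something like the sketch below. The function name, the delay, and the example paths are all assumptions of mine rather than anything Docker provides.

```python
import subprocess
import time


def run_in_loop(command, max_runs=None, delay=0.5):
    """Run `command` again each time it exits; return how many times it ran.

    max_runs=None loops forever, which is what you'd want inside a container.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        # Each iteration starts a fresh process, so it picks up whatever code
        # is currently on the bind mount.
        subprocess.call(command)
        runs += 1
        time.sleep(delay)  # avoid a tight loop if the app crashes at startup
    return runs


# Inside the container you would call something like this (placeholder path):
# run_in_loop(["python", "/app/main.py"])
```

Killing the app process from the host, for example with "docker exec" plus whatever kill tool your image ships, makes `subprocess.call` return, and the loop then starts the app again with the latest code.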
In practice, it makes sense to combine the above approaches depending on context: write a Makefile that can build your container from scratch but also have another version of the container that can run your code from a bind mount. Write a common base which supports either form, and then have a new Dockerfile for each.
Unfortunately, Docker does not support conditionals in the Dockerfile, so you can't use a build argument to optionally run COPY commands. To make life easier, use good old-fashioned make to build each image, and put the Dockerfile for your base and for each version of your container into your version control repo.
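As a rough illustration of moving the conditional out of the Dockerfile and into the build tooling, here is a Python sketch that assembles the commands for each variant. In a Makefile these would simply be the recipe lines of two targets. The file names (Dockerfile.dev, Dockerfile.release), the image name myapp, and the /app mount point are all hypothetical.

```python
import os


def build_command(variant, image="myapp"):
    """docker build command for one variant, using Dockerfile.<variant>."""
    return ["docker", "build", "-f", f"Dockerfile.{variant}",
            "-t", f"{image}:{variant}", "."]


def run_command(variant, image="myapp", source_dir="."):
    """docker run command; only the dev variant bind mounts the source tree."""
    command = ["docker", "run", "--rm"]
    if variant == "dev":
        # The conditional lives here, not in the Dockerfile.
        command += ["-v", f"{os.path.abspath(source_dir)}:/app"]
    return command + [f"{image}:{variant}"]


print(" ".join(build_command("dev")))
print(" ".join(run_command("release")))
```

The release variant copies the code in at build time through its own Dockerfile, so its run command needs no mount at all.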
Conclusion - watch this space
When it's possible for me to do so, I'll be releasing a standard build system for web apps inside Docker that uses these approaches. The goal is to make development work go quickly while still allowing fresh rebuilds for production deployment.