Why multi-stage Docker builds do not work

Mon, Oct 28, 2019

Building inside a container sounds very appealing for multiple reasons. Multi-stage Docker builds are supposed to help you with:

having a smaller production Docker image
avoiding an additional Dockerfile for building
avoiding intermediate images (compared to the Docker builder pattern)
not needing to extract artifacts to the local system (compared to the Docker builder pattern)
eliminating build differences between developers' machines and the continuous integration agent
describing the build process in a common way, independent of the continuous integration solution used

Unfortunately, I believe this approach has some major flaws in common building scenarios.

Problems

Test results analysis

Most continuous integration systems I know provide information about test results. They often also analyze those results to produce statistics and identify flaky tests. To make these features work, you need to provide the test result files to the continuous integration system. In Java projects, test results are typically stored as XML files. However, when tests run as part of a Docker multi-stage build, it is not straightforward to get those files out of the container. A host directory can be mounted as a volume only when running an image, but with a multi-stage Docker build we are not running the image, just building it.

How to solve this? One solution is to build an intermediate “builder” image, which contains all the output files you want to deliver to your continuous integration system, with the command:

docker build --target builder -t my-application-builder .

then create a container from it without running it:

docker create --name my-application-builder my-application-builder:latest

and copy the files from the container to the host with:

docker cp my-application-builder:/root/dev/application /build/123/output

This is not perfect, because you need to create an intermediate image just to copy files, but it is pretty quick and lets you keep using the features your CI provides for your builds.
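The three commands above can be chained into a single CI step. The following is a minimal sketch, not a definitive implementation: the function name extract_build_output and its three parameters are hypothetical, and the values in the example call are simply the ones used in the commands above. It also removes the temporary container with docker rm, which the commands above leave behind.

```shell
#!/bin/sh
# Sketch: build the "builder" stage, then copy its output to the host.
# extract_build_output is a hypothetical helper name, not part of Docker.
# Arguments: <image/container name> <path inside image> <host output dir>
extract_build_output() {
    name="$1"
    src_path="$2"
    out_dir="$3"

    # Build only up to the stage named "builder" in the Dockerfile.
    docker build --target builder -t "$name" . || return 1

    # Create (but do not start) a container from that image.
    docker create --name "$name" "$name:latest" || return 1

    # Copy the build output to the host and remember whether it worked.
    docker cp "$name:$src_path" "$out_dir"
    status=$?

    # Remove the temporary container even if the copy failed.
    docker rm "$name" >/dev/null

    return "$status"
}

# Example call with the values used above (left commented out):
# extract_build_output my-application-builder /root/dev/application /build/123/output
```

Returning the exit status of docker cp lets the CI step fail when the expected output is missing, while the container is cleaned up either way.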

Running containers during integration tests execution

When running integration tests, it is common to run them against additional software such as a database or a message broker. This makes it possible to test more advanced features from a higher-level perspective, with the components that will later be used in production. A common approach is to use an in-memory embedded database or message broker. Another way is to use Docker containers. With containers, we can run tests against the same components that will run in production. There is a very interesting project called Testcontainers that greatly helps with this approach for Java applications. However, if you want to run your integration tests with Docker containers, you cannot use a Docker multi-stage build. It is impossible to run Docker containers inside a Docker build, because the Docker daemon is not available during that build. This GitHub issue describes the problem: https://github.com/testcontainers/testcontainers-java/issues/1112

A possible solution to this problem could be installing the additional software (such as databases and message brokers) inside the builder image (or even in an earlier stage). I haven’t tried this solution, but it should probably work fine. Unfortunately, with this solution we lose one of the most important benefits of using containers. Installing and configuring additional software is usually not trivial. Being able to use existing (often official) Docker images significantly simplifies running it. This is all lost when we need to install the software ourselves in our build image.

Caching dependencies and publishing base image

One of the advantages of a Docker multi-stage build should be the possibility to cache files that your build uses frequently. In a Gradle project this is not trivial, though. With Gradle I would usually just run ./gradlew clean check to build and test my code. Gradle would take care of downloading all dependencies and running everything. However, if this is all done in one step, Docker will not cache the dependencies in a separate layer and cannot reuse them in the next build. To my knowledge, there is no way to tell Gradle to only download the dependencies for the project.

One way I found to overcome this issue is to first copy only a few basic Gradle files to the image and run the build. It will download your dependencies and fail, because there are no source files, so you need to tell Gradle to continue anyway using the --continue argument. This is how it could look in a Dockerfile:

# Copy gradle files separately so that gradle stuff and dependencies are cached between builds
# (with several sources, the COPY destination must be a directory ending in "/")
COPY build.gradle gradlew gradlew.bat $APP_HOME/
COPY gradle $APP_HOME/gradle
RUN ./gradlew -i build -x :bootJar -x test -x integrationTest --continue

This is not very elegant, but I have not found a better solution so far.

Summary

Despite the list of possible benefits of multi-stage Docker builds, with all the problems described above, I’ve decided not to use them for my projects. Even though we can overcome some of the problems, it is too quirky and too much hassle in my opinion.

The idea behind Docker multi-stage builds is still very appealing. I believe it may work very well for some projects. As always, there are no silver bullets, and it may simply not be a good fit for Java applications. Docker 18.09 adds a build mode called BuildKit. It brings some experimental improvements for caching, concurrent runs, etc. I hope that in the future this will also solve the other problems and allow us to fully build and test our projects inside a Docker multi-stage build.
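For anyone who wants to try it: in Docker 18.09 BuildKit is opt-in, and it can be enabled for a single build by setting the DOCKER_BUILDKIT=1 environment variable when invoking docker build. A minimal sketch, where the wrapper name build_with_buildkit is hypothetical and the image tag in the example call is an assumption:

```shell
#!/bin/sh
# Sketch: run one build with BuildKit enabled (opt-in as of Docker 18.09).
# build_with_buildkit is a hypothetical wrapper name, not a Docker command.
build_with_buildkit() {
    # DOCKER_BUILDKIT=1 is set for this docker invocation; all arguments
    # are passed through to "docker build" unchanged.
    DOCKER_BUILDKIT=1 docker build "$@"
}

# Example call (left commented out); the tag my-application is an assumption:
# build_with_buildkit -t my-application .
```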