What is a multi-stage build, and what problem does it solve?

Detailed Answer

The problem: build tools bloat the final image

# Single-stage build -- the final image includes EVERYTHING used to build it
FROM golang:1.22
WORKDIR /app
COPY . .
RUN go build -o server .
CMD ["./server"]

This works, but the resulting image includes the entire Go toolchain: the compiler, standard library source, and build caches. That adds up to hundreds of megabytes, even though the compiled application is a single, comparatively small binary. None of that build tooling is needed at runtime, and leaving it in also increases the attack surface.

The multi-stage solution

# Stage 1: "builder" -- has the full toolchain, produces the compiled binary
FROM golang:1.22 AS builder
WORKDIR /app
COPY . .
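# CGO_ENABLED=0 yields a static binary; a default build may link against glibc, which alpine (musl) lacks
RUN CGO_ENABLED=0 go build -o server .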

# Stage 2: the FINAL image -- minimal, only what's needed to RUN the binary
FROM alpine:3.19
COPY --from=builder /app/server /usr/local/bin/server
CMD ["server"]

The COPY --from=builder instruction reaches back into the first stage's filesystem and copies out only the compiled server binary. The Go compiler, source code, and build-time dependencies from the builder stage never reach the final image. The final image can use a tiny base, or even scratch (an entirely empty base image) when the binary is fully static and has no runtime dependencies. This often shrinks the final image from hundreds of megabytes to tens of megabytes or less.
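As a rough sketch of the scratch variant mentioned above, assuming the program is pure Go so that CGO_ENABLED=0 produces a fully static binary (the /server destination path is an illustrative choice, not something the original example specifies):

```dockerfile
# Stage 1: build a static binary with the full toolchain
FROM golang:1.22 AS builder
WORKDIR /app
COPY . .
# Disable cgo so the binary does not depend on any shared C library
RUN CGO_ENABLED=0 go build -o server .

# Stage 2: scratch is an empty base image; the binary is the only file in it
FROM scratch
COPY --from=builder /app/server /server
# Exec form is required here: scratch has no shell to interpret a shell-form command
ENTRYPOINT ["/server"]
```

Because scratch contains nothing at all, anything the program needs at runtime (for example CA certificates for outbound TLS) would also have to be copied in from an earlier stage.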

Multiple intermediate stages

FROM node:20 AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

FROM node:20 AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build

FROM nginx:alpine AS final
COPY --from=build /app/dist /usr/share/nginx/html

Stages can be named (AS deps, AS build) and referenced by name in later COPY --from= instructions. This separates concerns (installing dependencies, building, and the final runtime image) even when the language or runtime doesn't produce a single standalone binary the way Go does. Note that the final stage uses a completely different base image (nginx:alpine) from the build stages (node:20): it only needs to serve the already-built static files, so no Node.js runtime is required.
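Naming stages is optional. An unnamed stage can be referenced by its position instead, counting from 0. The following is a minimal sketch of that form, assuming the same project layout as the example above (npm run build writing its output to /app/dist):

```dockerfile
# Stage 0: unnamed build stage
FROM node:20
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build

# Final stage: --from=0 refers to the first stage by index
FROM nginx:alpine
COPY --from=0 /app/dist /usr/share/nginx/html
```

Names are usually the safer choice, since an index silently points at a different stage if stages are later added or reordered.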

Why this matters beyond just image size

Reduced attack surface: a smaller final image with no compilers, build tools, or source code gives an attacker who compromises the container fewer things to exploit or exfiltrate (see the security topic).
Faster pulls and deployments: a smaller image transfers faster to every node that needs to run it, which speeds up deployments and autoscaling events, especially across many nodes.
A single Dockerfile, still: before multi-stage builds existed, the same "build in one environment, run in a minimal one" pattern needed a workaround. One option was two separate Dockerfiles, with artifacts copied between them manually via a shared volume or a script. Another was building outside Docker entirely and then COPYing the pre-built artifact in. Both are more awkward and error-prone than expressing the whole pipeline declaratively in one file.
