How do Docker image layers work, and how does layer caching affect build speed?
Quick Answer
Each Dockerfile instruction that changes the filesystem (RUN, COPY, ADD) produces a new, content-hashed layer. On a subsequent build, Docker checks whether an instruction's inputs match a previously built layer: the instruction itself plus, for COPY/ADD, the actual contents of the copied files. If they match, Docker reuses the cached layer instead of re-executing the instruction. This can make rebuilds dramatically faster, but the cache is invalidated from the first changed instruction onward: every layer after that point must be rebuilt, even if its own inputs are unchanged. That is why instruction order matters so much.
Detailed Answer
How the cache decides whether to reuse a layer
# Layer A
FROM node:20-slim
# Layer B
WORKDIR /app
# Layer C: cache key includes package.json's actual content
COPY package.json ./
# Layer D: cache key includes the PRECEDING layer + this instruction's text
RUN npm install
# Layer E: cache key includes the content of every copied file
COPY . .
For each instruction, Docker computes a cache key based on the preceding layer plus that instruction's own inputs. For RUN, that's the literal command text. For COPY/ADD, it's the actual file contents being copied, not just their names. So even a single-character change inside package.json invalidates Layer C and, since caching is sequential, everything after it too.
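The key composition described above can be sketched with a toy model. This is only an illustration, not Docker's actual algorithm: the function name `layer_key`, the use of SHA-256, and the example file contents are all assumptions made for the sketch.

```python
import hashlib

def layer_key(parent_key: str, instruction: str, file_contents: bytes = b"") -> str:
    """Toy cache key: hash of the parent key, the instruction text, and any copied bytes."""
    h = hashlib.sha256()
    h.update(parent_key.encode())
    h.update(b"\0")
    h.update(instruction.encode())
    h.update(b"\0")
    h.update(file_contents)
    return h.hexdigest()

base = layer_key("", "FROM node:20-slim")

# COPY: identical instruction text, different file contents -> different keys
copy_v1 = layer_key(base, "COPY package.json ./", b'{"version": "1.0.0"}')
copy_v2 = layer_key(base, "COPY package.json ./", b'{"version": "1.0.1"}')

# RUN: identical command text, but the key still changes when the parent changes
run_v1 = layer_key(copy_v1, "RUN npm install")
run_v2 = layer_key(copy_v2, "RUN npm install")

# Same parent and same text reproduce the same key, which is what a cache hit means here
run_v1_again = layer_key(copy_v1, "RUN npm install")

print(copy_v1 != copy_v2, run_v1 != run_v2, run_v1 == run_v1_again)
```

Because each key folds in its parent's key, a change anywhere in the chain changes every key after it.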
Why this makes rebuilds fast — when structured well
# First build: everything builds from scratch
docker build -t myapp .
# ... (30 seconds, say, mostly spent on `npm install`)
# Change only application code (not package.json), rebuild:
docker build -t myapp .
# Layer A, B, C, D all CACHE HIT (package.json unchanged, so npm install's inputs are identical)
# Only Layer E (COPY . .) and anything after it actually re-executes
# ... (2 seconds)
Because npm install (often one of the slowest steps) sits before the COPY . . that brings in frequently changing application code, changing application code alone doesn't invalidate the expensive dependency-installation layer at all. This is one of the most impactful Dockerfile optimization techniques, covered in more depth in the cache-ordering question.
Why cache invalidation cascades forward, never backward
FROM node:20-slim
# Layer C
COPY package.json ./
# Layer D
RUN npm install
# Layer E: if only the files copied here change, only E (and anything after) rebuilds;
# C and D are unaffected, since their own inputs didn't change
COPY . .
If instead package.json changes, Layer C is invalidated, and every layer from C onward (D, E) must rebuild too. This happens even though Layer D's instruction text (npm install) is unchanged and the application code copied in Layer E may not have changed at all. This "cascades forward from the first change" rule is why placing rarely changing, expensive instructions (dependency installation) before frequently changing ones (application code) is so consistently valuable. It maximizes how often the expensive early layers get to reuse the cache.
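The forward cascade can be simulated with a small toy build loop. This is a sketch under simplifying assumptions, not Docker's implementation: the `build` function, the in-memory `cache` set, and the file names and contents are all hypothetical.

```python
import hashlib

def build(steps, files, cache):
    """Toy build: report 'HIT' or 'MISS' per step and record each key in the cache.

    steps: list of (instruction_text, copied_paths)
    files: dict mapping path -> bytes
    cache: set of previously seen keys, updated in place
    """
    results = []
    parent = ""
    for text, paths in steps:
        h = hashlib.sha256((parent + "\0" + text).encode())
        for path in sorted(paths):
            h.update(path.encode() + b"\0" + files[path])
        key = h.hexdigest()
        results.append("HIT" if key in cache else "MISS")
        cache.add(key)
        parent = key  # the next step's key depends on this one
    return results

steps = [
    ("FROM node:20-slim", []),
    ("COPY package.json ./", ["package.json"]),
    ("RUN npm install", []),
    ("COPY . .", ["package.json", "app.js"]),
]
files = {"package.json": b'{"deps": 1}', "app.js": b"console.log(1)"}
cache = set()

first = build(steps, files, cache)

# Change only application code: the dependency layers stay cached
files_app = {**files, "app.js": b"console.log(2)"}
app_change = build(steps, files_app, cache)

# Change package.json: everything from that COPY onward misses
files_pkg = {**files_app, "package.json": b'{"deps": 2}'}
pkg_change = build(steps, files_pkg, cache)

print(first)       # ['MISS', 'MISS', 'MISS', 'MISS']
print(app_change)  # ['HIT', 'HIT', 'HIT', 'MISS']
print(pkg_change)  # ['HIT', 'MISS', 'MISS', 'MISS']
```

In the last run, the RUN npm install step misses even though its text is identical, purely because its parent key changed.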
Sharing cache and layers across images, not just across builds of the same image
Layers are content-addressed and stored once on a given host. So two entirely different images that share an identical layer (e.g., both start FROM node:20-slim and do not differ up to some point) share that stored layer on disk as literally the same data, not just conceptually. This saves disk space, and it saves pull time when a machine that already has one image pulls another with the same base layers.
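The idea of content-addressed storage can be illustrated with a toy store keyed by digest. Docker's real layer storage is more involved; the `LayerStore` class and the byte strings standing in for layer contents are assumptions made purely for illustration.

```python
import hashlib

class LayerStore:
    """Toy content-addressed store: each distinct blob is kept exactly once."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Storing the same bytes again is a no-op: the digest already exists
        self.blobs.setdefault(digest, data)
        return digest

store = LayerStore()
base_layer = b"stand-in for the node:20-slim filesystem"

# Two different images, each described as a list of layer digests
image_a = [store.put(base_layer), store.put(b"service A code")]
image_b = [store.put(base_layer), store.put(b"service B code")]

print(image_a[0] == image_b[0])  # True: both images reference the same base blob
print(len(store.blobs))          # 3: one shared base plus two distinct top layers
```

Four layers were added across the two images, but only three blobs are stored, because the shared base is addressed by its content rather than by which image it belongs to.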
Cache-busting techniques when you deliberately want to skip the cache
docker build --no-cache -t myapp . # ignore the cache entirely for this build
This is occasionally necessary when a RUN instruction's effects depend on something outside its literal text or copied files. For example, in RUN apt-get update && apt-get install -y curl, the actual packages fetched can change over time even though the instruction's text never does. This is a common, subtle source of "why did my rebuild not pick up the latest security patches" confusion. The cache has no way to know that an identical-looking instruction might now behave differently against a changed remote package repository.
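A toy model makes the staleness problem concrete. Here the cache is keyed only by the command text, while the result depends on a mutable "remote repository". The `run_step` function, the `repo` dict, and the version numbers are all hypothetical and only mimic the behavior described above.

```python
def run_step(text, remote_repo, cache, no_cache=False):
    """Toy RUN step: the result depends on remote_repo, but the cache key is only the text."""
    if not no_cache and text in cache:
        return cache[text]  # cache hit: the remote repository is never consulted
    result = f"curl {remote_repo['curl']}"
    cache[text] = result
    return result

cache = {}
cmd = "apt-get update && apt-get install -y curl"
repo = {"curl": "8.0.0"}  # hypothetical version

first = run_step(cmd, repo, cache)

# The remote repository publishes a patched version (hypothetical)
repo["curl"] = "8.0.1"

cached = run_step(cmd, repo, cache)                # same text, so the old result is reused
fresh = run_step(cmd, repo, cache, no_cache=True)  # analogous to building with --no-cache

print(first, "|", cached, "|", fresh)  # curl 8.0.0 | curl 8.0.0 | curl 8.0.1
```

The second call returns the outdated result because nothing in its key reflects the changed repository; only bypassing the cache picks up the new version.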