How do you order Dockerfile instructions to maximize cache reuse?
Quick Answer
Place instructions that change rarely (installing system packages, installing dependencies from a lockfile) before instructions that change frequently (copying in your actual application source code), so that a typical day-to-day code change only invalidates the cheap, fast final layers — not the expensive dependency-installation step. The general principle: order by "least likely to change" first, "most likely to change" last.
Detailed Answer
The anti-pattern: copying everything before installing dependencies
# BAD ORDERING
FROM node:20-slim
WORKDIR /app
# Copies EVERYTHING, including source code that changes constantly
COPY . .
# This layer's cache now depends on the ENTIRE copied tree
RUN npm install
CMD ["node", "server.js"]
With this ordering, changing any copied file in the project invalidates the COPY . . layer, even if the change is a single comment in a source file that has nothing to do with dependencies. Because each layer's cache validity depends on the layer before it, invalidating COPY . . also invalidates the npm install layer that follows. Every build after a code change then re-runs the full dependency installation from scratch, even though the dependency manifest (package.json) hasn't changed at all. The result is a slow, entirely avoidable rebuild on every code change.
The fix: copy the dependency manifest first, install, then copy the rest
# GOOD ORDERING
FROM node:20-slim
WORKDIR /app
# Only the dependency manifest and lockfile, which change rarely
COPY package.json package-lock.json ./
# Cached as long as the manifest and lockfile haven't changed
RUN npm ci
# Application code changes constantly, but this is now the LAST filesystem-changing step
COPY . .
CMD ["node", "server.js"]
Now, changing application code only invalidates the final COPY . . layer. The npm ci layer, which is often much slower, stays cached as long as package.json and package-lock.json are unchanged, which is the case for most day-to-day commits.
The general principle, stated once
Order instructions from least-likely-to-change to most-likely-to-change. System package installation and dependency installation (driven by a lockfile that changes relatively rarely) belong early; application source code (which changes on nearly every commit) belongs as late as possible.
FROM python:3.12-slim
# Rarely changes
RUN apt-get update && apt-get install -y libpq-dev
# Changes occasionally
COPY requirements.txt .
# Cached unless requirements.txt changes
RUN pip install -r requirements.txt
# Changes on every commit, so it goes last
COPY . .
CMD ["python", "app.py"]
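The same least-to-most-likely-to-change ordering carries over to other ecosystems. As a minimal sketch, assuming a Go module with go.mod and go.sum at the repository root and a main package in that same directory (the image tag, the /src working directory, and the output path /usr/local/bin/app are illustrative choices, not part of the original answer):

```dockerfile
FROM golang:1.22
WORKDIR /src
# Module manifests only: these change when dependencies change
COPY go.mod go.sum ./
# Downloads dependencies; stays cached until go.mod or go.sum changes
RUN go mod download
# Application source: changes on nearly every commit, so it comes last
COPY . .
RUN go build -o /usr/local/bin/app .
CMD ["app"]
```

Only the final COPY and the build step should re-run after a source-only edit; the module download is reused.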
Combining related RUN instructions to control layer granularity
# Creates two separate layers, and (more importantly) leaves package-manager
# cache/lists behind in the FIRST layer even after the second layer "removes" them,
# since removal in a later layer doesn't shrink an earlier, already-committed layer
RUN apt-get update
RUN apt-get install -y curl && rm -rf /var/lib/apt/lists/*
# Better: combine into ONE layer so cleanup actually reduces that layer's size
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
Since each layer is immutable once committed, "deleting" a file in a later layer doesn't reclaim the space that file used in an earlier layer. It just hides that file from the merged view (recall the union filesystem question). Combining install-then-cleanup into a single RUN instruction ensures the cleanup actually shrinks that one resulting layer, rather than leaving bloat in an earlier layer that a later layer merely masks.
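One way to observe this directly is a throwaway image. The sketch below assumes debian:bookworm-slim as a base and uses a 100 MB file of zeros as a stand-in for package-manager leftovers; the path /tmp/big.bin is hypothetical and chosen only for illustration.

```dockerfile
FROM debian:bookworm-slim
# Variant A: the file is committed into one layer, then merely hidden by the next.
# The image still carries roughly 100 MB for the first RUN.
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=100
RUN rm /tmp/big.bin
# Variant B: create and delete inside the same RUN.
# The file never reaches a committed layer, so this step adds almost nothing.
RUN dd if=/dev/zero of=/tmp/big.bin bs=1M count=100 && rm /tmp/big.bin
```

Running docker history on the built image lists the size of each layer, which should make the difference between the two variants visible.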
A quick self-check for any existing Dockerfile: change one line of application code, rebuild, and ask which layers genuinely needed to re-execute. If the rebuild re-runs an expensive dependency-installation step that has nothing to do with that change, the ordering has room to improve. Fixing it is often the difference between a multi-minute rebuild and one that takes a couple of seconds.
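Applied to the Node.js example above, the self-check might look like the following annotated sketch. The comments describe what a rebuild should do after editing only application source; they are expectations to compare against your own build output, not captured logs.

```dockerfile
FROM node:20-slim
WORKDIR /app
# Expect: reused from cache (manifest and lockfile untouched)
COPY package.json package-lock.json ./
# Expect: reused from cache, so the slow install is skipped
RUN npm ci
# Expect: re-executed, because the copied source changed
COPY . .
CMD ["node", "server.js"]
```

With BuildKit, reused steps are marked CACHED in the build output; the classic builder prints "Using cache" instead. If RUN npm ci shows neither marker after a source-only edit, the ordering deserves another look.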