Image Distribution and Registries

What a registry actually does

A registry is a storage and distribution service specifically for container images — organized by repository (a named collection of related images, usually one application) and tag (a specific version within that repository — see the tags-vs-digests question). Pushing uploads a locally-built image to the registry. Pulling downloads an image from the registry to a local machine — this also happens automatically whenever a container is started from an image that isn't already present locally (see the fundamentals topic's hello-world walkthrough).

docker pull nginx:1.25
docker tag myapp:1.0 myusername/myapp:1.0
docker push myusername/myapp:1.0
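Under the hood, push and pull are HTTP operations defined by the OCI Distribution Specification. A client first transfers a manifest, addressed by tag or digest, and then each blob (layers and config) that the manifest references, addressed by digest. The sketch below only builds those URLs. It assumes a hypothetical registry host and skips authentication and the multi-step blob upload protocol entirely. A pull issues GET requests to these paths. A push uploads the blobs first and then PUTs the manifest to the manifest path.

```python
# A minimal sketch of the URL layout from the OCI Distribution Specification.
# The registry host used below is hypothetical; auth and uploads are omitted.


def manifest_url(registry: str, repository: str, reference: str) -> str:
    """Manifest URL; `reference` may be a tag ("1.0") or a digest ("sha256:...")."""
    return f"https://{registry}/v2/{repository}/manifests/{reference}"


def blob_url(registry: str, repository: str, digest: str) -> str:
    """Blob URL for a layer or config, which is always addressed by digest."""
    return f"https://{registry}/v2/{repository}/blobs/{digest}"


if __name__ == "__main__":
    # A pull first fetches the manifest the tag currently points to...
    print(manifest_url("myregistry.example.com", "myteam/myapp", "1.0"))
    # ...then each blob the manifest lists (placeholder digest shown here).
    print(blob_url("myregistry.example.com", "myteam/myapp", "sha256:" + "0" * 64))
```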

Docker Hub — the default public registry

Unless configured otherwise, docker pull/docker push (and an unqualified FROM in a Dockerfile) default to Docker Hub. It hosts several categories of images:

  • Official images (node, postgres, nginx, python, and similar) — curated, actively maintained images for popular software, vetted by Docker and typically the recommended starting point as a base image.
  • Verified Publisher images — published directly by the software vendor themselves (e.g., a database company publishing their own official image), carrying an extra trust signal beyond a purely community-contributed image.
  • Community/user images — published by anyone with a Docker Hub account, with no particular vetting — worth treating with the same caution you'd apply to installing an unfamiliar package from a public code registry with no reputation signal.

Why unqualified image names default to Docker Hub

# implicitly docker.io/library/node:20 (Docker Hub, official image)
FROM node:20
# implicitly docker.io/myorg/myapp:1.0 (Docker Hub, user/org namespace)
FROM myorg/myapp:1.0

An image reference with no registry hostname prefix is resolved against Docker Hub. This is a built-in default, not a restriction on which registries you can use. Referencing a different registry just requires including its hostname in the image reference (see the private-registry question).

Rate limiting — a real, practical Docker Hub consideration

Docker Hub imposes pull rate limits on anonymous and free-tier authenticated accounts. CI pipelines, and clusters where many nodes pull images in a short window, can hit these limits and see seemingly mysterious pull failures (in Kubernetes, ImagePullBackOff; see the Kubernetes stack's equivalent question). Anonymous limits are counted per source IP address, so many nodes behind one NAT gateway share a single allowance. Authenticating with a Docker Hub account, even a free one, raises the limits substantially over anonymous pulls and is a common, simple mitigation for pipelines hitting them.

Why organizations often move to a private or alternative registry

Public registries are the right default for genuinely public, open-source images. But proprietary application images generally shouldn't be pushed to a public registry at all. This is both for confidentiality — source or build details potentially inferable from image layers — and to avoid depending on a third-party public service's availability or rate limits for critical internal deployments. This is exactly the motivation covered in the private-registry question.

The full sequence

docker build -t myapp:1.0 .

docker tag myapp:1.0 myregistry.example.com/myteam/myapp:1.0

docker login myregistry.example.com
# Username: ...
# Password: ...

docker push myregistry.example.com/myteam/myapp:1.0

How Docker knows which registry to talk to

The hostname prefix of the image reference is what determines the target registry — no special flag is needed on docker push itself, since the destination is fully encoded in the tag you give it:

myregistry.example.com/myteam/myapp:1.0
└─────────┬──────────┘ └─┬──┘ └─┬─┘ └┬┘
    registry host    namespace repo tag

nginx:1.25  expands to  docker.io/library/nginx:1.25    (Docker Hub: same shape, with host and namespace defaulted)

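Docker treats the first segment of a reference as a registry hostname only when the reference has more than one slash-separated segment and that first segment looks like a hostname: it contains a . or a :port, or is exactly localhost. Otherwise Docker assumes Docker Hub. So myteam/myapp resolves to Docker Hub, while myregistry.example.com/myteam/myapp and localhost:5000/myapp name other registries, typically private ones.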
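The rule above can be sketched as a small normalization function. This is a simplified model written for illustration, not Docker's actual parser, which also validates allowed characters and lengths. The name normalize is hypothetical. The function applies only the defaulting rules: the hostname check, the library/ namespace Docker Hub uses for official images, and the implicit :latest tag Docker adds when a reference has neither a tag nor a digest.

```python
# Simplified sketch of how a short image reference is expanded.
# Only the defaulting rules are modeled; character validation is ignored.

DEFAULT_REGISTRY = "docker.io"


def normalize(ref: str) -> str:
    """Expand e.g. 'nginx:1.25' to 'docker.io/library/nginx:1.25'."""
    # Split off a digest first, since "sha256:..." itself contains a ':'.
    name, sep, digest = ref.partition("@")

    # The first segment is a registry host only if more segments follow and it
    # contains '.' or ':' (a port), or is exactly 'localhost'.
    first, slash, rest = name.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, path = first, rest
    else:
        registry, path = DEFAULT_REGISTRY, name
        # Single-segment Docker Hub names live in the 'library' namespace.
        if "/" not in path:
            path = "library/" + path

    # A ':' in the last path segment is a tag; with no tag or digest, use latest.
    last_segment = path.rsplit("/", 1)[-1]
    if ":" not in last_segment and not sep:
        path += ":latest"

    return f"{registry}/{path}{sep}{digest}"


if __name__ == "__main__":
    for ref in ["nginx:1.25", "myorg/myapp:1.0", "localhost:5000/myapp"]:
        print(ref, "->", normalize(ref))
```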

Authenticating with docker login

docker login myregistry.example.com

Credentials are typically stored in ~/.docker/config.json. Without a credential helper (see the security topic), they are stored only base64-encoded, which is effectively plaintext; Docker Desktop configures a helper backed by the operating system's credential store by default. Either way, the login persists on that machine until docker logout, so subsequent push and pull operations against the same registry don't require re-authenticating.
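To see why base64 encoding offers no protection, here is a minimal sketch that reads a config.json entry back. The file contents and credentials are made up, and stored_credentials is a hypothetical helper. The layout shown is the one used when no credential helper is configured: an auths map keyed by registry host, whose auth field holds base64 of username:password.

```python
import base64
import json

# Hypothetical config.json written by `docker login` with no credential helper.
# The credentials are invented for this example.
CONFIG_JSON = """
{
  "auths": {
    "myregistry.example.com": {
      "auth": "YWxpY2U6czNjcjN0"
    }
  }
}
"""


def stored_credentials(config_text: str, registry: str) -> tuple[str, str]:
    """Recover (username, password) for `registry` from config.json text.

    The "auth" value is base64("username:password"): an encoding, not
    encryption, so anyone who can read the file can reverse it.
    """
    entry = json.loads(config_text)["auths"][registry]
    decoded = base64.b64decode(entry["auth"]).decode("utf-8")
    # Split on the first ':' only, since a password may itself contain ':'.
    username, _, password = decoded.partition(":")
    return username, password


if __name__ == "__main__":
    print(stored_credentials(CONFIG_JSON, "myregistry.example.com"))  # ('alice', 's3cr3t')
```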

Common private registry options

  • Cloud-provider managed registries — AWS ECR, Google Artifact Registry (the successor to Google Container Registry, GCR), and Azure Container Registry. Each is tightly integrated with its cloud's own IAM system for authentication, and often the natural choice when already deployed on that cloud.
  • Self-hosted registry software — the open-source Docker Registry (registry:2 image, ironically itself distributed via Docker Hub), or more full-featured options like Harbor (adding vulnerability scanning, RBAC, and replication on top of the base registry functionality).
  • GitHub Container Registry (GHCR), GitLab Container Registry — convenient when your source code and CI/CD already live on that platform, since authentication/permissions can piggyback on the same platform identity.

Why organizations use private registries at all

  • Confidentiality — proprietary application images shouldn't be pullable by anyone on the internet, as any image in a public Docker Hub repository is.
  • Access control — a private registry can be scoped with fine-grained permissions (which teams/services can push or pull which images), mirroring the same least-privilege principles covered in the security topic and the Kubernetes stack's RBAC question.
  • Reliability and control — not depending on a third-party public service's availability or rate limits for critical internal image pulls, especially at production scale (see the Docker Hub question's rate-limiting note).
  • Compliance and scanning integration — many private registries integrate directly with vulnerability scanning (see that question) and image signing (see that question), gating what's allowed to be pushed or pulled based on organizational security policy.

Development and testing can reasonably pull common base images straight from Docker Hub. An organization's own proprietary application images belong in a private registry, scoped to only the teams and systems that genuinely need access.

Why even a specific-looking version tag isn't a true guarantee

FROM node:20.11.0-slim

This looks precisely pinned: a specific patch version, not a broad 20 or latest. But it's still just a tag. Nothing in the registry's technical model prevents the maintainers of the node image from re-pushing different content under that exact same tag later, and this does legitimately happen. Official images, for instance, are routinely rebuilt and re-pushed under their existing tags when the underlying OS base image receives security updates. A tag, no matter how specific it looks, is a mutable pointer, not a guarantee of identical content.
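The difference between a tag and a digest can be shown with a toy, in-memory model. This is purely illustrative: push and pull here are hypothetical functions, not a registry API. Content is stored under its own SHA-256 hash, while a tag is just an entry in a mutable map that a second push silently repoints.

```python
import hashlib

# Toy registry state: content keyed by its own hash, tags as a mutable map.
blobs: dict[str, bytes] = {}
tags: dict[str, str] = {}


def push(tag: str, content: bytes) -> str:
    """Store content under its digest and (re)point `tag` at it."""
    digest = "sha256:" + hashlib.sha256(content).hexdigest()
    blobs[digest] = content
    tags[tag] = digest  # silently overwrites whatever the tag pointed to before
    return digest


def pull(reference: str) -> bytes:
    """Resolve a digest directly, or a tag via the mutable tag map."""
    digest = reference if reference.startswith("sha256:") else tags[reference]
    return blobs[digest]


first = push("node:20.11.0-slim", b"original build")
second = push("node:20.11.0-slim", b"rebuilt with patched OS packages")

print(pull("node:20.11.0-slim"))  # b'rebuilt with patched OS packages'
print(pull(first))                # b'original build'
```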

Digest pinning — the actual guarantee

FROM node@sha256:a1b2c3d4e5f6789...

This reference can never silently change. The digest is the SHA-256 hash of the image's manifest, and the manifest in turn lists the image's config and every layer by their own SHA-256 digests, so changing any byte of any layer produces a completely different digest. Because the client verifies downloaded content against the digests it requested, this reference either resolves to exactly the same bytes every time or fails outright if that content is no longer available. It never silently substitutes something different.
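The "same bytes or fail" behavior comes from the client hashing what it downloads and comparing the result against the digest it asked for. Real clients repeat the same check for each layer blob listed in the manifest. A minimal sketch of that check follows, assuming the manifest has already been fetched as raw bytes. verify is a hypothetical helper, and the manifest bytes are a placeholder, not a complete manifest.

```python
import hashlib


def verify(expected_digest: str, content: bytes) -> bytes:
    """Return `content` only if it hashes to `expected_digest`; otherwise raise."""
    algorithm, _, expected_hex = expected_digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    actual_hex = hashlib.sha256(content).hexdigest()
    if actual_hex != expected_hex:
        # Fail outright rather than silently accepting different content.
        raise ValueError(f"digest mismatch: expected {expected_hex}, got {actual_hex}")
    return content


if __name__ == "__main__":
    manifest = b'{"schemaVersion": 2}'  # placeholder bytes, not a full manifest
    pinned = "sha256:" + hashlib.sha256(manifest).hexdigest()
    print(verify(pinned, manifest) == manifest)  # True
    try:
        verify(pinned, manifest + b" ")  # a single extra byte changes the hash
    except ValueError as err:
        print("rejected:", err)
```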

Where this matters most: supply-chain security and reproducible builds

# CI/CD pipeline, or a Kubernetes manifest, deploying a base or dependency image
image: node@sha256:a1b2c3d4e5f6789...

For anything where you need certainty about exactly what's being built or deployed, digest pinning is the only mechanism that provides a real guarantee. That includes a security audit that must confirm precisely what code is running, a compliance requirement for reproducible builds, or simply eliminating a whole class of "it worked yesterday, broke today with no code changes on our end" incidents caused by an upstream base image changing under an unchanged tag.

The common middle-ground practice: automated digest resolution

# A CI pipeline step that resolves a tag to its current digest at build time,
# then uses that resolved digest for the actual deployment reference --
# giving humans the readability of a tag during development, while the
# ACTUAL deployed/built reference is digest-pinned underneath
docker pull node:20.11.0-slim
# prints a digest reference of the form node@sha256:...
docker inspect --format='{{index .RepoDigests 0}}' node:20.11.0-slim

Many teams don't hand-write digest references directly in source-controlled Dockerfiles, since that would be unreadable and hard to update deliberately. Instead, CI/CD automation resolves and locks in the actual digest at build time, often recording it in a lockfile-like artifact. This gives both human readability during everyday development and true reproducibility for what's actually built and shipped.
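One way to implement the lock-in step is a small script that takes already-resolved digests, for example collected with the docker inspect command shown earlier, and rewrites the FROM lines of a Dockerfile used for the release build. A minimal sketch follows. The RESOLVED map, its placeholder digest, and pin_dockerfile are all hypothetical. The sketch keeps the readable tag and appends the digest (name:tag@sha256:...), a form Docker accepts, and in it the digest is what actually gets resolved.

```python
import re

# Hypothetical output of an earlier resolution step: tag reference -> digest
# currently published for it. The digest is a placeholder, not node's real one.
RESOLVED = {
    "node:20.11.0-slim": "sha256:" + "ab" * 32,
}

# Matches "FROM <image> [rest]". Lines with flags such as --platform are left
# untouched by this simplified sketch, because the flag is captured as <image>
# and will not be found in RESOLVED.
FROM_LINE = re.compile(r"^(FROM\s+)(\S+)(.*)$", re.IGNORECASE)


def pin_dockerfile(text: str, resolved: dict[str, str]) -> str:
    """Rewrite FROM image:tag as FROM image:tag@sha256:... when a digest is known."""
    pinned_lines = []
    for line in text.splitlines():
        match = FROM_LINE.match(line)
        if match and match.group(2) in resolved:
            image = match.group(2)
            line = f"{match.group(1)}{image}@{resolved[image]}{match.group(3)}"
        pinned_lines.append(line)
    return "\n".join(pinned_lines)


if __name__ == "__main__":
    dockerfile = "FROM node:20.11.0-slim AS build\nRUN npm ci\n"
    print(pin_dockerfile(dockerfile, RESOLVED))
```

Keeping the tag next to the digest preserves the human-readable hint about which version was pinned, while the digest still provides the guarantee.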

The tradeoff: you lose automatic security patches

# Gets security patches automatically on rebuild, but is a moving target:
FROM node:20-slim

# Never changes, but you must manually update this reference
# to actually receive newer base-image security patches:
FROM node@sha256:abc123...

Digest pinning involves a real tradeoff: you gain perfect reproducibility, but you also lose the often-desirable behavior of automatically picking up upstream security patches published under a broader, still-actively-maintained tag. This is why many teams pin digests specifically for final, deployed production artifacts, while still tracking broader version tags for base images used during ongoing development — with active dependency-update tooling, like Dependabot or Renovate, opening pull requests when new patch versions or digests are available.

Why architecture matters for container images

A container image typically contains compiled, architecture-specific binaries. Even an image for an interpreted language such as Python or Node.js ships the interpreter itself, plus any native extensions, as machine code. A binary compiled for amd64 (traditional Intel/AMD 64-bit) won't run on an arm64 machine (Apple Silicon Macs, AWS Graviton instances, many Raspberry Pi-class devices), and vice versa. Without multi-architecture support, an organization deploying to both x86-64 servers and ARM-based infrastructure would need to build, tag, and manage entirely separate images for each architecture, and manually track which one to deploy where.

Building a multi-architecture image with buildx

docker buildx create --use --name multiarch-builder

docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t myregistry.example.com/myapp:1.0 \
  --push \
  .

This single command builds the image for both specified architectures and pushes a single manifest list (sometimes called a "fat manifest"; the OCI specification calls it an image index) to the registry. The result is one image reference (myapp:1.0) that points at multiple architecture-specific image variants underneath.

What happens when someone pulls this image

docker pull myregistry.example.com/myapp:1.0

Docker automatically inspects the manifest list, detects the pulling machine's own architecture, and fetches only the matching variant. An arm64 Mac and an amd64 cloud server both run docker pull myapp:1.0 identically, and each transparently receives the correct binary for its own architecture. No explicit architecture-selection is needed by the person or system doing the pulling.
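The selection step can be sketched as a lookup over the index's manifests entries. This is a simplified model built on the OCI image index layout, with made-up digests. select_manifest and ARCH_ALIASES are hypothetical names. Real clients also match CPU variants such as arm/v7, which this sketch ignores. The OS is fixed to linux because these are Linux container images, which Docker Desktop on a Mac also runs inside a Linux VM.

```python
import platform

# A trimmed, made-up image index: each entry names an architecture-specific
# manifest by digest and the platform it targets.
INDEX = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"digest": "sha256:" + "1" * 64, "platform": {"os": "linux", "architecture": "amd64"}},
        {"digest": "sha256:" + "2" * 64, "platform": {"os": "linux", "architecture": "arm64"}},
    ],
}

# Map common machine names (as reported by uname -m or platform.machine())
# to the architecture names used in image indexes.
ARCH_ALIASES = {"x86_64": "amd64", "amd64": "amd64", "aarch64": "arm64", "arm64": "arm64"}


def select_manifest(index: dict, os_name: str, arch: str) -> str:
    """Return the digest of the entry matching the requested platform."""
    for entry in index["manifests"]:
        p = entry.get("platform", {})
        if p.get("os") == os_name and p.get("architecture") == arch:
            return entry["digest"]
    raise LookupError(f"no manifest for {os_name}/{arch}")


if __name__ == "__main__":
    machine = platform.machine().lower()
    arch = ARCH_ALIASES.get(machine, machine)
    try:
        print(select_manifest(INDEX, "linux", arch))
    except LookupError as err:
        print(err)
```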

How buildx actually builds for an architecture different from the build machine's own

Building an arm64 image on an amd64 build machine (or vice versa) requires either cross-compilation, where the build tooling can target another architecture directly, or QEMU-based emulation, which runs the build steps for the "foreign" architecture inside an emulated environment. Docker Desktop ships with QEMU emulation preconfigured; on a Linux host it is commonly installed by registering binfmt_misc handlers, for example with docker run --privileged --rm tonistiigi/binfmt --install all.

Emulation is meaningfully slower than a native build, since it has real overhead. This is why performance-sensitive multi-arch CI pipelines sometimes use separate native build machines instead: a real arm64 runner and a real amd64 runner, each building its own native variant. The results are then combined into one manifest list afterward (for example with docker buildx imagetools create), rather than relying purely on QEMU emulation for the non-native architecture.

Why this has become increasingly important

Apple Silicon Macs (ARM-based) have become common among developers. ARM-based cloud instances, like AWS Graviton, are often notably cheaper and more power-efficient than equivalent x86-64 instances, and have become common in production too. Together, these mean an organization can no longer safely assume "everyone builds and runs on amd64." Multi-architecture image support has gone from a niche concern to a common, practical requirement for many teams.

buildx's broader role beyond multi-arch

buildx is Docker's interface to BuildKit (see the performance topic's question), Docker's modern build engine. Multi-architecture building is one of its most visible capabilities, but BuildKit/buildx also provides improved build caching, build secrets (see the ARG/ENV question's note on secure secret handling during builds), and other capabilities the older, legacy build engine lacked. Multi-arch builds aren't strictly necessary for a single-architecture deployment target, but they're common enough now that many teams build multi-arch by default anyway, simply to avoid revisiting the question later.

How scanning actually works

docker scout cves myapp:1.0
# or
trivy image myapp:1.0

A scanner inspects an image's layers to build an inventory of every installed package and its exact version. This includes OS-level packages via the package manager's metadata, plus language-level dependencies — npm packages, Python packages, Go modules, and others, depending on the scanner's capability. It then cross-references this inventory against vulnerability databases, such as the National Vulnerability Database and vendor-specific advisories, to report which specific packages have known, publicly disclosed vulnerabilities (CVEs, or Common Vulnerabilities and Exposures), typically with a severity rating (critical, high, medium, or low) for each.

myapp:1.0
Total: 12 vulnerabilities found

CRITICAL: 1
  - CVE-2023-XXXXX in openssl 1.1.1k (fixed in 1.1.1t)
HIGH: 3
  - CVE-2022-YYYYY in libcurl 7.68.0 (fixed in 7.74.0)
  ...
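The matching step itself can be sketched as comparing each installed version against the first fixed version named in an advisory. Everything here is hypothetical. The package names, versions, and advisory IDs are invented, and the dotted-integer version parse stands in for the distro-specific comparison rules (dpkg, rpm, apk) that real scanners use. Real advisories also describe affected version ranges, not only a fixed version.

```python
from typing import NamedTuple


class Advisory(NamedTuple):
    id: str
    package: str
    fixed_in: str  # first version that contains the fix
    severity: str


def parse_version(v: str) -> tuple[int, ...]:
    """Toy parser: '2.4.10' -> (2, 4, 10). Real scanners are far more careful."""
    return tuple(int(part) for part in v.split("."))


def match(inventory: dict[str, str], advisories: list[Advisory]) -> list[Advisory]:
    """Return advisories whose package is installed at a version below the fix."""
    findings = []
    for adv in advisories:
        installed = inventory.get(adv.package)
        if installed is not None and parse_version(installed) < parse_version(adv.fixed_in):
            findings.append(adv)
    return findings


# Hypothetical image inventory and advisory data.
INVENTORY = {"libexample": "2.4.1", "libsample": "1.3.0"}
ADVISORIES = [
    Advisory("EXAMPLE-0001", "libexample", "2.4.3", "HIGH"),
    Advisory("EXAMPLE-0002", "libsample", "1.2.9", "MEDIUM"),  # already fixed in 1.3.0
]

print([a.id for a in match(INVENTORY, ADVISORIES)])  # ['EXAMPLE-0001']
```

The inventory is fixed once an image is built, while the advisory list keeps growing, so calling match again later with the same inventory can return findings that did not exist at build time.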

Why images need to be scanned repeatedly, not just once

A vulnerability report for a given image can genuinely change without the image itself ever changing at all. A new CVE disclosed today might affect a package version that's been sitting unchanged inside an already-built, already-deployed image for months. This is why scanning shouldn't be treated as a one-time gate at build time alone. Periodically re-scanning already-deployed images against the continuously updated vulnerability database is essential to catch newly disclosed issues in software you already shipped and forgot about.

Integrating scanning into CI/CD

# Simplified CI pipeline step concept
- name: Scan image for vulnerabilities
  run: trivy image --exit-code 1 --severity CRITICAL,HIGH myapp:${{ github.sha }}

A common practice is failing the CI build (--exit-code 1) if the scan finds vulnerabilities at or above a chosen severity threshold. When the scan runs before the image is pushed, this keeps a genuinely dangerous image from ever reaching a registry or a production deployment, moving the security check as early in the pipeline as practical ("shifting left").
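The gating decision itself is simple. A minimal sketch follows, assuming the findings' severities have already been parsed out of a scanner report. gate_exit_code and its inputs are hypothetical. Trivy itself expresses the threshold as an explicit list of severities via --severity rather than as a minimum level.

```python
# Severity levels in increasing order, matching the scale described above.
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def gate_exit_code(finding_severities: list[str], threshold: str = "HIGH") -> int:
    """Return 1 if any finding is at or above `threshold`, else 0."""
    limit = SEVERITY_ORDER.index(threshold)
    blocking = [s for s in finding_severities if SEVERITY_ORDER.index(s) >= limit]
    return 1 if blocking else 0


if __name__ == "__main__":
    # Hypothetical severities from a report; a CI step would exit with this code.
    print(gate_exit_code(["MEDIUM", "HIGH", "LOW"]))  # 1
```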

What to actually do about a finding

Most findings resolve in one of a few ways:

  • Update the base image. A newer base image tag frequently already includes patched versions of many underlying packages (see the base-image question), so this is often the single highest-leverage fix.
  • Update the specific affected dependency directly, if your own application's dependency manifest pins it to an outdated version.
  • Accept the risk deliberately, for a small number of unavoidable findings with no available fix yet, when the vulnerable code path genuinely isn't reachable or exploitable in your specific usage. Record it as a tracked exception rather than silently ignoring it.

Reducing the attack surface in the first place

Scanning is a detection mechanism, not a prevention one. Pairing it with the practices covered elsewhere in this stack meaningfully reduces how much there is to find in the first place. Smaller base images (Alpine, distroless — see that question) simply contain fewer packages overall, and multi-stage builds (see that question) exclude build-time-only tooling from the final image entirely. Both directly shrink the surface a scanner has to report on.

Registry-integrated scanning

Many registries (Docker Hub with Docker Scout, AWS ECR, Google Artifact Registry, Harbor) offer built-in scanning that can run automatically on every push, surfacing results directly in the registry's UI or API. This is convenient for centralizing scan results without a separate CI step. Standalone tools like Trivy remain popular because they run identically in any CI system or locally on a developer's machine, regardless of which registry is ultimately used.
