Why should containers avoid running as root, and how do you enforce it?

By default, a container's main process runs as root (UID 0) unless explicitly configured otherwise. If an attacker achieves code execution inside such a container, they hold full root privileges within that container's namespaces. They can damage anything in the container's own filesystem and resources, and any further escalation attempt, such as a container escape, starts from root rather than from an unprivileged account. Enforce non-root execution with the Dockerfile's `USER` instruction (or an equivalent runtime flag such as `docker run --user`), following the same least-privilege principle covered throughout this stack's other security-focused questions.

What are Linux capabilities, and how do you drop unneeded ones from a container?

Linux capabilities break up the traditionally all-or-nothing power of root into dozens of distinct, individually grantable privileges (like binding to a low-numbered port, changing file ownership, or loading kernel modules) — Docker grants containers a modest default subset of these, not full root privilege, even for a container running as root. You can further reduce this with `--cap-drop`, dropping every capability the container doesn't actually need, applying the same least-privilege principle covered elsewhere in this stack directly at the kernel-privilege level.

What is the risk of mounting the Docker socket into a container?

Mounting `/var/run/docker.sock` into a container gives that container's process the ability to talk directly to the host's Docker daemon — and since the daemon can create and run new containers with essentially arbitrary privileges (including full host access via `--privileged` or host-path bind mounts), this is functionally equivalent to giving that container root access to the entire host machine. This is a common but genuinely dangerous pattern (often used to let a CI/monitoring container manage other containers) and should be treated as a significant security risk, not a routine convenience.

How do seccomp and AppArmor/SELinux profiles restrict container behavior?

**seccomp** restricts which specific Linux system calls a container's process is allowed to make at all — Docker applies a default seccomp profile blocking dozens of rarely-needed, higher-risk syscalls out of the box. **AppArmor** (or **SELinux**, depending on the host's Linux distribution) provides a complementary, broader mandatory access control layer, restricting what files, network resources, and capabilities a container's processes can access, based on a named security profile. Both work beneath and alongside namespaces/cgroups, adding another layer of kernel-enforced restriction specifically to narrow what a compromised container process could actually do.

How do you manage secrets securely with Docker?

Never bake secrets into an image via `ARG`, a `COPY`'d file, or a hardcoded `ENV` value — all of these persist in the image's layers/history and are readable by anyone who can access the image. Prefer runtime injection via `-e`/`env_file` sourced from a secrets manager (not committed to version control), Docker Swarm's or Compose's native `secrets` mechanism (which mounts secrets as files, not environment variables, reducing accidental exposure), or BuildKit's dedicated build-time secret mounting for anything a build step genuinely needs only transiently.

What is a read-only root filesystem, and how do you configure one?

A read-only root filesystem prevents a container's main filesystem from being written to at all at runtime: any attempt to write outside of explicitly mounted, writable volumes fails. This meaningfully limits what a compromised process running inside the container can do (it can't modify application binaries, install new tools, or write a persistent backdoor to the container's own filesystem). The cost is that you must explicitly identify and mount writable volumes for any directories the application genuinely needs to write to (temp files, caches, and any logs not sent to stdout).

How do you keep base images and dependencies free of known vulnerabilities over time?

This is fundamentally an ongoing process, not a one-time fix: regularly rebuild images from an updated base image (rather than letting a build sit unrebuilt for months, silently accumulating unpatched vulnerabilities in an unchanged image), use automated dependency-update tooling (Dependabot, Renovate) to surface newer, patched versions of both base images and application-level dependencies, and integrate vulnerability scanning (see that question) into both CI and periodic re-scans of already-deployed images, since new vulnerabilities are disclosed continuously in software that hasn't itself changed at all.

What's the difference between Docker's default security posture and a hardened configuration?

Docker's defaults (a non-full-root capability set, default seccomp/AppArmor profiles) already provide meaningfully more protection than an unconfined process running directly on the host — but they still allow a container to run as root, retain a broader-than-minimal capability set, use a fully writable root filesystem, and have no resource limits, unless explicitly configured otherwise. A hardened configuration deliberately layers on top of these defaults: non-root user, `--cap-drop=ALL` plus a minimal `--cap-add`, a read-only root filesystem, explicit resource limits, and — where the threat model warrants it — stricter custom seccomp/AppArmor profiles and signed/verified images.
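As a minimal sketch of that hardened layering on the command line, the hypothetical shell wrapper below (`hardened_run`) adds each hardening flag to an ordinary `docker run`. The specific values (UID/GID 1000, 512 MiB of memory, one CPU, 100 PIDs, `/tmp` as the only writable scratch space) are assumptions to adjust per workload. It also assumes the application listens on a port at or above 1024, so no capability needs to be added back after `--cap-drop=ALL`. The `DOCKER` override exists only so the command can be printed instead of run.

```shell
# Hypothetical wrapper: applies the hardening flags, then passes through any
# extra docker run options plus the image name (and optional command).
# Set DOCKER=echo to print the resulting command instead of running it.
hardened_run() {
  ${DOCKER:-docker} run \
    --user 1000:1000 \
    --cap-drop=ALL \
    --read-only \
    --tmpfs /tmp \
    --memory 512m \
    --cpus 1 \
    --pids-limit 100 \
    "$@"
}
```

For example, `hardened_run --rm -p 8080:8080 myapp:1.0`. Stricter custom seccomp/AppArmor profiles and signed-image verification are left out here, since whether they're warranted depends on the threat model.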

Security

Hardening containers — user privileges, capabilities, the Docker socket, seccomp/AppArmor, and secrets handling.

The default, risky behavior

FROM node:20-slim
WORKDIR /app
COPY . .
# No USER instruction: the main process runs as ROOT by default
CMD ["node", "server.js"]

Without an explicit USER instruction, most base images default their main process to running as root (UID 0) inside the container. This isn't automatically catastrophic. The container's root is still confined by its namespace, and with default settings it doesn't have direct root access to the host. But it meaningfully raises the stakes of anything going wrong. An attacker who achieves arbitrary code execution inside a root-running container has unrestricted access to everything within that container: every file, every process, the ability to install anything. The same compromise inside a non-root container is confined to whatever that specific, limited user account can actually do.

Enforcing non-root with the USER instruction

FROM node:20-slim
WORKDIR /app
COPY --chown=node:node . .
# Many official images already include a pre-created, unprivileged user
USER node
CMD ["node", "server.js"]

Many official base images (like node) already include a pre-created, unprivileged user for this purpose, often named after the runtime itself (the node image's user is called node). USER node switches to it: from that point in the Dockerfile onward, subsequent RUN instructions and the container's main process, plus anything it forks, run without root privileges. The COPY --chown=node:node line matters too, since COPY otherwise creates the copied files owned by root.

Creating your own non-root user, for images that don't provide one

FROM alpine:3.19
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser

For base images without a suitable pre-existing user, creating a dedicated, minimal, non-privileged user explicitly (with no unnecessary permissions or shell access beyond what's needed) is the standard practice.
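A quick way to review a Dockerfile for this is to find which USER instruction, if any, applies to the final build stage, since that's the stage that becomes the image. The sketch below is a simple awk-based check under stated assumptions: final_stage_user is a hypothetical helper that reads the Dockerfile on stdin, treats each FROM as the start of a new stage, and doesn't handle line continuations or variable substitution. An empty result only means this Dockerfile never sets USER in its final stage; the base image might still set one, so docker inspect on the built image remains the authoritative check.

```shell
# Hypothetical helper: print the last USER set in the final stage of a
# Dockerfile read on stdin (empty output = no USER in that stage).
final_stage_user() {
  awk '
    toupper($1) == "FROM" { user = "" }   # each FROM starts a new stage
    toupper($1) == "USER" { user = $2 }   # later USER lines override earlier ones
    END { print user }
  '
}
```

Usage: final_stage_user < Dockerfile. Dockerfile instructions are case-insensitive, which is why the comparison goes through toupper.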

Enforcing this at runtime, independent of what the image itself declares

docker run --user 1000:1000 myapp:1.0

The --user flag on docker run can force a specific non-root UID, even for an image that would otherwise default to root. Kubernetes offers the equivalent through securityContext (see that stack's SecurityContext question): runAsUser sets the UID, and runAsNonRoot makes the kubelet refuse to start a container that would run as UID 0. This provides an additional, deployment-time layer of enforcement that doesn't rely solely on the image's own Dockerfile having done the right thing.

Why this is worth enforcing even though container isolation exists

Namespaces and cgroups (see the fundamentals topic) provide real isolation, but they are not an absolute security boundary. Container escape vulnerabilities are not routine, but they do periodically get discovered. Running as root inside the container is precisely the condition that makes many such escapes more dangerous or more likely to succeed. Several known escape techniques specifically rely on the compromised process already having root privileges within its own namespace as a stepping stone. Running as a genuinely unprivileged, non-root user is a foundational defense-in-depth measure. It doesn't eliminate the risk of a container escape, but it substantially narrows what an attacker can do both before and during an escape attempt.

The broader principle this connects to

This is the same least-privilege principle that runs throughout this stack's other security questions (Linux capabilities, the Docker socket risk, RBAC in the Kubernetes stack). Grant only the minimum privilege actually needed, so that any single compromise's blast radius is as limited as possible. Don't assume a compromise will never happen and skip limiting its impact if it does. A quick image-review check worth making habitual: docker inspect --format='{{.Config.User}}' myapp:1.0 should show something other than empty/root.
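That inspect check is easy to automate in CI. A minimal sketch, assuming a POSIX shell: the hypothetical is_root_user function below classifies the string docker inspect prints. It only checks the literal value (empty, root, or UID 0, with or without a group). It doesn't resolve a username against the image's /etc/passwd, so a differently named account mapped to UID 0 would slip past.

```shell
# Hypothetical helper: succeed (exit 0) if a Config.User value means "root".
# An empty value means no USER was set anywhere, so Docker defaults to root.
is_root_user() {
  case "$1" in
    "" | root | root:* | 0 | 0:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example CI gate (requires a local Docker daemon and the image):
# user=$(docker inspect --format='{{.Config.User}}' myapp:1.0)
# if is_root_user "$user"; then echo "myapp:1.0 runs as root" >&2; exit 1; fi
```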

Related Resources

Docker: Dockerfile USER instruction

Why capabilities exist: breaking up "root" into pieces

Traditionally, Unix permission checking was binary — a process either ran as root (with unrestricted power to do essentially anything on the system) or as a regular user (subject to normal permission checks). Linux capabilities split root's traditionally monolithic power into dozens of distinct, individually grantable privileges — CAP_NET_BIND_SERVICE (bind to a port below 1024), CAP_CHOWN (change file ownership arbitrarily), CAP_SYS_ADMIN (a broad, catch-all set of administrative operations), CAP_SYS_MODULE (load kernel modules), and many others. This lets a process be granted just the specific slice of "root-like" power it actually needs, rather than all of it or none of it.

Docker's default capability set — already more restricted than full root

docker run myapp:1.0    # no capability flags: Docker's default capability subset applies

Even a container explicitly running as root does not receive the full set of Linux capabilities that a genuinely privileged host root process would have, by Docker's own defaults. Docker grants a modest default subset — things like CAP_CHOWN, CAP_NET_BIND_SERVICE, CAP_SETUID/CAP_SETGID, and a handful of others generally needed by typical applications — while explicitly excluding more dangerous ones (CAP_SYS_ADMIN, CAP_SYS_MODULE, CAP_SYS_PTRACE, and others) by default. This is itself a form of built-in least-privilege enforcement, independent of whether the container's process is technically "root" or not.
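You can see this default set from inside a container: the kernel reports a process's effective capabilities as a hex bitmask on the CapEff line of /proc/<pid>/status. The sketch below decodes such a mask into names, assuming the kernel's standard capability numbering (CAP_CHOWN is bit 0, CAP_NET_BIND_SERVICE is bit 10, and so on up to CAP_CHECKPOINT_RESTORE at bit 40). decode_caps is a hypothetical helper; capsh --decode does the same job where libcap's tools are installed.

```shell
# Linux capability names in bit order (bit 0 = CAP_CHOWN ... bit 40 = CAP_CHECKPOINT_RESTORE).
CAP_NAMES="CAP_CHOWN CAP_DAC_OVERRIDE CAP_DAC_READ_SEARCH CAP_FOWNER CAP_FSETID
CAP_KILL CAP_SETGID CAP_SETUID CAP_SETPCAP CAP_LINUX_IMMUTABLE CAP_NET_BIND_SERVICE
CAP_NET_BROADCAST CAP_NET_ADMIN CAP_NET_RAW CAP_IPC_LOCK CAP_IPC_OWNER CAP_SYS_MODULE
CAP_SYS_RAWIO CAP_SYS_CHROOT CAP_SYS_PTRACE CAP_SYS_PACCT CAP_SYS_ADMIN CAP_SYS_BOOT
CAP_SYS_NICE CAP_SYS_RESOURCE CAP_SYS_TIME CAP_SYS_TTY_CONFIG CAP_MKNOD CAP_LEASE
CAP_AUDIT_WRITE CAP_AUDIT_CONTROL CAP_SETFCAP CAP_MAC_OVERRIDE CAP_MAC_ADMIN CAP_SYSLOG
CAP_WAKE_ALARM CAP_BLOCK_SUSPEND CAP_AUDIT_READ CAP_PERFMON CAP_BPF CAP_CHECKPOINT_RESTORE"

# Hypothetical helper: print the name of every capability set in a hex mask
# (as shown on the CapEff line of /proc/<pid>/status).
decode_caps() {
  local mask bit name
  mask=$((0x$1))
  bit=0
  for name in $CAP_NAMES; do
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      echo "$name"
    fi
    bit=$((bit + 1))
  done
}

# Usage (requires Docker): docker run --rm alpine grep CapEff /proc/self/status
```

For a default root container this commonly shows 00000000a80425fb (the exact value can vary by Docker version), and decode_caps a80425fb lists its 14 capabilities. With --cap-drop=ALL --cap-add=NET_BIND_SERVICE the mask becomes 400, which decodes to CAP_NET_BIND_SERVICE alone.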

Dropping capabilities further — reaching for true minimalism

docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp:1.0

--cap-drop=ALL removes every capability, including Docker's own default set. --cap-add then selectively re-adds back only the specific ones the container genuinely needs — in this example, just NET_BIND_SERVICE (needed if the application binds to a port below 1024). Most typical application containers can run with most or all capabilities dropped entirely, especially ones that don't need to bind to a privileged port or perform any genuinely system-level operations. They were never actually using the majority of Docker's already-modest default set.

# In Kubernetes, this maps directly onto SecurityContext (see that stack's question)
securityContext:
  capabilities:
    drop: ["ALL"]
    add: ["NET_BIND_SERVICE"]

This is the exact same mechanism, and the exact same recommended pattern (drop everything, add back only what's genuinely needed), covered in the Kubernetes stack's SecurityContext question. Kubernetes's capability configuration is, at the implementation level, just configuring this identical Linux kernel feature.

Why this matters: limiting what a compromised container can actually do

If an attacker achieves code execution inside a container, the specific set of capabilities that container holds directly determines what kinds of system-level actions they can attempt next. A container with CAP_SYS_ADMIN retained gives an attacker access to a wide range of potentially escape-relevant administrative operations. The same compromise inside a container that's dropped every capability except the one or two it genuinely needs gives the attacker dramatically less to work with, even before considering any other layer of defense.

Identifying which capabilities an application actually needs

This requires either consulting the application or base image's documentation, or empirically testing it — running with --cap-drop=ALL and incrementally adding back capabilities one at a time until the application works correctly. There's no universal answer, since it depends entirely on what the specific application actually does, such as binding to privileged ports or manipulating file ownership. Once identified, that minimal set becomes a relatively low-effort, high-leverage hardening step for any production container, mirroring the same least-privilege discipline this stack applies elsewhere.
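The incremental approach can be scripted. The sketch below is one way to do it, under stated assumptions. It takes a probe command that receives a list of capability names and succeeds only if the application works with exactly those added back after --cap-drop=ALL (for example, a hypothetical smoke_test that runs the container and a health check). It adds candidates one at a time until the probe passes, then tries removing each kept capability again, since an earlier addition may turn out to be unnecessary. find_minimal_caps and smoke_test are illustrative names, not existing tools.

```shell
# Hypothetical helper: find a small capability set that makes "probe" succeed.
#   $1   = probe command, called as: probe CAP...   (exit 0 = app works)
#   $2.. = candidate capabilities, tried in the order given
find_minimal_caps() {
  local probe=$1 kept="" cap other trial
  shift
  # Phase 1: add candidates one at a time until the probe passes.
  if ! "$probe"; then
    for cap in "$@"; do
      kept="$kept $cap"
      "$probe" $kept && break   # unquoted on purpose: one argument per capability
    done
    "$probe" $kept || return 1  # even all candidates together were not enough
  fi
  # Phase 2: drop any kept capability the probe turns out not to need.
  for cap in $kept; do
    trial=""
    for other in $kept; do
      [ "$other" = "$cap" ] || trial="$trial $other"
    done
    "$probe" $trial && kept=$trial
  done
  echo $kept
}

# Example probe (requires Docker; the image and ./healthcheck are assumptions):
# smoke_test() {
#   args=""
#   for c in "$@"; do args="$args --cap-add=$c"; done
#   docker run --rm --cap-drop=ALL $args myapp:1.0 ./healthcheck
# }
# find_minimal_caps smoke_test NET_BIND_SERVICE CHOWN SETUID SETGID
```

The result is only as good as the probe: a capability needed by a code path the smoke test never exercises will be dropped, so the probe should cover the application's real startup and request paths.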

Related Resources

Docker: Runtime privilege and Linux capabilities

What mounting the socket actually does

docker run -v /var/run/docker.sock:/var/run/docker.sock some-tool

This bind-mounts the host's Docker daemon's Unix socket directly into the container. Any process inside that container can now send requests to the host's real Docker daemon, exactly as if it were running the docker CLI directly on the host itself. (Recall from the fundamentals topic that the CLI is just a thin client talking to this exact socket.)

Why this is equivalent to host root access

# From INSIDE a container that has the Docker socket mounted:
docker run --rm -it -v /:/host --privileged alpine chroot /host sh

The daemon reachable through that socket can create and start new containers with essentially arbitrary configuration. This includes mounting the host's entire root filesystem into a new container, or running with --privileged, which disables most container isolation protections entirely. Because of this, a process with access to the Docker socket can trivially use it to escape any container boundary altogether and gain full read/write access to the host's filesystem, processes, and everything else. This isn't a theoretical edge case. It's a well-known, straightforward technique. That's exactly why "Docker socket access" is treated as functionally equivalent to root on the host in serious security analysis, regardless of what privileges the container holding that socket mount otherwise appears to have.

Why this pattern exists anyway, despite the risk

# A common (risky) pattern: letting a CI runner or monitoring tool
# manage OTHER containers by talking to the host's Docker daemon
services:
  ci-runner:
    image: my-ci-runner
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock

This pattern shows up frequently in CI/CD tooling (a containerized CI runner that itself needs to build and run other containers as part of its job — "Docker-in-Docker" concerns, covered in the production topic), in monitoring/management tools that need visibility into other running containers, and in various developer convenience setups. It's a genuinely common, real-world pattern — which is exactly why understanding its risk, rather than treating it as an unremarkable convenience, matters.
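Because the pattern is common, it's worth auditing a host for it. A minimal sketch, assuming the input format produced by the docker inspect template in the usage comment (the container name followed by each mount's host-side source path): the hypothetical flag_socket_mounts prints the name of every container that bind-mounts the Docker socket from its usual host paths. It won't catch less direct exposure, such as a container that mounts the socket's whole parent directory or reaches the daemon over TCP.

```shell
# Hypothetical helper: read "NAME SOURCE..." lines on stdin and print NAME for
# every container whose mounts include the host's Docker socket.
flag_socket_mounts() {
  awk '{
    for (i = 2; i <= NF; i++)
      if ($i == "/var/run/docker.sock" || $i == "/run/docker.sock") { print $1; break }
  }'
}

# Usage against running containers (requires Docker; with GNU xargs, add -r so
# docker inspect isn't called when no containers are running):
# docker ps -q | xargs docker inspect \
#   --format '{{.Name}} {{range .Mounts}}{{.Source}} {{end}}' | flag_socket_mounts
```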

Mitigations, if you must use this pattern

A read-only, filtered proxy in front of the socket. Tools like docker-socket-proxy sit between the container and the real socket, allowing only a specific, restricted subset of Docker API operations — for example, letting a monitoring tool list and inspect containers while blocking its ability to create new ones. This meaningfully reduces, though it doesn't eliminate, the risk compared to raw, unrestricted socket access.
Avoid it entirely for untrusted or lower-trust workloads — never mount the Docker socket into a container running code you don't fully trust, or that's exposed to any kind of external/user-supplied input that could lead to arbitrary command execution within it.
Prefer purpose-built alternatives where they exist. For CI/CD specifically, a genuinely separate, isolated build environment avoids the risk entirely rather than mitigating it. This could be a dedicated build VM, or Kubernetes-native build tooling like Kaniko that can build images without needing a full Docker daemon socket at all.

The broader lesson: "just mount the socket" is never a low-stakes convenience

This is a good, concrete illustration of a recurring security theme across this entire stack. A mechanism that seems like a simple convenience — giving one container the ability to manage others — can quietly grant vastly more power than intended. This is because the underlying capability, talking to the Docker daemon, doesn't have a narrower "just manage containers, not the host" mode by default. Recognizing this specific risk, and being able to explain precisely why socket access equals host root rather than just that "it's risky," is a strong signal of genuine container security understanding. It's a widely used pattern that's frequently under-appreciated for how dangerous it actually is.

Related Resources

Docker: Docker daemon attack surface

seccomp — restricting which system calls are even allowed

Every action a program takes that involves the kernel (opening a file, creating a socket, forking a process) goes through a system call (syscall). seccomp (secure computing mode) lets you define a filter specifying exactly which syscalls a process is allowed to make. Any syscall not on the allowed list is blocked outright — typically causing the calling process to receive an error, or be killed, depending on configuration — regardless of what file permissions or capabilities might otherwise seem to allow.

docker run --security-opt seccomp=default.json myapp:1.0    # loads a profile from a local JSON file; without this flag, Docker's built-in default applies

Docker actually applies a default seccomp profile automatically, blocking around 44 of the roughly 300+ available Linux syscalls. This targets syscalls that are rarely needed by typical containerized applications but have historically been associated with container escapes or kernel-level exploits — things like kexec_load, various rarely-needed namespace/mount-manipulation syscalls, and others. Most applications never notice this default restriction at all, since they simply never call the blocked syscalls in normal operation.

docker run --security-opt seccomp=unconfined myapp:1.0    # disables seccomp filtering entirely -- generally a bad idea

Disabling seccomp entirely (unconfined) removes this layer of protection. This is occasionally necessary for specialized workloads that genuinely need a normally-blocked syscall, such as certain low-level debugging or tracing tools, or some specialized networking software. But this should be a deliberate, narrow exception, not a default reached for just to make an error message go away without understanding why it occurred.
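When deciding whether such an exception is really in effect, it helps to check a container's actual seccomp state rather than assume it. The kernel exposes it on the Seccomp line of /proc/<pid>/status: 0 means disabled, 1 strict mode, and 2 filter mode, which is what a seccomp profile like Docker's default uses. The hypothetical seccomp_mode helper below simply translates that field. It's a sketch for quick verification, not a profile auditor.

```shell
# Hypothetical helper: read /proc/<pid>/status text on stdin and report the
# process's seccomp mode.
seccomp_mode() {
  awk '$1 == "Seccomp:" {
    if ($2 == 0) print "disabled"
    else if ($2 == 1) print "strict"
    else if ($2 == 2) print "filter"
    else print "unknown"
  }'
}

# Usage (requires Docker; expected: "filter" by default, "disabled" when unconfined):
# docker run --rm alpine cat /proc/self/status | seccomp_mode
# docker run --rm --security-opt seccomp=unconfined alpine cat /proc/self/status | seccomp_mode
```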

AppArmor / SELinux — mandatory access control beyond syscall filtering

Where seccomp restricts which syscalls can be made at all, AppArmor (common on Ubuntu/Debian-based systems) and SELinux (common on RHEL/Fedora-based systems) restrict what a process can actually do with the syscalls it's allowed to make. This includes which specific files it can read or write, what network operations it can perform, and which capabilities it can use, based on a named security profile applied to the process.

docker run --security-opt apparmor=docker-default myapp:1.0    # names the profile Docker applies by default; a custom profile must first be loaded on the host

Docker applies a default AppArmor profile automatically on systems where AppArmor is available, similarly restricting a range of higher-risk operations by default without requiring any explicit configuration from the person running the container.
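On an AppArmor host you can confirm which profile a container actually got. docker inspect --format '{{.AppArmorProfile}}' <container> shows what Docker requested, and inside the container /proc/self/attr/current shows what the kernel applied, in a form like docker-default (enforce). The hypothetical apparmor_profile helper below is a sketch that splits that kernel string into profile name and mode, assuming that format. An unconfined process reports just unconfined.

```shell
# Hypothetical helper: parse /proc/<pid>/attr/current (read on stdin), e.g.
# "docker-default (enforce)", into "PROFILE MODE"; MODE is "none" if absent.
apparmor_profile() {
  awk 'NR == 1 {
    mode = $2
    gsub(/[()]/, "", mode)
    if (mode == "") mode = "none"
    print $1, mode
  }'
}

# Usage on an AppArmor host (requires Docker):
# docker run --rm alpine cat /proc/self/attr/current | apparmor_profile
```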

How these layers relate to namespaces, cgroups, and capabilities

Namespaces:        controls what a process can SEE (isolation)
cgroups:           controls how much a process can USE (resource limits)
Capabilities:      controls WHICH root-like privileges a process has, if any
seccomp:           controls WHICH SYSTEM CALLS a process can make at all
AppArmor/SELinux:  controls WHAT a process can DO with specific files/resources/capabilities

These are genuinely complementary, layered defenses. A container can stay within its cgroup resource limits and run as a properly non-root user with its capabilities already dropped (see those questions), and still benefit from a seccomp/AppArmor layer that blocks syscalls or file access that shouldn't be reachable at all, in case some other assumption in the chain turns out to be wrong. This layering is a textbook example of defense in depth: no single mechanism is assumed to be perfectly sufficient on its own.

Why most users never think about these layers explicitly

Docker applies sensible seccomp and AppArmor defaults automatically, without requiring explicit configuration for the common case, which is exactly why many practitioners aren't aware these protections are active at all. It's also why "just running in a container" already provides meaningfully more restriction than "just running as a regular host process." The defaults are worth leaving in place for the overwhelming majority of workloads. Disabling either layer (unconfined) should be treated as a deliberate, narrowly-scoped exception requiring real justification, never a default troubleshooting step for a confusing error.

Related Resources

Docker: Seccomp security profiles

Why baking secrets into an image is always wrong

# NEVER do any of these
ENV DB_PASSWORD=supersecret
COPY .env /app/.env
ARG API_KEY
RUN curl -H "Authorization: Bearer $API_KEY" https://example.com

Every one of these persists the secret's value inside the image's layers or build history — recoverable by anyone with access to the image (docker history, inspecting layer contents directly, or simply docker run and reading the baked-in ENV/file). Recall from the layer-caching question that layers are effectively permanent once built — even a later layer that appears to "remove" the secret doesn't actually erase it from the earlier layer's stored data. This applies even to ARG (see that question) — build arguments can still leave traces in image metadata/history even though they're not automatically present in the running container's environment.
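One quick, admittedly heuristic way to spot the ENV and ARG cases in an existing image is to search its build history for secret-looking variable names. The sketch below assumes names containing PASSWORD, PASSWD, SECRET, TOKEN, or API_KEY are suspicious; flag_secret_like_history is a hypothetical helper. It can't see secrets inside COPY'd files (those live in layer contents, not in history), so a clean result is not proof of a clean image.

```shell
# Hypothetical helper: print build-history lines that mention secret-like
# names (exit 0 if any were found, 1 if none). A name-based heuristic only.
flag_secret_like_history() {
  grep -Ei 'PASSW(OR)?D|SECRET|TOKEN|API_?KEY'
}

# Usage (requires Docker and the image):
# docker history --no-trunc --format '{{.CreatedBy}}' myapp:1.0 | flag_secret_like_history
```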

Runtime injection — better, but still has caveats

docker run -e DB_PASSWORD="$DB_PASSWORD" myapp:1.0

This avoids baking the secret into the image itself, but plain environment variables have their own real exposure risks. They're visible to anything that can inspect the container's configuration (docker inspect) and to anything on the host that can read the process's /proc/<pid>/environ. They also commonly end up logged by accident: many applications and frameworks dump their full environment at startup for debugging, and crash dumps or error-reporting tools often include environment context.

A better runtime mechanism: mounted secret files

# Docker Swarm's native secrets mechanism
printf '%s' "supersecret" | docker secret create db_password -    # printf avoids storing a trailing newline in the secret
docker service create --secret db_password myapp:1.0

# Inside the container, the secret is available as a FILE, not an environment variable:
cat /run/secrets/db_password
# supersecret

Docker Swarm's native secrets mechanism, and Compose's own secrets: key (which can source from Swarm secrets or, for non-Swarm local development, a plain file), deliver a secret as a file mounted into the container at a well-known path, rather than as an environment variable. This avoids several of environment variables' specific exposure risks, such as accidental logging of the full environment or visibility in some process-inspection tools, since the secret only exists as file content the application must explicitly choose to read.

# Compose secrets (non-Swarm, file-based)
services:
  api:
    secrets:
      - db_password
secrets:
  db_password:
    file: ./db_password.txt    # this file itself must never be committed to version control
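On the application side, reading a mounted secret is just reading a file. A minimal entrypoint sketch, assuming the default /run/secrets mount point used above: read_secret is a hypothetical helper, and SECRETS_DIR is a hypothetical override added only so the helper can be tested outside a container.

```shell
# Hypothetical helper: print the contents of a mounted secret file.
# SECRETS_DIR (hypothetical) defaults to the standard /run/secrets mount point.
read_secret() {
  local file="${SECRETS_DIR:-/run/secrets}/$1"
  if [ ! -r "$file" ]; then
    echo "secret '$1' not readable at $file" >&2
    return 1
  fi
  cat "$file"
}

# In an entrypoint script, keep the value in a non-exported shell variable (or
# pass the file path to the application) so it doesn't land back in the
# environment that child processes inherit:
# db_password=$(read_secret db_password) || exit 1
```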

Build-time secrets — for values only needed transiently during the build

RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN=$(cat /run/secrets/npm_token) npm install

docker build --secret id=npm_token,src=./npm_token.txt -t myapp .

BuildKit's dedicated secret-mounting syntax (--mount=type=secret) makes a secret available only during that specific RUN instruction's execution, without it persisting in any built layer or the final image's history at all. This is the correct mechanism for a private package registry token or similar credential needed transiently just to complete a build step, closing exactly the gap the ARG-for-secrets anti-pattern leaves open.

External secrets managers — the strongest option for production

For genuinely sensitive production secrets, integrate with a dedicated secrets manager — HashiCorp Vault, AWS Secrets Manager, and similar (see the SQL/Databases and Kubernetes stacks' equivalent questions). The application can fetch secrets directly at runtime, or a sidecar/init pattern can inject them. This provides stronger audit trails, rotation, and centralized access control than any Docker-native mechanism alone offers.

Secret need	Right mechanism
Build-time only (e.g. a private registry token for `npm install`)	BuildKit `--mount=type=secret`
Runtime, simple setup	Swarm/Compose native `secrets` (file-based)
Runtime, needs audit trail/rotation/centralized control	External secrets manager (Vault, AWS Secrets Manager)
Any secret, at any stage	Never `ARG`, `COPY`, or `ENV` baked into the image

Related Resources

Docker: Manage sensitive data with Docker secrets

Enabling a read-only root filesystem

docker run --read-only myapp:1.0

With this flag, the container's entire root filesystem (everything from its image layers) becomes immutable at runtime — any attempt by the application to write a file anywhere outside of an explicitly writable mount fails.

docker run --read-only myapp:1.0 touch /app/test.txt
# touch: /app/test.txt: Read-only file system

Why this meaningfully limits a compromised container's blast radius

If an attacker achieves code execution inside a container with a read-only root filesystem, they cannot: modify the application's own binaries or configuration files to plant a persistent backdoor, install additional tools via a package manager (which needs to write to the filesystem), or drop and execute an arbitrary downloaded payload anywhere in the container's normal filesystem. This is a genuinely strong, low-effort hardening measure. It doesn't prevent an attacker from doing damage entirely — they can still act within memory, make network requests, or read existing files — but it closes off an entire category of "make the compromise persistent or install further tooling" techniques.

The practical challenge: identifying what genuinely needs to be writable

docker run --read-only \
  --tmpfs /tmp \
  --tmpfs /app/cache \
  -v app-uploads:/app/uploads \
  myapp:1.0

Most real applications need some writable space — temporary files, an in-memory or on-disk cache, actual persistent data (uploaded content, database files). The pattern is to keep the root filesystem read-only overall, while explicitly providing writable space exactly where it's genuinely needed:

--tmpfs for ephemeral, memory-backed writable space (temp files, caches that don't need to survive a restart) — see the storage topic's tmpfs question.
Named volumes for anything that genuinely needs to persist (see that topic).

Application-level considerations this surfaces

Enabling a read-only root filesystem often surfaces assumptions baked into an application or its dependencies that weren't previously visible — a logging library that defaults to writing to a local file instead of stdout, a language runtime that writes temporary compiled artifacts to a directory within the application's own tree, or a package that expects to write a lock file or cache somewhere under its own installation directory. Identifying and explicitly accommodating every one of these legitimate write paths (via --tmpfs or a volume) is usually the real work involved in successfully adopting a read-only root filesystem for an existing application. This is often more work than the Docker configuration itself.
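One practical way to find those write paths before turning on --read-only is to run the application normally, exercise it, and then ask Docker what changed: docker diff lists every path added (A), changed (C), or deleted (D) relative to the image. The sketch below reduces that output to the directories where writes happened, as candidates for --tmpfs or a volume. candidate_write_dirs is a hypothetical helper, and its result is only as complete as the workload you exercised.

```shell
# Hypothetical helper: read `docker diff` output ("A /path", "C /path", "D /path")
# on stdin and print the unique parent directories of the changed paths.
candidate_write_dirs() {
  local kind path dir
  while read -r kind path; do
    case "$kind" in A | C | D) ;; *) continue ;; esac
    dir=$(dirname "$path")
    if [ "$dir" != / ]; then printf '%s\n' "$dir"; fi
  done | LC_ALL=C sort -u
}

# Usage (requires Docker): run without --read-only, exercise the app, then:
# docker run -d --name write-probe myapp:1.0
# docker diff write-probe | candidate_write_dirs
```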

In Kubernetes

securityContext:
  readOnlyRootFilesystem: true

This maps directly onto the exact same underlying mechanism, configured as part of a Pod's SecurityContext (see that stack's question). The Pod Security Standards don't require it, not even at the restricted level, so it has to be set deliberately. It is nonetheless a standard recommendation in Kubernetes hardening guidance, reflecting how significant a hardening measure it's considered industry-wide rather than a Docker-specific nicety.

Related Resources

Docker: Run containers with a read-only filesystem

Why "build it once, ship it forever" is a real security liability

# Built 8 months ago, never rebuilt since
FROM node:20-slim

Even if an application's own code hasn't changed at all in 8 months, the underlying base image's packages (OS libraries, the language runtime itself) almost certainly have known vulnerabilities disclosed against them by now that weren't known 8 months ago. An image that's never rebuilt slowly accumulates an ever-larger gap between "what's actually running" and "the latest patched versions available," even though nothing about the image's own configuration ever technically changed.

Rebuilding regularly, even without application code changes

# A scheduled CI job (e.g., nightly or weekly cron), separate from the normal
# "build on code push" pipeline, that rebuilds from the current base image and re-scans it:
docker build --pull --no-cache -t myapp:1.0 .    # --pull re-fetches the base tag; --no-cache re-runs every step
trivy image myapp:1.0

A scheduled rebuild (independent of application code changes) picks up whatever patched packages the base image maintainers have since published under that same tag. This alone closes a meaningful fraction of vulnerabilities over time, purely by re-pulling an updated base layer, with zero application code changes required.

Automated dependency-update tooling

# .github/dependabot.yml: open PRs for newer base image tags (Dockerfiles) and npm dependencies
version: 2
updates:
  - { package-ecosystem: "docker", directory: "/", schedule: { interval: "weekly" } }
  - { package-ecosystem: "npm", directory: "/", schedule: { interval: "weekly" } }

Tools like Dependabot and Renovate can be configured to watch a project's Dockerfiles and dependency manifests, automatically opening pull requests whenever a newer version becomes available — of a base image tag, or an application-level dependency. This turns "someone has to remember to check for updates" into an automated, continuously-running process that surfaces the work as a reviewable PR rather than requiring anyone to proactively go looking for it.

Continuous scanning, not just at build time

# Periodic re-scan of an ALREADY-DEPLOYED image, on a schedule,
# independent of any new build having happened
trivy image myregistry.example.com/myapp:1.0

As covered in the vulnerability-scanning question, a scan result for an unchanged image can become worse over time purely because new CVEs are disclosed against its existing, unchanged package versions. A scanning strategy that only runs at build or CI time misses this entirely. Genuine ongoing security posture requires periodically re-scanning images that are already deployed and running, not just the ones currently being built.

Prioritizing what to actually fix

CRITICAL: 1 finding
HIGH: 5 findings
MEDIUM: 23 findings
LOW: 47 findings

Not every finding warrants equal urgency. Prioritizing by severity, by whether the vulnerable code path is actually reachable or exploitable in your specific usage, and by whether a fix is even available yet (some CVEs are disclosed before a patched version exists) is a necessary, deliberate triage process. Treating every single low-severity finding as equally urgent as a critical one tends to produce alert fatigue that ultimately causes teams to under-react to the findings that genuinely matter most.
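Part of that triage can be encoded as a CI gate: fail the pipeline only for findings at or above a chosen severity, and track the rest as ordinary backlog. The sketch below assumes a summary in the LEVEL: COUNT form shown above; gate_on_severity and severity_rank are hypothetical helpers.

```shell
# Hypothetical helpers: map a severity name to a rank, then fail (exit 1) if
# any "LEVEL: COUNT ..." line on stdin has COUNT > 0 at or above THRESHOLD.
severity_rank() {
  case "$1" in
    LOW) echo 1 ;;
    MEDIUM) echo 2 ;;
    HIGH) echo 3 ;;
    CRITICAL) echo 4 ;;
    *) echo 0 ;;
  esac
}

gate_on_severity() {
  local min level count rest blocked=0
  min=$(severity_rank "$1")
  if [ "$min" -eq 0 ]; then
    echo "unknown threshold: $1" >&2
    return 2
  fi
  while read -r level count rest; do
    level=${level%:}
    if [ "$(severity_rank "$level")" -ge "$min" ] && [ "${count:-0}" -gt 0 ]; then
      echo "blocking: $count $level finding(s)" >&2
      blocked=1
    fi
  done
  return "$blocked"
}
```

In practice Trivy can apply this kind of gate itself, via trivy image --severity CRITICAL,HIGH --exit-code 1, and its --ignore-unfixed flag drops findings that have no fix available yet. Whether a vulnerable code path is actually reachable still needs human judgment either way.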

Reducing the surface area to begin with

Smaller base images (Alpine, distroless; see that question) and multi-stage builds (see that question) directly reduce how much software ends up in the final image. That in turn reduces how many packages could ever have a vulnerability disclosed against them. This is a proactive complement to the reactive scan-and-patch cycle described above.

None of these pieces on its own (a scheduled rebuild, an update bot, a scanner) amounts to "vulnerability management". Vulnerability management is the ongoing combination of all of them.

Related Resources

Docker Scout: Policy evaluation


This question ties together every hardening measure covered elsewhere in this topic into one comparative picture. It's a good way to show that you understand not just each mechanism on its own, but how they combine into a genuinely hardened posture rather than a merely default one.

What Docker does out of the box, with no extra configuration

A restricted default capability set. Even a container whose process runs as root does not get full root privilege (see the capabilities question).
A default seccomp profile blocking dozens of higher-risk, rarely-needed syscalls (see that question).
A default AppArmor (or SELinux) profile, on hosts where it's available, restricting file/resource access further (see that question).
Namespace and cgroup isolation — the fundamental isolation mechanism underlying containers in the first place (see the fundamentals topic).

These defaults are genuinely meaningful. The same process running unconfined directly on the host would have none of this protection. So "just using Docker with no extra hardening" is already a real security improvement in several respects.
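You can observe several of these defaults yourself. The sketch below assumes the Docker CLI and the public `alpine` image. `docker info` lists the daemon's active security options, such as seccomp and, where available, AppArmor or SELinux. Inside a default container running as root, `/proc/self/status` shows a `CapEff` bitmask much narrower than full root's, plus `Seccomp: 2`, which means a seccomp filter is active. The function name and the `DOCKER` override are illustrative.

```shell
#!/usr/bin/env sh
# Inspect Docker's out-of-the-box protections.
# DOCKER can be overridden (e.g. DOCKER=echo) to dry-run the commands.
DOCKER="${DOCKER:-docker}"

show_default_protections() {
  # Daemon-wide security options, e.g. seccomp profile and AppArmor/SELinux
  "$DOCKER" info --format '{{json .SecurityOptions}}'
  # Inside a default container (running as root): effective capability
  # bitmask and seccomp mode (2 = a seccomp filter is applied)
  "$DOCKER" run --rm alpine grep -E '^(CapEff|Seccomp):' /proc/self/status
}
```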

What Docker's defaults still leave open

Runs as root unless a USER instruction or --user flag says otherwise (see that question).
A writable root filesystem — nothing prevents a compromised process from modifying application files or installing tools unless --read-only is explicitly set (see that question).
No resource limits — a container can consume unbounded CPU/memory unless --memory/--cpus are explicitly configured (see the lifecycle topic), risking one container starving others sharing the host.
No guardrails against leaking secrets. Nothing in Docker itself stops an author from baking a secret into an image via ENV, ARG, or COPY (see that question); avoiding it is entirely up to the image's author.
No image signature verification — nothing prevents pulling and running an unsigned, untrusted, or tampered image by default (see the registries topic's signing question).
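Below is a sketch of a quick audit for some of these gaps on an existing container, assuming the Docker CLI. It reads fields that `docker inspect` reports for a container: `Config.User`, `HostConfig.ReadonlyRootfs`, `HostConfig.Memory`, and `HostConfig.NanoCpus`. Secrets handling and image signing can't be checked from a running container this way. The `audit_container` name and its output wording are hypothetical.

```shell
#!/usr/bin/env sh
# Hypothetical audit: report which Docker defaults a container still has.
# DOCKER can be overridden to point at a stub for testing.
DOCKER="${DOCKER:-docker}"

audit_container() {
  info=$("$DOCKER" inspect --format \
    '{{.Config.User}}|{{.HostConfig.ReadonlyRootfs}}|{{.HostConfig.Memory}}|{{.HostConfig.NanoCpus}}' \
    "$1") || return 2
  user=$(printf '%s' "$info" | cut -d'|' -f1)
  read_only=$(printf '%s' "$info" | cut -d'|' -f2)
  memory=$(printf '%s' "$info" | cut -d'|' -f3)
  nano_cpus=$(printf '%s' "$info" | cut -d'|' -f4)

  # Empty User means neither the image nor `docker run --user` set one: root.
  case "$user" in
    ""|root|root:*|0|0:*) echo "runs as root" ;;
  esac
  [ "$read_only" = "true" ] || echo "writable root filesystem"
  [ "$memory" != "0" ] || echo "no memory limit"
  # Only detects --cpus; limits set via --cpu-quota are not checked here.
  [ "$nano_cpus" != "0" ] || echo "no CPU limit (--cpus)"
}
```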

A concrete side-by-side

	Default	Hardened
User	Root (unless the image's own Dockerfile says otherwise)	Explicit non-root `USER`, enforced via `--user` too
Capabilities	Docker's modest default set	`--cap-drop=ALL` + minimal explicit `--cap-add`
Root filesystem	Writable	`--read-only`, with explicit `--tmpfs`/volumes for genuine write needs
Resource limits	None	Explicit `--memory`/`--cpus`
Secrets	However the image/deployment happens to handle them	Never baked into images; runtime-injected via files/secrets manager
Image provenance	Trusted by name/tag alone	Verified via signing (Cosign) before running
seccomp/AppArmor	Docker's default profiles	Default profiles, or a stricter custom profile for especially sensitive workloads
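Applied to a single `docker run`, the right-hand column might look like the sketch below. It assumes Cosign is installed with a public key at `cosign.pub`. The UID, the limits, and the `/tmp` tmpfs are illustrative values, and `run_hardened` is a hypothetical helper. Two rows don't appear as flags: seccomp/AppArmor stay at Docker's defaults, and secrets are expected to be injected at runtime rather than baked in.

```shell
#!/usr/bin/env sh
# Hypothetical helper applying the "Hardened" column of the table.
# DOCKER/COSIGN can be overridden (e.g. DOCKER=echo COSIGN=true) to dry-run.
DOCKER="${DOCKER:-docker}"
COSIGN="${COSIGN:-cosign}"

run_hardened() {
  image="$1"   # ideally pinned by digest (name@sha256:...), so the image
               # that was verified is exactly the one that runs

  # Image provenance: refuse to run anything whose signature doesn't verify.
  if ! "$COSIGN" verify --key cosign.pub "$image" >/dev/null; then
    echo "refusing to run unverified image: $image" >&2
    return 1
  fi

  # --user:          explicit non-root UID:GID, even if the image sets USER
  # --cap-drop=ALL:  start from zero capabilities; re-add proven needs via --cap-add
  # --read-only:     immutable root filesystem; --tmpfs gives a scratch /tmp
  # --memory/--cpus: bound resource use so one container can't starve others
  "$DOCKER" run --detach \
    --user 10001:10001 \
    --cap-drop=ALL \
    --read-only --tmpfs /tmp \
    --memory 512m --cpus 1 \
    "$image"
}
```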

Why hardening is a deliberate, additive process, not a single switch

There's no single "make it secure" flag. A genuinely hardened container configuration is the sum of many individually modest measures applied together, each closing off one specific category of risk. This is the defense-in-depth philosophy that runs throughout this security topic. No single layer is assumed to be sufficient on its own, so several independent, complementary layers are stacked, and a weakness in any one of them doesn't compromise the whole system. Docker's defaults are a reasonable, genuinely protective starting point, not a finished production security posture.

Related Resources

Docker: Security