The problem before containers
Deploying an application traditionally meant hoping the target machine had the right language runtime version, the right system libraries, and no conflicting versions of anything else already installed. The phrase "it works on my machine" became common for exactly this reason: a developer's laptop, a QA server, and production rarely had identical environments. Subtle mismatches, such as a different OpenSSL version or a missing system package, caused failures that were maddening to reproduce and debug.
What Docker packages together
A Docker image bundles:
- The application's own code
- The specific language runtime/interpreter version it needs
- Every library and system package it depends on
- Configuration and environment setup
FROM node:20-slim
WORKDIR /app
COPY package.json .
RUN npm install
COPY . .
CMD ["node", "server.js"]
Building this once produces a single artifact that contains everything needed to run the application — no more "make sure Node 20 and these specific npm packages are installed on the server first."
Images vs. containers, briefly (covered fully in the next question)
The image is the packaged, immutable artifact. A container is a running instance of that image, isolated from other processes on the host via Linux kernel features (namespaces and cgroups — see that question). It shares the host's kernel rather than virtualizing an entire separate operating system.
Why this matters practically
docker run myapp:1.0
Running this exact command, with this exact image, produces the same userspace environment (runtime, libraries, files, and configuration) whether it's executed on a developer's laptop, a CI server, or production; only the host kernel and hardware underneath can differ. The image is the single source of truth for "what the application needs to run," eliminating an entire class of environment-mismatch bugs. This also makes deployment portable across infrastructure: the same image can run on a bare-metal server, a cloud VM, or be scheduled by an orchestrator like Kubernetes (see that stack), without rebuilding anything for each target (assuming the same CPU architecture).
Beyond consistency: additional benefits
- Isolation: a container's processes, filesystem, and network stack are separated from the host and from other containers, so one application's dependencies can't silently conflict with another's. For example, two apps needing different, incompatible versions of the same library can run side by side, each in its own container.
- Efficiency relative to virtual machines: containers share the host's kernel rather than each running a full separate OS, so they start faster and many more of them fit on one host (see the VM comparison question for the full contrast).
- A standard packaging and distribution format: images can be pushed to and pulled from a registry (see that topic), giving teams a consistent way to share, version, and deploy applications.
The core mental model
Docker essentially answers: "how do I package an application so it carries its own environment with it, and run that package in a way that's isolated from everything else on the machine, without the overhead of a full virtual machine per application?" Every other Docker concept — images, layers, volumes, networks — exists in service of that core idea.
The image: a read-only template
docker build -t myapp:1.0 .
docker images
# REPOSITORY TAG IMAGE ID SIZE
# myapp 1.0 a1b2c3d4e5f6 180MB
An image is the packaged, immutable result of a build — a stack of filesystem layers (see the layer caching question) plus metadata describing how a container from it should run (its default command, exposed ports, environment defaults). An image itself is never "running" — it's inert, stored data, the same way a class definition or a compiled binary on disk isn't itself "executing."
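To make "layers plus metadata" concrete, here is a minimal Python sketch that summarizes the kind of JSON `docker image inspect` prints. The sample document is a hypothetical, heavily trimmed stand-in: the field names (`RepoTags`, `Config`, `RootFS`) follow Docker's inspect output, but the environment variable, the exposed port, and the layer digests are placeholders invented for illustration.

```python
import json

# Hypothetical, trimmed stand-in for `docker image inspect myapp:1.0`.
# The Env entry, the port, and the layer digests are placeholders, not real values.
SAMPLE_INSPECT_OUTPUT = """
[
  {
    "RepoTags": ["myapp:1.0"],
    "Config": {
      "Cmd": ["node", "server.js"],
      "Env": ["NODE_ENV=production"],
      "ExposedPorts": {"3000/tcp": {}},
      "WorkingDir": "/app"
    },
    "RootFS": {
      "Type": "layers",
      "Layers": ["sha256:<layer-1>", "sha256:<layer-2>", "sha256:<layer-3>"]
    }
  }
]
"""


def summarize_image(inspect_json: str) -> dict:
    """Pull out the run metadata and the layer count from inspect-style JSON."""
    image = json.loads(inspect_json)[0]  # inspect returns a JSON array
    config = image.get("Config", {})
    return {
        "tags": image.get("RepoTags", []),
        "default_command": config.get("Cmd"),
        "working_dir": config.get("WorkingDir"),
        "exposed_ports": sorted(config.get("ExposedPorts") or {}),
        "layer_count": len(image.get("RootFS", {}).get("Layers", [])),
    }


summary = summarize_image(SAMPLE_INSPECT_OUTPUT)
print(summary)
```

Nothing in this data is a process: it is a description of what a container started from the image would run, which is the sense in which an image is inert.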
The container: a running instance, with its own writable layer
docker run -d --name web1 myapp:1.0
docker run -d --name web2 myapp:1.0
docker ps
# CONTAINER ID IMAGE NAMES
# 7f8e9d0c1b2a myapp:1.0 web1
# 3c4d5e6f7a8b myapp:1.0 web2
Each docker run from the same image creates a genuinely separate, independent container — its own process namespace, its own network namespace and IP, and its own thin writable layer stacked on top of the image's read-only layers. Any file changes a container makes at runtime (writing a log file, a temp file) go into that container's own writable layer. This layer is completely invisible to, and independent of, any other container started from the same image. It is lost when that specific container is removed.
The class/instance analogy
Image ≈ a class definition (myapp:1.0, which describes what to run and how)
Container ≈ an instance of that class (web1 and web2, each independently running, independently stateful, and independently destroyable, without affecting the others or the original image)
Starting web2 doesn't consume or modify myapp:1.0 in any way — the image stays exactly as it was, ready to spawn any number of further independent containers.
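The image/container relationship described above can be modeled in a few lines of Python. This is a toy model, not how Docker stores anything: `Image`, `Container`, and the file paths are hypothetical names, and a `ChainMap` stands in for the union filesystem, with each container's writable layer placed in front of the image's read-only contents.

```python
from collections import ChainMap
from types import MappingProxyType


class Image:
    """Read-only template: a tag plus an immutable view of its files."""

    def __init__(self, tag, files):
        self.tag = tag
        self.files = MappingProxyType(dict(files))  # cannot be written to


class Container:
    """One instance: a private writable layer stacked on the shared image."""

    def __init__(self, name, image):
        self.name = name
        self.writable_layer = {}
        # Reads fall through to the image; writes land in writable_layer only.
        self.fs = ChainMap(self.writable_layer, image.files)


image = Image("myapp:1.0", {"/app/server.js": "console.log('hi')"})
web1 = Container("web1", image)
web2 = Container("web2", image)

web1.fs["/tmp/app.log"] = "started"  # a runtime write inside web1

print("/tmp/app.log" in web1.fs)      # visible in web1
print("/tmp/app.log" in web2.fs)      # not visible in web2
print("/tmp/app.log" in image.files)  # the image is untouched
```

Both containers read the same `/app/server.js` from the image, yet the log file exists only in `web1`, mirroring how two `docker run` invocations share image layers without sharing runtime changes.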
What happens to a container's writable-layer data
docker stop web1
docker start web1 # any files written to web1's writable layer are still there
docker rm web1 # NOW that writable layer, and everything in it, is gone permanently
Stopping and restarting a container preserves its writable layer's contents. Removing a container discards it entirely. This is exactly why anything meant to persist beyond a single container's lifetime, such as real application data, belongs in a volume rather than the container's own writable layer (see the storage topic).
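The stop/start/remove behavior can be sketched as a small state model. This is an illustration under simplifying assumptions, not Docker's implementation: `ToyContainer` and the paths are hypothetical, and a plain dict created outside the container plays the role of a volume.

```python
class ToyContainer:
    """Toy lifecycle model: the writable layer lives and dies with the container."""

    def __init__(self, name, volume=None):
        self.name = name
        self.running = True
        self.removed = False
        self.writable_layer = {}  # owned by this container
        self.volume = volume      # owned outside the container

    def stop(self):
        self.running = False      # writable layer is kept

    def start(self):
        if self.removed:
            raise RuntimeError(f"{self.name} was removed and cannot be started")
        self.running = True

    def remove(self):
        self.running = False
        self.removed = True
        self.writable_layer = None  # discarded together with the container


data_volume = {}  # stands in for a named volume
web1 = ToyContainer("web1", volume=data_volume)
web1.writable_layer["/tmp/cache"] = "scratch"
web1.volume["/data/db"] = "rows"

web1.stop()
web1.start()
survived_restart = "/tmp/cache" in web1.writable_layer

web1.remove()
print(survived_restart)     # the scratch file survived stop/start
print(web1.writable_layer)  # gone after removal
print(data_volume)          # the volume's contents are still there
```

The only data that outlives `remove()` is what was written to the object the container never owned, which is the reason persistent data belongs in a volume.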
Why this distinction is foundational to everything else in Docker
Nearly every other Docker concept builds directly on this image/container split. Layer caching and multi-stage builds are about how images are constructed efficiently. Volumes and bind mounts exist specifically because a container's own writable layer is ephemeral and tied to that one container. Registries exist to distribute images (not containers), so that any number of independent containers can be started from the same shared, versioned artifact across many different machines.
The architectural difference
     Virtual Machines                   Containers
┌───────────┐ ┌───────────┐     ┌───────────┐ ┌───────────┐
│   App A   │ │   App B   │     │   App A   │ │   App B   │
│ Bins/Libs │ │ Bins/Libs │     │ Bins/Libs │ │ Bins/Libs │
│ Guest OS  │ │ Guest OS  │     └───────────┘ └───────────┘
│   (own    │ │   (own    │     ┌─────────────────────────┐
│  kernel)  │ │  kernel)  │     │      Docker Engine      │
└───────────┘ └───────────┘     ├─────────────────────────┤
┌─────────────────────────┐     │  Host OS (ONE kernel,   │
│       Hypervisor        │     │      shared by all      │
├─────────────────────────┤     │       containers)       │
│         Host OS         │     └─────────────────────────┘
└─────────────────────────┘
Virtual machines: virtualize hardware, run a full guest OS each
Each VM includes its own complete guest operating system with its own kernel, running atop a hypervisor (which itself virtualizes CPU, memory, disk, and network for each guest). This gives very strong isolation: a compromise inside one VM's guest kernel doesn't directly threaten another VM's kernel. But this comes at a real cost. Each VM's guest OS consumes its own chunk of memory and disk just to boot and run, and starting a VM typically takes tens of seconds to minutes (booting an entire operating system).
Containers: share the host's kernel, isolate at the process level
A container is, at its core, just a regular process on the host. It is isolated from other processes using Linux kernel features (namespaces for what it can see, cgroups for what resources it can use; see that question) rather than by running its own separate kernel. This means containers typically start in well under a second. There's no OS to boot: the kernel is already running, and a container is just a newly isolated process within it. Containers also have far lower memory/disk overhead per instance, since there's no duplicated guest-OS footprint for every single container.
The isolation tradeoff, stated plainly
| | Virtual Machines | Containers |
|---|---|---|
| Isolation boundary | Separate kernel per VM — very strong | Shared host kernel — weaker, process-level isolation |
| Startup time | Seconds to minutes (booting an OS) | Typically well under a second (starting a process) |
| Resource overhead per instance | High (a full guest OS each) | Low (just the process and its isolated view) |
| Density (instances per host) | Lower | Much higher |
| Kernel vulnerabilities | Isolated to that VM's own kernel | Can, in principle, be exploited to escape container isolation and affect the shared host kernel/other containers |
Why this tradeoff matters in practice
Containers are the better fit when you need to run many instances of many different applications efficiently, with fast startup, and where the isolation the shared-kernel model provides is sufficient for your trust boundary (see the multi-tenancy discussion in the Kubernetes stack for when it isn't). VMs remain the right choice when you genuinely need the strongest possible isolation between workloads (e.g., running truly untrusted, mutually adversarial code, or needing entirely different kernels/operating systems side by side on the same hardware). The extra overhead is the price paid for a meaningfully stronger security boundary.
They aren't mutually exclusive
In practice, most container workloads run inside VMs anyway. A cloud provider's "bare metal" host running Docker is unusual. More commonly, Docker runs inside a cloud VM instance, which itself runs on a hypervisor shared with other tenants' VMs. This layered approach combines the VM's strong isolation between different customers/tenants at the infrastructure level with the container's lightweight, fast-starting isolation for individual applications within one tenant's own workloads.
The layered architecture
docker CLI ──(REST API)──▶ dockerd (Docker daemon)
                              │
                              ▼
                           containerd (container lifecycle manager)
                              │
                              ▼
                           runc (OCI runtime: creates the actual isolated process)
                              │
                              ▼
                           Linux kernel (namespaces + cgroups)
Docker CLI — a thin client
docker run -d -p 8080:80 nginx
The docker command itself does almost no work directly — it constructs an HTTP request describing what you asked for and sends it to the Docker daemon's REST API (typically over a Unix socket, /var/run/docker.sock, or a TCP socket if configured for remote access). This is why remote Docker management tools, and Docker's own CLI running against a remote daemon, both work — the CLI is just one possible client of a well-defined API.
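Because the CLI is only one client of the API, any program that can speak HTTP over the Unix socket can ask the daemon the same questions. The sketch below, using only the Python standard library, sends `GET /containers/json` (the Engine API endpoint for listing running containers, roughly what `docker ps` asks for). The class and function names are hypothetical, the socket path is the default mentioned above, and the function returns `None` rather than failing when no daemon is reachable.

```python
import http.client
import json
import os
import socket

DOCKER_SOCKET = "/var/run/docker.sock"  # default path; may differ on your system


class UnixSocketHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix socket instead of a TCP port."""

    def __init__(self, socket_path, timeout=5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def list_containers(socket_path=DOCKER_SOCKET):
    """Return the daemon's list of running containers, or None if unreachable."""
    if not os.path.exists(socket_path):
        return None
    conn = UnixSocketHTTPConnection(socket_path)
    try:
        conn.request("GET", "/containers/json")
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:
            return None
        return json.loads(body)
    except (OSError, http.client.HTTPException, ValueError):
        return None  # no permission, daemon not running, or unexpected reply
    finally:
        conn.close()


if __name__ == "__main__":
    containers = list_containers()
    if containers is None:
        print("Docker daemon not reachable on", DOCKER_SOCKET)
    else:
        for container in containers:
            print(container.get("Id", "")[:12], container.get("Image"))
```

The request is read-only, so it is safe to try on a machine with Docker installed; access to the socket usually requires root or membership in the docker group.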
dockerd — the daemon, managing the bigger picture
The Docker daemon handles the higher-level concerns: building images, managing networks and volumes, handling the REST API, and enforcing Docker-level configuration. But it delegates the actual work of running a container to containerd, rather than doing it directly itself.
containerd — container lifecycle management
containerd is a separate, standalone component (donated to the CNCF, the same foundation that hosts Kubernetes) responsible for the full container lifecycle: pulling images, managing storage, and supervising running containers. It exposes a gRPC API and ships a minimal debugging CLI (ctr), but it isn't designed as an end-user tool in the way docker is. It's meant to be driven by a higher-level tool, such as dockerd, or by a Kubernetes node's kubelet through containerd's CRI plugin (see the Kubernetes stack's CRI question).
runc — the low-level OCI runtime
runc is the component that does the actual, final work of creating an isolated container process — setting up Linux namespaces (see that question), configuring cgroups, and then executing the container's process within that isolated environment. runc implements the OCI (Open Container Initiative) Runtime Specification (see that question). This is exactly why alternative low-level runtimes, like Kata Containers or gVisor's runtime, can be swapped in for stronger isolation without containerd or dockerd needing runtime-specific code for each one.
Why this many layers, rather than one monolithic tool
Each layer standardizes a different concern, allowing components above and below it to be swapped independently. containerd can be used directly by Kubernetes without needing dockerd at all (bypassing Docker entirely, which is exactly what happened when Kubernetes deprecated dockershim — see that stack's question). runc can be swapped for a stronger-isolation OCI-compliant runtime without containerd needing to change. This layered, standardized design is precisely why the broader container ecosystem (Docker, Kubernetes, Podman, and others) can share and interoperate around common lower-level components rather than each reimplementing container execution from scratch.
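The "swap a layer without touching the one above it" idea is ordinary programming against an interface. The following Python sketch is a toy model only: `OCIRuntime`, its single `create` method, and the three classes are hypothetical stand-ins, and the real OCI Runtime Specification defines a richer contract than this.

```python
from typing import List, Protocol


class OCIRuntime(Protocol):
    """Hypothetical stand-in for the contract a low-level runtime fulfils."""

    def create(self, container_id: str, command: List[str]) -> str: ...


class HostKernelRuntime:
    """Plays the role of runc: isolation via host-kernel features."""

    def create(self, container_id: str, command: List[str]) -> str:
        cmd = " ".join(command)
        return f"{container_id}: '{cmd}' isolated with namespaces + cgroups"


class SandboxedRuntime:
    """Plays the role of a stronger-isolation runtime."""

    def create(self, container_id: str, command: List[str]) -> str:
        cmd = " ".join(command)
        return f"{container_id}: '{cmd}' isolated inside an extra sandbox"


class LifecycleManager:
    """Plays the role of containerd: depends on the interface, not a runtime."""

    def __init__(self, runtime: OCIRuntime):
        self.runtime = runtime

    def start(self, container_id: str, command: List[str]) -> str:
        return self.runtime.create(container_id, command)


default = LifecycleManager(HostKernelRuntime()).start("web1", ["node", "server.js"])
sandboxed = LifecycleManager(SandboxedRuntime()).start("web1", ["node", "server.js"])
print(default)
print(sandboxed)
```

`LifecycleManager` contains no runtime-specific code, so adding a third runtime means writing one new class and changing nothing else, which is the property the layered design is after.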
Practical relevance
When troubleshooting a Docker issue, understanding this chain tells you where to look. docker CLI errors about connecting to the daemon point at dockerd's availability. A container failing to actually start (versus the image failing to build) often points further down the stack toward containerd or runc-level issues, such as kernel feature availability or cgroup configuration. Knowing that containerd is a standalone component that does not depend on the Docker CLI also explains why the same container images and runtime concepts apply whether you're using plain Docker or a Kubernetes cluster built on the same underlying containerd.
Namespaces — controlling what a process can see
A Linux namespace wraps a global system resource so that processes inside the namespace see their own isolated instance of it, while processes outside see the normal, unwrapped resource (or a different namespace's instance entirely).
| Namespace | What it isolates |
|---|---|
| PID | Process IDs: a container's main process sees itself as PID 1 and cannot see processes running on the host or in other containers |
| NET | Network interfaces, IP addresses, routing tables, ports: a container gets its own virtual network stack, distinct from the host's |
| MNT | Filesystem mount points: a container sees only its own filesystem view (its image's layers plus any mounted volumes), not the host's real filesystem |
| UTS | Hostname and domain name: a container can have its own hostname, independent of the host machine's |
| IPC | Inter-process communication resources (shared memory, semaphores): prevents one container's IPC objects from being visible to or colliding with another's |
| USER | User and group IDs: lets a process be root inside the container's namespace while mapping to an unprivileged, non-root user on the actual host, reducing the impact of a container escape (Docker does not enable this remapping by default) |
# Inside a container, the container's own main process appears as PID 1
docker exec my-container ps -eo pid,user,args
# PID USER COMMAND
# 1 root node server.js <- this process's REAL host PID might be, say, 48213
This is why a container "sees" only its own processes, its own network configuration, and its own filesystem. This gives the strong illusion of running on a dedicated machine, even though it's really just an ordinarily-scheduled process on the shared host, viewed through a namespace-restricted lens.
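On Linux, a process's namespace membership is visible under `/proc/<pid>/ns`, where each entry is a symlink whose target identifies the namespace. The short Python sketch below lists those identifiers for the current process; `namespace_ids` is a hypothetical helper name, and the function returns an empty dict on systems without that directory.

```python
import os


def namespace_ids(pid="self"):
    """Map namespace name -> identifier for a process, read from /proc."""
    ns_dir = f"/proc/{pid}/ns"
    ids = {}
    try:
        names = sorted(os.listdir(ns_dir))
    except OSError:
        return ids  # not Linux, no such process, or not permitted
    for name in names:
        try:
            # Each entry is a symlink whose target names the namespace.
            ids[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return ids


if __name__ == "__main__":
    for name, ident in namespace_ids().items():
        print(f"{name:20} {ident}")
```

Running this once on the host and once inside a container and comparing the results is a quick way to see the isolation: two processes are in the same namespace exactly when the identifiers for that namespace type match.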
cgroups — controlling what a process can use
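Namespaces control visibility. cgroups (control groups) control resource consumption: how much CPU, memory, and disk I/O a process (or group of processes) is allowed to use, and how many processes it may create. Network bandwidth is the exception: cgroups do not cap it directly, and it is shaped by the kernel's traffic-control layer instead. cgroups also provide accounting and metrics for actual usage.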
docker run --memory="512m" --cpus="1.5" myapp:1.0
This translates directly into cgroup configuration. The kernel's cgroup subsystem enforces that this container's processes cannot use more than 512 MiB of memory; if they try and the kernel cannot reclaim enough, it OOM-kills a process in the container (the same underlying mechanism covered in the Kubernetes stack's OOMKilled question). It also caps the container at 1.5 CPU cores' worth of scheduling time, regardless of how much the host machine actually has available.
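As a rough illustration of that translation, the Python sketch below computes the values the two flags would correspond to on a cgroup v2 host: a byte count for `memory.max` and a "quota period" pair for `cpu.max`. It assumes Docker's default CPU period of 100,000 microseconds and binary size suffixes; the function names are hypothetical, and this mirrors the arithmetic only, not Docker's code.

```python
CPU_PERIOD_US = 100_000  # assumed default scheduling period, in microseconds

_UNIT_BYTES = {"b": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def memory_max(limit: str) -> int:
    """Convert a --memory style value such as '512m' into bytes."""
    text = limit.strip().lower()
    if text and text[-1] in _UNIT_BYTES:
        return int(text[:-1]) * _UNIT_BYTES[text[-1]]
    return int(text)  # no suffix: already a byte count


def cpu_max(cpus: float) -> str:
    """Convert a --cpus style value into the 'quota period' form of cpu.max."""
    quota = int(round(cpus * CPU_PERIOD_US))
    return f"{quota} {CPU_PERIOD_US}"


print(memory_max("512m"))  # 536870912
print(cpu_max(1.5))        # 150000 100000
```

Read the second result as "150,000 microseconds of CPU time in every 100,000 microsecond window", which is how a limit of one and a half cores is expressed.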
Why both are needed together
Namespaces alone would let a container see only itself. But without cgroups, nothing would stop that container from consuming all of the host's CPU or memory, starving every other container sharing the machine. Isolation of view without control of consumption isn't enough for a genuinely multi-tenant host. Conversely, cgroups alone (limiting resource usage) without namespaces would still let one container's processes see and potentially interfere with every other process on the host. Together, namespaces provide the illusion of a dedicated machine, and cgroups provide the guarantee that one tenant can't monopolize the shared machine's real resources.
Why this matters beyond trivia
Understanding that "a container" is really just an ordinary Linux process — made to look isolated via namespaces and made resource-bounded via cgroups, not some fundamentally different kind of virtualized entity — explains a lot of otherwise-surprising container behavior. It explains why docker top can show container processes' real host PIDs. It explains why a container "escape" vulnerability is fundamentally about breaking out of namespace/cgroup confinement rather than "hacking a virtual machine." And it explains why Kubernetes's resource requests/limits (see that stack) map directly onto these exact same underlying cgroup mechanisms.