Storage and Volumes

Volumes — Docker-managed, the recommended default

docker volume create my-data
docker run -d -v my-data:/app/data myapp:1.0

Docker creates and manages the actual storage location on the host's filesystem (typically under /var/lib/docker/volumes/), so you never need to know the path. You reference the volume by name (my-data), not by a specific host path. Volumes are the recommended mechanism for persisting real application data (a database's files, uploaded content), because Docker manages their lifecycle and driver options consistently, independent of the host's own directory structure.

Bind mounts — an arbitrary host path, mapped directly in

docker run -d -v /home/user/my-config:/app/config myapp:1.0
# or, using the more explicit --mount syntax:
docker run -d --mount type=bind,source=/home/user/my-config,target=/app/config myapp:1.0

A bind mount maps a specific path on the host directly into the container. This gives full transparency and control over exactly where the data lives on the host's filesystem, which is useful in specific scenarios: mounting your local source code directory into a container for live-reload development, or sharing a specific existing host directory. The two syntaxes differ in one respect: if the host path does not exist, -v creates it as an empty directory, while --mount type=bind fails with an error. The tradeoff is that the container's behavior now depends on that exact host path existing with the right permissions and content. This couples the container tightly to that host's directory layout and hurts portability across machines and environments.

tmpfs mounts — memory-only, never touches disk

docker run -d --tmpfs /app/cache myapp:1.0

Data written here lives entirely in the host's memory. This is extremely fast, but the data is lost the moment the container stops, not just when it is removed: even a docker stop/docker start cycle loses it, unlike a volume or bind mount. This is useful for data that is either genuinely temporary or too sensitive to persist: a cache that's fine to lose, or a decrypted secret that should never be written to disk, even transiently.
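Because a tmpfs mount consumes host memory, it is often worth capping its size. The sketch below only builds the flag text; tmpfs_flag is a hypothetical helper name, and the 64m default and the noexec/nosuid options are illustrative choices rather than recommendations from this guide.

```shell
# Build a --tmpfs flag with a size cap and restrictive mount options.
# Nothing here talks to Docker; it only prints the flag text.
tmpfs_flag() {
  mount_path=$1
  size=${2:-64m}   # assumed default, adjust to the workload
  printf '%s\n' "--tmpfs ${mount_path}:rw,noexec,nosuid,size=${size}"
}

# Usage (myapp:1.0 is the example image used above):
#   docker run -d $(tmpfs_flag /app/cache 128m) myapp:1.0
```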

Side-by-side comparison

|  | Volume | Bind mount | tmpfs |
| --- | --- | --- | --- |
| Managed by | Docker | You (an arbitrary host path) | Docker (in-memory only) |
| Survives container removal | Yes | Yes (it's just a host directory) | No (gone even on container stop) |
| Portable across different hosts | Yes (referenced by name, not host path) | No (tied to that host's specific path) | N/A (never persists anywhere) |
| Typical use | Real application/database data | Local development (mounting source code), sharing a specific existing host resource | Temporary, sensitive, or performance-critical scratch data |
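With the short -v syntax, the same flag produces either a volume or a bind mount depending on what comes before the first colon. A minimal sketch of that rule follows. mount_kind is a hypothetical helper, and it assumes that only absolute paths count as bind mounts. Some Docker versions also accept relative paths, which this sketch ignores.

```shell
# Classify the source half of a "-v SOURCE:TARGET" spec.
# Assumption: an absolute path means bind mount, anything else is a volume name.
mount_kind() {
  spec_source=${1%%:*}   # text before the first colon
  case $spec_source in
    /*) echo bind ;;
    *)  echo volume ;;
  esac
}

# Usage:
#   mount_kind my-data:/app/data
#   mount_kind /home/user/my-config:/app/config
```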

Why volumes are generally preferred over bind mounts in production

Recall that a bind mount ties the container's correct behavior to the specific host's directory structure and permissions. This is exactly the kind of environment-dependent coupling containers are meant to eliminate (see the fundamentals topic's "what problem does Docker solve" question). A volume, referenced purely by name, works identically regardless of which host it's actually running on, or how that host's filesystem happens to be laid out. This is a meaningfully better fit for production deployments, where you want the same container configuration to behave identically across different machines.

Portability across hosts and environments

# Bind mount: hardcodes a specific host path -- this exact path must exist,
# with correct permissions, on EVERY machine this container might ever run on
docker run -v /opt/myapp/data:/app/data myapp

# Named volume: portable -- Docker manages where it actually lives,
# and the SAME command works identically regardless of host layout
docker run -v app-data:/app/data myapp

A bind mount's correctness depends on the assumption that /opt/myapp/data exists, with the right permissions, on whatever host this container happens to run on. That assumption breaks the moment you deploy to a different server, a different developer's laptop, or a freshly provisioned machine without that exact directory structure already set up. A named volume has no such dependency. Docker creates and manages it consistently regardless of the host's own directory layout. This is precisely the same portability guarantee containers are meant to provide for application code in the first place (see the fundamentals topic).
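If a bind mount is unavoidable, the assumption it depends on can at least be checked before the container starts. The following is a sketch only: check_bind_source is a hypothetical helper, and it tests permissions for the user running the script, which is not necessarily the user the container process runs as.

```shell
# Preflight check for a bind-mount source directory.
# Returns non-zero (with a message on stderr) if the path is unusable.
check_bind_source() {
  src=$1
  if [ ! -d "$src" ]; then
    echo "missing directory: $src" >&2
    return 1
  fi
  if [ ! -r "$src" ] || [ ! -w "$src" ]; then
    echo "not readable and writable by $(id -un): $src" >&2
    return 1
  fi
}

# Usage:
#   check_bind_source /opt/myapp/data && docker run -v /opt/myapp/data:/app/data myapp
```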

Consistent tooling and lifecycle management

docker volume ls
docker volume inspect app-data
docker volume prune            # clean up unused volumes; recent Docker versions
                               # remove only anonymous ones unless you pass --all

Docker's CLI and API provide first-class commands for listing, inspecting, and cleaning up volumes. A bind mount, being just an arbitrary host directory, is not tracked by Docker as a resource of its own. You must work out yourself which host directories are used by which containers, and cleaning them up requires ordinary filesystem tools rather than Docker's volume-management commands.
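One practical benefit of this tracking is that you can ask Docker which containers reference a volume before removing it. A small sketch, assuming a volume named app-data as in the commands above. volume_users is a hypothetical helper name, while the volume filter and the --format template are standard docker ps options.

```shell
# List containers (running or stopped) that reference a named volume.
volume_users() {
  docker ps -a --filter "volume=$1" --format '{{.Names}}'
}

# Usage:
#   volume_users app-data
```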

Volume drivers extend capability without changing application configuration

docker volume create --driver local --opt type=nfs --opt device=:/exported/path --opt o=addr=nfs-server.example.com my-nfs-volume

Named volumes support pluggable volume drivers (see that question). This lets the same -v my-nfs-volume:/app/data reference in a container's configuration be backed by local disk, NFS, a cloud storage service, or another storage backend entirely. The underlying storage implementation can be swapped without touching the container's own configuration at all. A bind mount, by definition, is always tied to whatever's actually at that literal host path. There is no equivalent abstraction layer to swap the backing storage transparently.

Permission and ownership complications specific to bind mounts

Bind mounts frequently run into UID/GID mismatch issues. A container process running as a specific user ID needs to actually have permission to read and write the bound host directory. Host-side and container-side user ID mappings don't always align cleanly, especially across different host operating systems, or when a container's internal user doesn't correspond to any real user on the host. Named volumes, being fully managed by Docker, avoid much of this complexity, since Docker handles the underlying storage directly rather than requiring alignment with an arbitrary host directory's existing ownership.
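One common workaround for the mismatch, sketched here under the assumption that the application can run as an arbitrary non-root user, is to start the container with the same numeric UID and GID that own the host directory. owner_ids is a hypothetical helper. It parses ls -nd so that it does not depend on a particular stat implementation.

```shell
# Print the numeric owner of a host directory as "UID:GID".
owner_ids() {
  ls -nd "$1" | awk '{ print $3 ":" $4 }'
}

# Usage (paths and image are the examples used earlier):
#   docker run -d --user "$(owner_ids /home/user/my-config)" \
#     -v /home/user/my-config:/app/config myapp:1.0
```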

When bind mounts are still the right, deliberate choice

  • Local development — live-mounting your actual source code directory into a container so code changes are immediately reflected without rebuilding the image, a very common and appropriate development workflow.
  • Deliberately sharing a specific, known host resource — e.g., mounting /etc/localtime read-only to sync a container's timezone with the host's, or mounting a Unix socket like the Docker socket itself (with the security caveats covered in that topic's question).

Reaching for a bind mount purely out of habit, rather than for one of these deliberate reasons, is usually a sign the default should have been a named volume instead.

Backing up a volume

docker run --rm \
  -v my-app-data:/source:ro \
  -v "$(pwd)":/backup \
  alpine \
  tar czf /backup/my-app-data-backup.tar.gz -C /source .

Breaking this down:

  • -v my-app-data:/source:ro — mounts the volume you want to back up, read-only (:ro), into a temporary container, so the backup process can't accidentally modify the live data while reading it.
  • -v $(pwd):/backup — a bind mount, giving the temporary container access to your current host directory, so the resulting archive ends up somewhere you can actually access it afterward (outside the ephemeral container).
  • alpine — a minimal, throwaway image, chosen purely because it includes tar and almost nothing else needed for this one-off task.
  • tar czf /backup/my-app-data-backup.tar.gz -C /source . — the actual backup command, compressing everything in /source (the mounted volume) into a single archive file, written into /backup (the bind-mounted host directory).
  • --rm — automatically removes this temporary container once the command finishes, since it has no ongoing purpose beyond this one backup operation.
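The same command can be wrapped so that the volume name and destination are parameters and each archive gets a timestamped name. This is a sketch rather than a finished tool. backup_volume is a hypothetical function name, and the destination must be an absolute host path because it is bind-mounted.

```shell
# Back up a named volume into a timestamped archive inside DEST_DIR.
# Prints the host path of the archive on success.
backup_volume() {
  volume=$1
  dest_dir=$2
  archive="${volume}-$(date +%Y%m%d-%H%M%S).tar.gz"
  docker run --rm \
    -v "${volume}:/source:ro" \
    -v "${dest_dir}:/backup" \
    alpine \
    tar czf "/backup/${archive}" -C /source . || return 1
  printf '%s\n' "${dest_dir}/${archive}"
}

# Usage:
#   backup_volume my-app-data "$(pwd)"
```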

Restoring from a backup

docker volume create my-app-data-restored

docker run --rm \
  -v my-app-data-restored:/target \
  -v "$(pwd)":/backup \
  alpine \
  tar xzf /backup/my-app-data-backup.tar.gz -C /target

The reverse process: create a fresh volume (or reuse the existing one, if genuinely restoring in place), mount it as the extraction target, and unpack the previously created archive into it via the same kind of temporary, throwaway container.

Why this pattern works: volumes aren't tied to any specific "owning" container

The key insight making this whole technique possible is that a named volume isn't permanently bound to whichever container originally used it. Any container can mount it, including a completely unrelated, temporary one whose only job is to run a backup/restore command. This is exactly the same underlying property that makes volumes useful for migrating data between different application versions, or even entirely different applications. It works as long as both sides agree on the expected data format inside the volume.

Database-specific backup tools are usually still the better choice for real databases

docker exec my-postgres pg_dump -U postgres mydb > backup.sql

For an actual running database, the database's own native backup tooling (pg_dump, mysqldump, and equivalents) is generally a better approach than a raw filesystem-level tar of the volume. A live database's on-disk files can be in an inconsistent, mid-write state if archived directly while the database is running, unless it's stopped first, or the tool specifically supports safe hot-backup snapshotting. A proper database dump tool, by contrast, guarantees a consistent, valid backup by working through the database's own transactional guarantees, rather than copying raw files.

Automating this as a scheduled task

# A cron job, or a scheduled CI/CD pipeline step, running the backup command
# above on a regular schedule, pushing the resulting archive to durable,
# off-host storage (cloud object storage, a dedicated backup server) --
# never leaving backups only on the SAME host as the live data.

This mirrors the same principle covered in the SQL/Databases and Kubernetes stacks' backup questions. Backups must be automated on a regular schedule, stored somewhere genuinely separate from the live data (so a single host failure can't destroy both simultaneously), and periodically tested by actually performing a restore. An untested backup isn't a real backup, regardless of which specific technology or command produced it.
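The restore-test step is the one most often skipped, so a minimal sketch of it may help. restore_check is a hypothetical helper that unpacks an archive into a scratch directory and reports how many regular files came out. It only proves that the archive is readable and non-empty. For a database, a meaningful test would load the dump into a scratch instance instead.

```shell
# Unpack an archive into a temporary directory and print the file count.
# Fails if the archive cannot be read or contains no regular files.
restore_check() {
  archive=$1
  scratch=$(mktemp -d) || return 1
  if ! tar xzf "$archive" -C "$scratch" 2>/dev/null; then
    rm -rf "$scratch"
    return 1
  fi
  count=$(find "$scratch" -type f | wc -l | tr -d ' ')
  rm -rf "$scratch"
  [ "$count" -gt 0 ] && echo "$count"
}

# Usage:
#   restore_check ./my-app-data-backup.tar.gz
```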

The demonstration

# The official postgres image refuses to start without POSTGRES_PASSWORD;
# "example" is a placeholder value.
docker run -d --name my-db -e POSTGRES_PASSWORD=example postgres:16    # no named volume mounted -- data is tied to THIS container only
docker exec my-db psql -U postgres -c "CREATE TABLE important_data (...);"
# ... insert critical data ...

docker rm -f my-db
docker run -d --name my-db -e POSTGRES_PASSWORD=example postgres:16    # a FRESH container, from the same image
docker exec my-db psql -U postgres -c "SELECT * FROM important_data;"
# ERROR: relation "important_data" does not exist

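The second container is entirely new. It starts from the image's original, unmodified layers, with a fresh, empty writable layer. Every change made to the first container (the new table, its data) was tied exclusively to that specific container and is not visible to the second one, even though both were started from the identical image. Strictly speaking, the official postgres image declares a VOLUME for its data directory, so Docker stores those files in an anonymous volume rather than in the writable layer itself. The practical outcome is the same: the replacement container gets a new, empty anonymous volume, and the old one is either deleted with its container or left orphaned under a random name.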

Why this is expected, correct behavior — not a bug

Recall from the fundamentals topic: an image is an immutable, read-only template, and each container gets its own independent writable layer on top of it via copy-on-write (see that question). This is precisely what allows many containers to be started from the same image simultaneously, each with fully independent state. But it also means a container's writable layer is fundamentally tied to that one container's lifetime. It is not tied to the image, and it is not shared with any other container.

The fix: mount a volume for anything that needs to survive

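docker volume create db-data
# POSTGRES_PASSWORD is required on first initialization; "example" is a placeholder.
docker run -d --name my-db -e POSTGRES_PASSWORD=example -v db-data:/var/lib/postgresql/data postgres:16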

Now the database's actual data files live in the named volume db-data, not in the container's own writable layer. Removing this container and starting a fresh one picks up exactly where the previous container left off, as long as it mounts the same volume. This works since the volume's data is independent of any specific container's lifecycle:

docker rm -f my-db
docker run -d --name my-db -v db-data:/var/lib/postgresql/data postgres:16
docker exec my-db psql -U postgres -c "SELECT * FROM important_data;"
# the data is still there -- it was never IN the removed container's writable layer at all

The mental model this reinforces

Think of the writable layer as entirely disposable scratch space specific to one container instance, and of volumes as the only place persistent data should live. Any file written outside a mounted volume path should be treated as something you're comfortable losing the instant that specific container is removed. Logs (which should generally go to stdout/stderr and be captured by Docker's logging driver instead; see the lifecycle topic), temporary caches, and anything else genuinely ephemeral are fine to leave in the writable layer. Real application data, database files, and uploaded content are not.

A common real-world mistake this explains

A surprisingly common incident pattern looks like this: a database or application was run without a mounted volume during initial setup, perhaps for a "quick test" that then quietly became the actual production deployment. Months of accumulated data are then permanently lost the first time that one specific container happens to be removed or replaced, whether during a routine update, a host migration, or simple operator error. This happens precisely because nothing was ever actually persisted outside that one container's own writable layer. Verifying that every stateful container mounts an appropriate volume for its real data, rather than just assuming it does, is a basic, essential production readiness check.
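That readiness check can be scripted against the mount list that docker inspect reports. The sketch below is one possible approach, not an official tool, and has_persistent_mount is a hypothetical name. It treats a volume with a 64-character hexadecimal name as anonymous, because images that declare a VOLUME get such a volume automatically and it does not follow the data to a replacement container.

```shell
# Succeed only if CONTAINER has a bind mount or a NAMED volume at DEST.
has_persistent_mount() {
  docker inspect --format \
    '{{range .Mounts}}{{.Type}}|{{.Name}}|{{.Destination}}{{println}}{{end}}' "$1" |
    awk -F'|' -v dest="$2" '
      $3 != dest { next }
      $1 == "bind" { ok = 1 }
      # Anonymous volumes have a 64-character hex name; do not count them.
      $1 == "volume" && !(length($2) == 64 && $2 ~ /^[0-9a-f]+$/) { ok = 1 }
      END { exit !ok }'
}

# Usage:
#   has_persistent_mount my-db /var/lib/postgresql/data || echo "my-db has no persistent data mount"
```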

The default: the local driver

docker volume create my-data
docker volume inspect my-data
# "Driver": "local"
# "Mountpoint": "/var/lib/docker/volumes/my-data/_data"

Without specifying a driver, Docker uses the built-in local driver, which simply creates and manages a directory on the host machine's own disk. This is fine for single-host setups, but it means the volume's data is physically tied to that one specific host — if the container needs to move to a different machine, the volume (and its data) doesn't automatically come along.

Using an alternative volume driver

docker volume create --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.100,rw \
  --opt device=:/exported/path \
  my-nfs-volume

docker run -d -v my-nfs-volume:/app/data myapp:1.0

This example uses the local driver's own built-in NFS mount option support, backing the "volume" with a remote NFS share instead of purely local disk. The container's own configuration (-v my-nfs-volume:/app/data) looks identical to using a plain local volume. Only the volume's own creation-time definition differs.
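Since only the creation step differs, it can be isolated in a small wrapper while every docker run command stays unchanged. A sketch, with create_nfs_volume as a hypothetical function name. The arguments are the volume name, the NFS server address, and the exported path.

```shell
# Create a named volume backed by an NFS export via the local driver.
create_nfs_volume() {
  docker volume create --driver local \
    --opt type=nfs \
    --opt "o=addr=$2,rw" \
    --opt "device=:$3" \
    "$1"
}

# Usage:
#   create_nfs_volume my-nfs-volume 192.168.1.100 /exported/path
#   docker run -d -v my-nfs-volume:/app/data myapp:1.0
```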

Third-party volume driver plugins extend this further, supporting cloud block storage services, distributed storage systems (Ceph, GlusterFS), and other backends. Each implements Docker's volume plugin API, so that, from the container's perspective, using them requires no different syntax than using any other named volume.

Why this abstraction matters

Container's perspective:  -v my-data:/app/data   (identical, regardless of backend)

Actual backend, depending on the driver used:
  - local disk (default "local" driver)
  - NFS share
  - Cloud block storage (via a cloud-specific driver)
  - A distributed storage system

This mirrors exactly the same abstraction philosophy behind Kubernetes's StorageClasses and the CSI (Container Storage Interface; see that stack's question). Application/container configuration references storage in an abstract, backend-agnostic way, while a pluggable driver layer handles the actual, potentially very different, underlying implementation. The benefit is the same in both ecosystems: you can change or upgrade the underlying storage infrastructure without needing to rewrite every container's configuration that references it.

When you'd actually reach for a non-default driver

  • Multi-host setups without a full orchestrator — if you're running plain Docker (not Swarm or Kubernetes) across multiple hosts, and need a container's data to be accessible regardless of which specific host it happens to run on, a network-backed volume driver (NFS, a distributed storage plugin) solves this. The default local driver's host-tied storage would not.
  • Cloud-native storage integration — using a cloud provider's own volume driver plugin to back Docker volumes directly with that provider's managed block/file storage service, gaining that service's own durability, snapshotting, and replication features.

For simple, single-host Docker deployments, the default local driver remains entirely sufficient and requires no extra configuration at all.
