What logging drivers does Docker support, and how do they affect log management?

This mirrors the earlier lifecycle-topic logging question, applied specifically to production concerns: the default `json-file` driver writes logs to local disk with no rotation configured out of the box, which doesn't scale across many containers and hosts and doesn't survive a host being replaced. Production deployments typically switch to a driver that forwards logs directly to a centralized system (`syslog`, `journald`, `fluentd`, `gelf`, `awslogs`, and others), gaining centralized, durable, searchable logging across an entire fleet. The cost is that `docker logs` becomes less dependable: on Docker Engine before 20.10 it cannot read from most remote drivers at all, and on later versions it reads only from a size-limited local cache.

How do you use Docker in a CI/CD pipeline to build and test images?

A typical pipeline builds the image from the Dockerfile (often leveraging cached layers from a previous build to speed this up), runs the application's test suite inside a container built from that same image (or an intermediate build stage) to ensure tests run in an environment matching production, tags the image (often with the git commit SHA for traceability), pushes it to a registry, and — for genuine reproducibility — records/pins the resulting digest for use by the actual deployment step.

What is Docker-in-Docker (DinD), and what are its risks and alternatives?

Docker-in-Docker means running a full, separate Docker daemon *inside* a container. It is typically needed when a CI pipeline itself runs inside a container but needs to build and run other Docker images as part of its job. It carries real risks (the outer container must run in `--privileged` mode, nested storage drivers cause complications, and isolation is weaker than the name suggests), which is why alternatives are often preferred: mounting the host's Docker socket (which has its own distinct risks; see the security topic), or using a purpose-built daemonless image-building tool like Kaniko or Buildah.

What's the difference between Docker Swarm and Kubernetes for orchestration?

Docker Swarm is Docker's own built-in orchestration mode. It is simpler to set up and operate, using concepts and CLI syntax that map closely onto familiar plain-Docker commands, but it has a meaningfully smaller feature set and ecosystem than Kubernetes. Kubernetes is the dominant, far more feature-rich orchestrator (covered in its own dedicated stack), with a steeper learning curve but vastly broader ecosystem, tooling, and cloud-provider support. Swarm's real-world adoption and community investment have declined substantially as the industry has consolidated around Kubernetes; the Kubernetes stack's equivalent question covers this comparison in more depth.

How do you handle configuration differences across environments with the same image?

Build the image **once**, and inject environment-specific configuration at runtime — via environment variables, mounted config files, or secrets — rather than baking environment-specific values into the image or building a separate image per environment. This follows the twelve-factor app principle of strict separation between build and run stages, ensuring the exact same, already-tested artifact is what actually gets promoted from staging to production, rather than subtly different images per environment that could behave differently for reasons unrelated to the environment configuration itself.

How do you troubleshoot a container that exits immediately after starting?

Check `docker logs <container>` first: the application likely printed an error explaining exactly why it exited. Then check the exit code via `docker inspect` or `docker ps -a`. A clean exit code 0 suggests the main process simply finished and returned, which often means the `CMD`/`ENTRYPOINT` isn't actually a long-running process; a non-zero code suggests an application-level error. If the logs are empty, run the container interactively with an overridden command (`docker run -it myapp sh`, or `docker run -it --entrypoint sh myapp` if the image sets an `ENTRYPOINT`) to inspect the environment before the normal command would run.

What is the twelve-factor app methodology's relevance to containerized applications?

The twelve-factor app methodology is a set of principles for building software-as-a-service applications that are portable, scalable, and consistent across environments, and it maps almost perfectly onto how containers are meant to be used: strict separation of config from code (see that question), treating backing services as attached resources, logging to stdout rather than managing log files internally, and running the application as stateless, disposable processes. Understanding these principles explains *why* several Docker best practices (log to stdout, never bake config into the image, keep containers stateless/disposable) are considered best practices in the first place, rather than being arbitrary conventions.

Docker in Production and CI/CD

Logging drivers, build pipelines, Docker-in-Docker, orchestration choices, and troubleshooting production containers.

Why the default driver doesn't scale to real production needs

The default json-file driver (see the lifecycle topic) writes each container's logs to a local file on that specific host's disk, with no automatic rotation configured unless you explicitly set --log-opt max-size/max-file. Across a fleet of many hosts running many containers, this creates several problems. Logs are scattered across many machines with no unified way to search across all of them. A host being replaced (common in autoscaled or ephemeral infrastructure) takes its logs with it. Unbounded log growth can genuinely fill a host's disk if rotation isn't explicitly configured.

Setting a logging driver

docker run --log-driver=syslog --log-opt syslog-address=udp://loghost:514 myapp
docker run --log-driver=fluentd --log-opt fluentd-address=localhost:24224 myapp
docker run --log-driver=awslogs --log-opt awslogs-group=myapp --log-opt awslogs-region=us-east-1 myapp

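Or, to set a default for the whole Docker daemon (rather than specifying it per-container), put the following in /etc/docker/daemon.json and restart the daemon. The new default applies only to containers created afterwards:
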
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}
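
As a rough sanity check on these values: with rotation enabled, a container keeps at most max-file files of about max-size each, so the settings above cap each container at roughly 30 MB. The sketch below does that arithmetic for a whole host. The function name and the container count are illustrative assumptions, not Docker features.

```shell
# Approximate worst-case json-file log usage on one host, in megabytes.
# Usage: max_log_mb <containers> <max-size in MB> <max-file>
max_log_mb() {
  echo $(( $1 * $2 * $3 ))
}

# 40 containers (an assumed figure) with max-size=10m and max-file=3
max_log_mb 40 10 3   # prints 1200
```

Running the same calculation with the real container count for a host gives a quick check that the log budget fits on the disk.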

Common production-oriented drivers

syslog — forwards to a syslog server, a long-established standard for centralized Unix/Linux logging.
journald — integrates with systemd's journal on hosts using systemd, useful if the rest of the host's own logging already goes through journald.
fluentd / gelf — forward to Fluentd or a GELF-compatible endpoint (like Graylog). These are common choices for feeding into an Elasticsearch/Loki-based centralized logging stack, the same kind of architecture covered in the Kubernetes stack's logging question. Here, Docker's own driver mechanism feeds the stack directly, rather than a separate log-shipping DaemonSet.
awslogs — forwards directly to AWS CloudWatch Logs, a natural fit when already running on AWS infrastructure.

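The tradeoff: docker logs is no longer the source of truth

docker run -d --name myapp --log-driver=awslogs --log-opt awslogs-group=myapp --log-opt awslogs-region=us-east-1 myapp
docker logs myapp
# On Docker Engine before 20.10, or with the local log cache disabled:
# Error response from daemon: configured logging driver does not support reading

On Docker Engine versions before 20.10, once a remote driver such as awslogs is configured, docker logs can no longer read the container's output locally; logs are accessible only through the external system the driver forwards to. From 20.10 onward, the Engine keeps a local cache ("dual logging"), so docker logs keeps working with any driver, but that cache is size-limited, can be turned off with the cache-disabled log option, and disappears along with the host. Either way, this is an important, sometimes-surprising shift for teams used to reaching for docker logs during troubleshooting. The mental model becomes "check the centralized logging system," not "SSH into the host and run docker logs."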
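
To see which driver a given container actually uses, docker inspect exposes it under .HostConfig.LogConfig.Type. The sketch below pairs that with a small helper listing the drivers that store logs where docker logs can read them natively (json-file, local, journald). The helper names are hypothetical, and for other drivers local reads depend on the Engine's log cache (Docker Engine 20.10 and later).

```shell
# Succeeds if the driver stores logs where `docker logs` can read them directly.
reads_natively() {
  case "$1" in
    json-file|local|journald) return 0 ;;
    *) return 1 ;;
  esac
}

# Report the driver for a container; this part needs a running Docker daemon.
show_log_driver() {
  driver=$(docker inspect --format '{{.HostConfig.LogConfig.Type}}' "$1") || return 1
  if reads_natively "$driver"; then
    echo "$1: $driver (docker logs reads it directly)"
  else
    echo "$1: $driver (forwarded; rely on the centralized logging system)"
  fi
}
```

Usage would look like `show_log_driver my-container`, where my-container is a placeholder name.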

Why this matters for anything beyond a single-host deployment

Centralized logging isn't optional once you're running more than a handful of containers across more than one host. Without it, diagnosing an issue that spans multiple services (a request that touches three different containers, possibly on three different hosts) requires manually checking logs on each machine, which quickly becomes impractical. A centralized logging driver, feeding into a searchable system, is what makes cross-service, cross-host troubleshooting tractable at any real scale. It can be configured once at the daemon level, so every newly created container on that host benefits without per-container setup.

Related Resources

Docker: Configure logging drivers

A representative pipeline sequence

# Simplified CI pipeline concept
steps:
  - name: Build image
    run: docker build -t myapp:${{ github.sha }} .

  - name: Run tests inside a container
    run: docker run --rm myapp:${{ github.sha }} npm test

  - name: Scan for vulnerabilities
    run: trivy image --exit-code 1 --severity CRITICAL myapp:${{ github.sha }}

  - name: Push to registry
    run: |
      docker tag myapp:${{ github.sha }} myregistry.example.com/myapp:${{ github.sha }}
      docker push myregistry.example.com/myapp:${{ github.sha }}

Why building the image is often the very first step, before tests even run

Building the actual production image first, then running tests inside a container built from that image, ensures the tests genuinely validate the same environment that will actually run in production. This is different from running tests directly on the CI runner's own environment, separately from the image build. Doing it this way closes the exact "works on my machine/CI, breaks in production" gap that containers exist to solve in the first place (see the fundamentals topic). Running tests against a different environment than what's actually shipped defeats much of the purpose of containerizing the application at all.

Tagging with the commit SHA for traceability

docker build -t myapp:${{ github.sha }} .

Tagging each CI-built image with the specific git commit SHA that produced it (rather than only a generic version tag like latest or even 1.0) gives an unambiguous, traceable link between a specific running image and the exact source code that built it. This is essential for debugging ("which commit is actually deployed right now") and for the digest-pinning practices covered in the registries topic.
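
The ${{ github.sha }} expression is specific to GitHub Actions. On other CI systems, or locally, the same tag can be derived from git itself. A minimal sketch, where the helper name and the short SHA value are illustrative assumptions:

```shell
# Compose a fully qualified image reference tagged with a commit SHA.
# Usage: image_ref <registry> <name> <sha>
image_ref() {
  printf '%s/%s:%s\n' "$1" "$2" "$3"
}

# In a real checkout the SHA would come from git, for example:
#   sha=$(git rev-parse HEAD)
#   docker build -t "$(image_ref myregistry.example.com myapp "$sha")" .
image_ref myregistry.example.com myapp 3f2a9c1   # prints myregistry.example.com/myapp:3f2a9c1
```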

Leveraging build cache across CI runs

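- name: Build with registry cache
  run: |
    # The legacy builder only reuses layers from an image that is present locally
    docker pull myregistry.example.com/myapp:latest || true
    docker build \
      --cache-from myregistry.example.com/myapp:latest \
      --build-arg BUILDKIT_INLINE_CACHE=1 \
      -t myapp:${{ github.sha }} .

A fresh CI runner typically starts with no local build cache at all, unlike a developer's own machine, which accumulates cache across many local builds. Without addressing this, every single CI build is effectively a full, uncached rebuild, however well the Dockerfile itself is ordered for caching (see that question). Two techniques let CI builds benefit from layer caching despite starting from a clean runner environment each time: pulling a previous build's image as an explicit cache source (--cache-from), or using BuildKit's remote cache export/import capability (see that question). With BuildKit, --cache-from only helps if the referenced image was itself built with inline cache metadata (BUILDKIT_INLINE_CACHE=1) and pushed; with the legacy builder, the image has to be pulled first.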
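
For the second technique, BuildKit can export its cache to a registry reference and import it on the next run. The following is a sketch under assumptions: the helper names and the cache reference are hypothetical, and exporting a registry cache may require a docker-container builder (docker buildx create --use) rather than the default one.

```shell
# Print the BuildKit flags for importing and exporting a registry-backed cache.
# Usage: cache_flags <cache image ref>
cache_flags() {
  printf '%s %s\n' \
    "--cache-from type=registry,ref=$1" \
    "--cache-to type=registry,ref=$1,mode=max"
}

# Build using that cache. Needs Docker with buildx and push access to the registry.
# $1 = cache ref, $2 = image tag; word splitting of the flags is intended here.
build_with_cache() {
  # shellcheck disable=SC2046
  docker buildx build $(cache_flags "$1") -t "$2" .
}
```

mode=max also exports layers from intermediate stages, which matters for multi-stage Dockerfiles where most of the work happens before the final stage.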

Multi-stage builds work especially well in CI

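FROM node:20 AS test
WORKDIR /app
COPY . .
# Assumes package.json defines "test" and "build" scripts, and that "build" writes to dist/
RUN npm ci && npm test && npm run build

FROM node:20-slim AS production
WORKDIR /app
COPY --from=test /app/dist ./dist
CMD ["node", "dist/server.js"]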

A dedicated test stage can run the full test suite (with all dev dependencies, test frameworks, etc.) while the final production stage only copies out the built artifacts. This combines the "test in the real build environment" benefit with the "final shipped image stays minimal" benefit from the multi-stage builds question, in one Dockerfile.

Security scanning as a CI gate

As covered in the registries topic's vulnerability-scanning question, integrating a scan step that can fail the build on critical/high findings prevents a genuinely dangerous image from ever reaching a registry or a deployment target. This catches the issue as early in the pipeline as practical.

Related Resources

Docker: CI/CD Best Practices

Why this scenario comes up at all

Many CI/CD systems run each job inside its own container for isolation and reproducibility — but if that job's own work is to build a Docker image (a very common CI task), you end up needing to run Docker inside the container the CI job itself is running in. This is the scenario Docker-in-Docker addresses.

The DinD approach — a real, nested Docker daemon

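# TLS is disabled (DOCKER_TLS_CERTDIR="") only to keep this short; by default the dind image serves TLS on port 2376
docker run --privileged -d --name dind -e DOCKER_TLS_CERTDIR="" docker:24-dind
docker run --rm --link dind:docker -e DOCKER_HOST=tcp://docker:2375 -v "$PWD":/src -w /src docker:24 docker build .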

This runs an entirely separate Docker daemon inside a container, and a second container talks to that nested daemon to actually perform builds — genuinely running "Docker inside Docker," not just talking to the host's existing daemon.

The real risks of this approach

Requires --privileged mode — running a nested Docker daemon generally requires disabling most container isolation protections for the outer container. --privileged grants nearly all capabilities and disables several security restrictions covered in the security topic. This is a significant security relaxation, not a minor detail, and it directly undermines much of the isolation benefit containers are meant to provide in the first place.
Storage-driver complications — running a container filesystem (an overlay/union filesystem; see the fundamentals topic) inside another container's own overlay filesystem has historically caused genuine compatibility and performance issues. This happens because it layers the same kind of filesystem trickery on top of itself.
Weaker isolation than the "in Docker" framing suggests — despite feeling like it should be "extra isolated" (Docker inside Docker), the --privileged requirement actually means the outer container has less isolation from the host. An ordinary, non-privileged container would have more isolation than this setup provides.

Alternative 1: mounting the host's Docker socket

docker run --rm -v /var/run/docker.sock:/var/run/docker.sock -v "$PWD":/src -w /src docker:24 docker build .

This avoids running a nested daemon at all — instead, the CI container talks directly to the host's own Docker daemon (see the security topic's Docker-socket question). This avoids DinD's storage-driver and --privileged concerns, but it introduces its own well-documented, serious risk. As covered in that question, socket access is functionally equivalent to host root. Any CI job with this mount has, in effect, host-level access, which is a serious concern for CI systems running untrusted or third-party pipeline code.

Alternative 2: purpose-built daemonless image-building tools

# Kaniko, running inside a Kubernetes Pod with no special privileges,
# building an image without ever needing a Docker daemon (nested or host) at all

Tools like Kaniko (built by Google, commonly used in Kubernetes-based CI) and Buildah can build OCI-compliant images without requiring a Docker daemon at all. They implement the image-building logic directly in user space, without needing privileged access or a socket to any daemon. This genuinely avoids both of the above risks, rather than merely mitigating them. This is increasingly the preferred approach specifically for CI systems (especially Kubernetes-based ones) that need to build images as an ordinary, unprivileged step in an otherwise-sandboxed pipeline.
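
As a rough illustration of what a daemonless build invocation looks like, the sketch below assembles arguments for the Kaniko executor. The helper name is hypothetical, and the /workspace path is an assumed mount point. In a Kubernetes-based pipeline the same arguments would go into the Pod or job spec instead of a local command.

```shell
# Print Kaniko executor arguments for a build that does not push anywhere.
# Usage: kaniko_args <build context directory as seen inside the executor container>
kaniko_args() {
  printf '%s %s %s\n' \
    "--context=dir://$1" \
    "--dockerfile=$1/Dockerfile" \
    "--no-push"
}

# Local try-out via the published executor image (Docker is used only to start it):
#   docker run --rm -v "$PWD":/workspace gcr.io/kaniko-project/executor:latest $(kaniko_args /workspace)
kaniko_args /workspace
```

Replacing --no-push with a --destination reference would push the result to a registry, given suitable credentials.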

Weighing the tradeoffs

Approach	Privilege required	Risk profile
Docker-in-Docker (nested daemon)	`--privileged`	Significant isolation weakening; storage-driver quirks
Mounted host socket	None on the container itself, but socket access = host root	Serious, well-documented escalation risk
Kaniko / Buildah (daemonless)	None	Avoids both risks above entirely

Where DinD or socket-mounting genuinely can't be avoided (some legacy pipeline setups, specific tooling requirements), the right mental model is to treat that CI runner as a fully trusted, high-privilege environment. Restrict what pipelines are allowed to run in it accordingly. Do not treat it as just another routine, low-stakes CI job.

Related Resources

Docker: Docker-in-Docker

Docker Swarm — Docker's own, simpler built-in orchestrator

docker swarm init                                    # initialize a Swarm on this node
docker service create --name web --replicas 3 -p 80:80 nginx   # deploy a replicated service across the Swarm

Swarm mode turns a group of Docker hosts into a cluster, using concepts (services, tasks, overlay networks; see the networking topic's overlay question) that closely mirror plain Docker's own CLI and mental model. This closeness is Swarm's biggest advantage: someone already comfortable with plain docker run/docker-compose concepts can pick up Swarm with relatively little additional learning, whereas Kubernetes introduces a much larger and more distinct set of concepts (Pods, Deployments, Services, ConfigMaps, RBAC, and dozens more; see that stack).

Kubernetes — the dominant, far more feature-rich orchestrator

Covered extensively in its own dedicated stack, Kubernetes provides sophisticated scheduling (affinity, taints/tolerations, priority/preemption), rich networking options (Ingress, NetworkPolicies, multiple CNI choices), and a vast ecosystem of extensions (CRDs, Operators, Helm charts for essentially any popular software). It is also what every major cloud provider offers a managed service for.

The key practical tradeoffs

	Docker Swarm	Kubernetes
Learning curve	Gentle (builds directly on Docker concepts)	Steep (many distinct concepts)
Feature richness	Basic (replicas, overlay networking, rolling updates)	Extensive (see that stack's many topics)
Ecosystem/tooling	Small, and has been shrinking	Enormous, still growing
Managed cloud offerings	Minimal	Extensive (EKS, GKE, AKS, and more)
Current industry momentum	Declining	Dominant

Why this comparison matters for an interview, even though the answer leans clearly toward Kubernetes today

Recommending Swarm for a brand-new production system today, given the industry's clear consolidation around Kubernetes, would be an unusual choice requiring strong specific justification: for example, a very small team, a very simple deployment need, and a strong preference for staying within familiar plain-Docker concepts rather than adopting Kubernetes's larger surface area. A candidate should be able to describe this landscape honestly: acknowledge Swarm's real simplicity advantage, recognize that the ecosystem has broadly moved on, and neither dismiss Swarm as having no merit nor recommend it without weighing it against Kubernetes's dominant position.

When Swarm might still be a reasonable, deliberate choice

A small team wanting basic multi-host orchestration (replicas, rolling updates, service discovery) without taking on Kubernetes's much larger learning curve and operational surface area.
An organization already deeply invested in plain Docker/Compose workflows, looking for the smallest possible step up to multi-host capability, rather than a much bigger architectural leap to Kubernetes.

Related Resources

Docker: Swarm mode overview

The anti-pattern: building a separate image per environment

# BAD: separate builds per environment, baking in environment-specific config
docker build -t myapp:staging --build-arg API_URL=https://staging-api.example.com .
docker build -t myapp:production --build-arg API_URL=https://api.example.com .

This means myapp:staging and myapp:production are, strictly speaking, different artifacts. Even if the only intended difference is a configuration value, nothing guarantees the build process itself produced byte-for-byte identical images apart from that one value. A subtle build-time issue (a flaky dependency resolution, a build tool behaving slightly differently) could introduce an unintended difference between what was tested in staging and what actually ships to production. This directly undermines the confidence that "what we tested is exactly what we're deploying."

The correct pattern: build once, configure at runtime

docker build -t myapp:1.0 .          # ONE build, used everywhere

docker run -e API_URL=https://staging-api.example.com myapp:1.0        # staging
docker run -e API_URL=https://api.example.com myapp:1.0                 # production

The exact same image, byte-for-byte, is what runs in every environment. Only the runtime configuration (environment variables, mounted config files, secrets) differs. This is precisely the "build once, deploy many times, unchanged" principle covered throughout this stack (see the fundamentals topic and the tags/digests question). It means that if something works correctly in staging, you have real, direct confidence that the identical artifact will behave the same way in production, since nothing about the image itself changed between the two.

The twelve-factor app's "config" principle

This directly reflects Factor III (Config) of the twelve-factor app methodology. It calls for strict separation between an application's code (which should be identical across environments) and its configuration (which legitimately varies by environment). Configuration belongs in the environment (environment variables, mounted files), never hardcoded into the build artifact itself.

# Kubernetes ConfigMaps/Secrets (see that stack), or Compose environment/.env
# files, or a cloud platform's own environment-variable configuration --
# all apply this same principle at whatever layer is actually deploying the container

This principle is exactly why Kubernetes ConfigMaps/Secrets (see that stack) and Compose's environment/env_file mechanisms (see that topic) both exist as first-class concepts. They are the standard, orchestrator-level tools for injecting environment-specific configuration into an unchanged, promoted image, rather than requiring separate builds.

What this means for CI/CD pipeline design

1. Build the image ONCE, from a specific commit, tagged with that commit's SHA
2. Run tests against THAT SAME image
3. Push it to a registry
4. Deploy that SAME image (by digest, ideally -- see that question) to staging,
   with staging-specific configuration injected at deploy time
5. After validation, promote the SAME image (same digest) to production,
   with production-specific configuration injected at deploy time

This "build once, promote the same artifact through environments" pattern is a core CI/CD design principle. It eliminates an entire class of "it worked in staging but broke in production" bugs. Those bugs stem from staging and production having actually run subtly different artifacts, rather than the same one with different configuration.
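
The promotion steps above depend on deploying by digest rather than by a mutable tag. A minimal sketch of guarding that in a deploy script follows; the helper names are hypothetical, and the commented docker commands assume an image that has already been pushed.

```shell
# Succeeds if the argument looks like a sha256 image digest.
is_digest() {
  printf '%s\n' "$1" | grep -Eq '^sha256:[0-9a-f]{64}$'
}

# Build an immutable reference of the form <repo>@<digest>; refuses mutable tags.
pinned_ref() {
  is_digest "$2" || { echo "not a digest: $2" >&2; return 1; }
  printf '%s@%s\n' "$1" "$2"
}

# Promotion sketch (needs Docker and a pushed image):
#   ref=$(docker inspect --format '{{index .RepoDigests 0}}' myregistry.example.com/myapp:"$sha")
#   docker run -e API_URL=https://staging-api.example.com "$ref"   # staging
#   docker run -e API_URL=https://api.example.com "$ref"           # production, same digest
```

Because both environments receive the same `repo@sha256:...` reference, only the injected configuration can differ between them.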

Related Resources

The Twelve-Factor App: Config

Step 1: check the logs — the fastest, most direct signal

docker logs my-container

The application itself very often printed a clear, direct explanation of what went wrong before exiting — a missing environment variable, a failed database connection, a syntax error, a missing file. This should always be the very first thing checked, before any deeper investigation.

Step 2: check the exit code

docker ps -a
# STATUS: Exited (0) 2 seconds ago     <- clean exit
# or:
# STATUS: Exited (1) 2 seconds ago      <- error exit

docker inspect my-container --format='{{.State.ExitCode}}'

Exit code 0 (clean exit) — often means the container's main process simply ran to completion and returned normally. For a container that's supposed to be a long-running server, this frequently points at a fundamental misunderstanding of what the CMD/ENTRYPOINT actually runs. For example, it could mean accidentally running a one-shot setup script instead of the actual long-running server process, or using a shell script that doesn't end with a command that blocks or keeps running.
A non-zero exit code — indicates an actual application-level error. The specific code sometimes maps to a recognizable meaning: 137 (128+SIGKILL) is often an OOMKill (see the security/lifecycle topics), and 1 is a generic catch-all application error in most conventions. But the logs (Step 1) usually tell you much more directly than the number alone.
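
The conventions above can be captured in a small lookup. This is a sketch rather than an exhaustive table: the function name is hypothetical, codes above 128 are read as 128 plus a signal number, and 125 to 127 follow the meanings docker run documents for its own failures (an application can still choose to exit with those values itself).

```shell
# Map a container exit code to its most likely interpretation.
describe_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "clean exit: main process finished"
  elif [ "$code" -eq 125 ]; then
    echo "docker run itself failed"
  elif [ "$code" -eq 126 ]; then
    echo "command found but could not be invoked"
  elif [ "$code" -eq 127 ]; then
    echo "command not found"
  elif [ "$code" -gt 128 ]; then
    echo "terminated by signal $((code - 128))"
  else
    echo "application error"
  fi
}

describe_exit_code 137   # prints: terminated by signal 9
describe_exit_code 143   # prints: terminated by signal 15
```

Signal 9 is SIGKILL and signal 15 is SIGTERM. To tell an OOM kill apart from any other SIGKILL, `docker inspect my-container --format '{{.State.OOMKilled}}'` reports whether the kernel's OOM killer was responsible.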

Step 3: if logs are empty or unhelpful, run interactively with an overridden command

docker run -it --entrypoint sh myapp:1.0

Overriding the ENTRYPOINT with an interactive shell lets you explore the image's filesystem before the normal startup command would even run. This is useful for checking that expected files or configuration are actually present, testing whether a command that's supposed to run actually executes correctly when invoked manually, or generally investigating an environment where the real command fails too fast or too silently to diagnose any other way.

Common specific root causes

Missing or incorrect environment variables the application requires at startup, causing it to fail an early validation check and exit immediately — often with a helpful error message in the logs if the application validates configuration properly, or a much less helpful generic crash if it doesn't.
A missing dependency file — a config file, certificate, or other resource expected at a specific path that wasn't actually included in the image or mounted correctly.
Incorrect CMD/ENTRYPOINT — pointing at a script or binary that doesn't exist at that path inside the image, or has the wrong permissions (not marked executable).
The application genuinely isn't meant to be long-running — e.g., accidentally treating a one-shot script/tool's image as if it should run as a persistent service.
A crash during the application's own startup sequence — an unhandled exception during initialization. This is often the most common real cause, and it is exactly what Step 1's log check is meant to surface directly.
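
The first root cause in this list is much easier to diagnose when the entrypoint validates its configuration explicitly and names what is missing. A minimal sketch of such a check for a shell entrypoint, where the function name and the variable names are illustrative assumptions:

```shell
# Print each missing variable name to stderr; fail if any is unset or empty.
require_env() {
  missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Typical use at the top of an entrypoint script:
#   require_env API_URL DATABASE_URL || exit 1
#   exec "$@"
```

With this in place, docker logs shows a line naming the missing variable instead of a generic crash later in startup.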

Verifying the CMD/ENTRYPOINT is actually what you expect

docker inspect myapp:1.0 --format='{{.Config.Cmd}} {{.Config.Entrypoint}}'

Occasionally the actual configured command differs from what you expect. This can happen when a base image's own ENTRYPOINT unexpectedly combines with your own CMD in a way you didn't intend (see the ENTRYPOINT/CMD question). Confirming exactly what's configured to run is a useful check before assuming the problem lies elsewhere. Working through logs, then exit code, then an interactive override, in that order, resolves the large majority of these issues without needing to guess at more exotic causes first.

Related Resources

Docker: Troubleshoot

Why a pre-container methodology maps so well onto containers

The twelve-factor app methodology was originally written (by engineers at Heroku, around 2011) describing principles for building cloud-native SaaS applications. It predates Docker's mainstream adoption, but it describes exactly the properties that make an application a good fit for the containerized, orchestrated deployment model that later became standard. Several of its factors directly explain why specific Docker/Kubernetes conventions exist, rather than those conventions being arbitrary.

Factor III: Config — strictly separate from code

Covered in depth in the earlier environment-configuration question: configuration that varies by deployment (database URLs, feature flags, credentials) belongs in the environment, never hardcoded into the build artifact. This is precisely why Docker images should be built once and configured at runtime via environment variables or mounted files (see that question). It is also exactly why Kubernetes ConfigMaps/Secrets and Compose's environment mechanisms exist as first-class concepts.

Factor VI: Processes — stateless and share-nothing

An application should treat any locally-stored state as disposable. This directly explains why a container's own writable layer is treated as ephemeral scratch space (see the storage topic). It also explains why genuinely persistent data must live in an external store (a database, a mounted volume, or an external service), rather than being assumed to survive within the container's own filesystem indefinitely.

Factor XI: Logs — treat logs as event streams, write to stdout

console.log('Request received', { path: req.path });   // stdout -- NOT writing to a local log file

The twelve-factor principle is that an application shouldn't manage its own log file rotation, storage, or routing at all. It should simply write a continuous stream of events to stdout, and let the execution environment decide what to do with that stream. This is exactly why Docker's log-capturing mechanism (see the lifecycle and production topics) is built around capturing stdout/stderr specifically. An application that instead insists on writing to its own internal log files requires extra plumbing (a sidecar, a shared volume) just to get its logs captured. This fights against the principle rather than working with it.

Factor IX: Disposability — fast startup and graceful shutdown

An application should start up quickly and shut down gracefully, handling SIGTERM properly to finish in-flight work before exiting. This directly explains the emphasis, covered in the lifecycle topic, on using exec-form CMD/ENTRYPOINT (so signals reach the application process directly) and on applications handling SIGTERM correctly. Orchestrators (Docker's own --restart, Kubernetes's rolling updates and scaling) constantly start and stop container instances as a normal, routine part of operation, not an exceptional event.
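
A minimal sketch of graceful shutdown for a shell-based worker is shown below. It assumes the script is started with an exec-form command (for example CMD ["/app/worker.sh"], a hypothetical path) so that it receives SIGTERM directly. The function names and the work loop are illustrative.

```shell
#!/bin/sh
# Graceful-shutdown sketch for a shell-based worker loop.
stop_requested=0

# Handler: record that shutdown was requested instead of dying mid-task.
on_term() {
  echo "SIGTERM received, finishing current work"
  stop_requested=1
}
trap on_term TERM

# Process up to $1 units of work, stopping early once SIGTERM has arrived.
run_worker() {
  i=0
  while [ "$i" -lt "$1" ] && [ "$stop_requested" -eq 0 ]; do
    i=$((i + 1))
    # one unit of work would go here
  done
  echo "processed $i"
}
```

A wrapper script that only prepares the environment for a real server would instead end with `exec` of that server, so the server process replaces the shell and handles the signal itself.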

Factor V: Build, release, run — strict separation of stages

The methodology insists on a strict separation between building an artifact, combining it with environment-specific configuration to create a "release," and actually running that release. This is precisely the "build once, deploy the same artifact everywhere, configure at runtime" pattern covered in the environment-configuration question. It is also precisely why baking environment-specific values into a build is considered an anti-pattern, rather than just a style preference.

Why this matters for an interview beyond just naming the factors

Being able to connect a specific twelve-factor principle to a specific Docker/container best practice demonstrates that you understand why these practices are recommended, not just that they are. For example: Factor XI maps to logging to stdout rather than files; Factor III maps to runtime config injection rather than baked-in values; Factor IX maps to exec-form CMD and proper SIGTERM handling. This is a meaningfully stronger signal than simply reciting "you should log to stdout" as an isolated rule, without the underlying reasoning that connects it to the broader goal of building applications that genuinely fit the containerized, orchestrated deployment model. It is also a genuinely useful checklist when containerizing an existing, previously non-containerized application. Writing to local log files, or expecting config baked into the install directory, are exactly the kinds of assumptions that need real adaptation work — not just a Dockerfile wrapped around them.

Related Resources

The Twelve-Factor App