How does centralized logging typically work in Kubernetes, given pods are ephemeral?

Detailed Answer

Why Pod ephemerality makes centralized logging necessary

kubectl logs asks the kubelet for a log file that the container runtime maintains on the node where the Pod is (or was recently) running. Once a Pod is deleted (rescheduled, scaled down, or replaced during a rollout), those node-local log files are cleaned up and kubectl logs for that Pod name stops working. For anything beyond quick, live debugging of a still-running Pod, kubectl logs alone is insufficient: logs must be collected and stored somewhere durable, independent of any individual Pod's lifetime.

The standard architecture: a log-shipping DaemonSet

Node                                 Node
┌──────────────────────────┐         ┌──────────────────────────┐
│ Pod A -> stdout/stderr   │         │ Pod C -> stdout/stderr   │
│    -> log file on node   │         │    -> log file on node   │
│ Pod B -> stdout/stderr   │         │                          │
│    -> log file on node   │         │ Fluent Bit (DaemonSet)   │
│                          │         │    tails log files       │
│ Fluent Bit (DaemonSet)   │         │    ships them out        │
│    tails log files       │         └──────────────────────────┘
│    ships them out        │                      │
└──────────────────────────┘                      │
             │                                    │
             └──────────────────┬─────────────────┘
                                ▼
                   Centralized logging backend
             (Elasticsearch, Loki, CloudWatch, etc.)

A log-shipping agent (Fluentd, Fluent Bit, Vector, or a cloud provider's own agent) runs as a DaemonSet (see the workload controllers topic; this is the canonical DaemonSet use case), so one copy runs on every eligible node. Each copy continuously tails the log files of every container on its node and forwards them to a centralized backend, often enriching each log line with Kubernetes metadata (Pod name, namespace, labels) along the way.
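To make the "tail and enrich" step concrete, here is a minimal Python sketch of what an agent might do with one log line. It assumes the common kubelet layout, where files under /var/log/containers/ are named <pod>_<namespace>_<container>-<container id>.log, and the CRI line format "<timestamp> <stream> <P|F> <message>". The names parse_log_path, parse_cri_line, and LogRecord are hypothetical, and real agents usually also query the API server for labels, which this sketch does not do.

```python
import re
from typing import NamedTuple, Optional

# Assumed file name layout: <pod>_<namespace>_<container>-<64-hex container id>.log
_LOG_NAME = re.compile(
    r"^(?P<pod>[^_]+)_(?P<namespace>[^_]+)_(?P<container>.+)"
    r"-(?P<container_id>[0-9a-f]{64})\.log$"
)


class LogRecord(NamedTuple):
    timestamp: str
    stream: str      # "stdout" or "stderr"
    partial: bool    # True when the runtime split a long line (tag "P")
    message: str
    pod: str
    namespace: str
    container: str


def parse_log_path(path: str) -> Optional[dict]:
    """Extract Pod metadata from a node-level log file path, or None if it does not match."""
    name = path.rsplit("/", 1)[-1]
    match = _LOG_NAME.match(name)
    return match.groupdict() if match else None


def parse_cri_line(line: str, meta: dict) -> LogRecord:
    """Split one CRI-formatted line and attach the metadata taken from the file name."""
    parts = line.rstrip("\n").split(" ", 3)
    if len(parts) == 3:          # the message itself was empty
        parts.append("")
    timestamp, stream, tag, message = parts
    return LogRecord(
        timestamp, stream, tag == "P", message,
        meta["pod"], meta["namespace"], meta["container"],
    )
```

Because the Pod name and namespace travel with every record, the backend can still answer "what did this Pod log?" after the Pod and its node-local files are gone.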

Common centralized logging backends

Elasticsearch (or OpenSearch) + Kibana: the classic "ELK/EFK stack" (Elasticsearch, Logstash or Fluentd, Kibana), offering rich full-text search and dashboarding.
Grafana Loki: a lighter-weight, more cost-efficient alternative that indexes only metadata/labels (not the full log text). It is often paired with Grafana for visualization and is appealing when Elasticsearch's resource cost and operational complexity are undesirable.
Cloud-native logging services (AWS CloudWatch Logs, GCP Cloud Logging, Azure Monitor Logs): convenient when already running on that cloud provider, with less operational overhead than self-hosting a logging stack.
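The label-only indexing that distinguishes Loki is visible in the shape of the data a shipper sends it. The sketch below builds a request body in the streams/values layout of Loki's HTTP push API (an assumption to verify against the Loki version in use): a small set of labels identifies the stream, and the log text rides along as opaque values. The function name build_loki_payload and the label names are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def build_loki_payload(labels: Dict[str, str], entries: List[Tuple[int, str]]) -> str:
    """Build a JSON body for one log stream.

    labels  -- indexed metadata, e.g. namespace and Pod name
    entries -- (unix time in nanoseconds, raw log line) pairs; the line is not indexed
    """
    stream = {
        "stream": dict(labels),
        # Timestamps are sent as strings of nanoseconds since the epoch.
        "values": [[str(ts_ns), line] for ts_ns, line in entries],
    }
    return json.dumps({"streams": [stream]})
```

Keeping the label set small (namespace, Pod, container) is what keeps the index cheap; searching inside the log text is then done by scanning the matching streams.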

An alternative pattern: sidecar-based log shipping

Instead of a node-level DaemonSet, some setups run a sidecar container in each Pod (see the sidecar pattern question) specifically to ship that one Pod's logs. This is useful when an application writes logs to a file rather than to stdout/stderr: the sidecar mounts a shared volume and tails that specific file. It generally carries higher overhead than the DaemonSet approach (one shipping agent per Pod rather than one per node), so it is used more selectively.
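As a rough illustration of what such a sidecar does, the following Python sketch follows a file on a shared volume and re-emits each complete line on its own stdout, where the node-level machinery can pick it up. It is a simplified stand-in for a real shipping agent: it does not handle log rotation or truncation, and the names read_new_lines and follow are hypothetical.

```python
import sys
import time
from typing import Iterator, TextIO


def read_new_lines(f: TextIO) -> Iterator[str]:
    """Yield complete lines appended since the last call.

    A trailing line without a newline is left unread so it can be
    picked up whole once the application finishes writing it.
    """
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            return
        if not line.endswith("\n"):
            f.seek(pos)  # rewind over the partial line
            return
        yield line


def follow(path: str, out: TextIO = sys.stdout, poll_seconds: float = 1.0) -> None:
    """Poll the file forever, copying new lines to `out` (the sidecar's stdout)."""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            for line in read_new_lines(f):
                out.write(line)
            out.flush()
            time.sleep(poll_seconds)
```

In a Pod, the application container and the sidecar would both mount the same volume, and the sidecar would call follow() on the application's log file path.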

Best practice: log to stdout/stderr, not files

Applications running in containers should generally write logs to stdout/stderr rather than to files within the container. This is what the container runtime and standard log-shipping DaemonSets are built to capture automatically, without any sidecar or special per-application configuration. Writing to internal files instead requires extra plumbing (typically a sidecar tailing a shared volume) to get those logs collected at all. Treating logs as an event stream written to stdout is one of the "twelve-factor app" principles, and it maps directly onto how Kubernetes logging infrastructure expects applications to behave.
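On the application side this usually amounts to a few lines of logger setup. The sketch below uses Python's standard logging module to send everything to stdout with no file handler at all; the helper name make_logger and the chosen line format are illustrative assumptions, not a required convention.

```python
import logging
import sys
from typing import Optional, TextIO


def make_logger(name: str, stream: Optional[TextIO] = None) -> logging.Logger:
    """Return a logger that writes one line per event to stdout (or `stream`)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False  # keep output on this handler only
    return logger


if __name__ == "__main__":
    log = make_logger("orders")
    log.info("order %s accepted", "A-1001")
```

The application never opens, rotates, or ships a log file; the container runtime captures the stream and the node-level agent takes it from there.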