What is etcd, and why is it critical to a Kubernetes cluster?

Detailed Answer

What etcd actually stores

Every Kubernetes object you create (every Deployment, Service, ConfigMap, Secret, and Pod, including its status) is ultimately stored as a key-value entry in etcd. The API server is the only component that talks to etcd directly; everything else (kubectl, the scheduler, controllers, kubelets) goes through the API server, which reads from and writes to etcd on their behalf.

kubectl apply -f deployment.yaml
   → API server validates & authorizes
   → API server writes the Deployment object to etcd
   → Controller manager's Deployment controller, watching the API server,
     notices the new/changed object and creates matching ReplicaSets/Pods
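
To make "stored as a key" concrete: the API server keeps objects under a configurable prefix (the default is /registry), and many built-in namespaced resources follow the pattern /registry/<resource>/<namespace>/<name>. The sketch below only builds such a key. registry_key is a hypothetical helper name, and the layout is an implementation detail that differs for some resource types (Services and custom resources, for example), so treat it as illustrative rather than a stable interface.

```shell
# Hypothetical helper: build the etcd key under which a namespaced object
# is commonly stored, assuming the default /registry prefix.
# Arguments: <resource> <namespace> <name>
registry_key() {
  printf '/registry/%s/%s/%s\n' "$1" "$2" "$3"
}

# Example: the key for a Deployment named "web" in namespace "default".
registry_key deployments default web
```

On a control plane node with etcdctl configured, something like etcdctl get "$(registry_key deployments default web)" --keys-only should show whether the key exists. Values for built-in types are stored in a binary (protobuf) encoding, so they are not directly human-readable.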

Why Raft consensus matters

etcd is typically run as a cluster of an odd number of nodes (commonly 3 or 5) that use the Raft consensus algorithm to agree on writes: a write is considered committed only once a majority (quorum) of etcd nodes have durably persisted it. For a cluster of N nodes the quorum is floor(N/2) + 1, so a 5-node etcd cluster can lose 2 nodes and keep operating, since the remaining 3 still form a majority. An even-sized cluster buys nothing extra: 4 nodes need a quorum of 3 and so tolerate only 1 failure, the same as 3 nodes.

This gives etcd strong consistency (by default, every read reflects the most recently committed write) and tolerance of node failures. The cost is that every write has to wait for a quorum to persist it, so write latency is set by the slowest node needed to reach that quorum, and etcd performance is quite sensitive to disk I/O latency and to network latency between its nodes.
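
The majority arithmetic is easy to check by hand, but a small sketch makes the pattern across cluster sizes visible. The function names quorum and fault_tolerance are made up for this example; the only thing assumed is the standard majority rule, floor(N/2) + 1.

```shell
# quorum N: smallest majority of an N-node cluster, i.e. floor(N/2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# fault_tolerance N: how many nodes can fail while a quorum still survives.
fault_tolerance() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Print the numbers for a few cluster sizes.
for n in 1 2 3 4 5 6 7; do
  printf 'nodes=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

Going from 3 to 4 nodes, or from 5 to 6, raises the quorum without raising the number of tolerated failures, which is why odd cluster sizes are the usual recommendation.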

Why losing etcd is catastrophic

Every other control plane component is effectively stateless or easily reconstructible. The API server holds no state of its own, and the scheduler and controllers can be restarted at any time: they simply re-read current state from etcd (via the API server) and resume operating. But if etcd's data is lost or corrupted and no backup exists, there is no other copy of the cluster's state anywhere. Every Deployment, Service, and Secret, along with its current status, is simply gone, and the cluster must effectively be rebuilt from whatever configuration (YAML manifests, Helm charts, GitOps repositories) exists outside the cluster.

Backup and disaster recovery

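# Take a point-in-time snapshot of etcd's data (endpoint and certificate paths are kubeadm defaults; adjust as needed)
etcdctl snapshot save backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Restore from a snapshot into a new data directory (typically as part of rebuilding a control plane node).
# On etcd 3.5 and later, etcdutl replaces the deprecated `etcdctl snapshot restore`; the target path is an example.
etcdutl snapshot restore backup.db --data-dir /var/lib/etcd-restore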

Regular, automated etcd snapshots, stored somewhere other than the etcd nodes themselves and combined with periodic restore testing (an untested backup isn't a real backup), are standard practice for any self-managed production cluster. Managed Kubernetes services (EKS, GKE, AKS) operate etcd and the rest of the control plane for you, including its backup and resilience, which is one of the most significant operational burdens they take off a team's plate compared to self-hosting.
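
As a rough sketch of what "regular, automated" could look like on a self-managed cluster, the functions below take a timestamped snapshot and keep only the newest few files. Everything here is an assumption for illustration: the function names, the etcd-<timestamp>.db naming scheme, and the keep-the-newest-N retention rule are not prescribed by etcd. The script also does not copy the file off the node, which a real setup would still need to do.

```shell
# snapshot_name: a file name with a sortable UTC timestamp, e.g. etcd-20240131T020000Z.db
snapshot_name() {
  printf 'etcd-%s.db\n' "$(date -u +%Y%m%dT%H%M%SZ)"
}

# prune_old_snapshots <dir> <keep>: delete all but the newest <keep> snapshots.
# Because the names embed a sortable timestamp, reverse lexical order is newest-first.
prune_old_snapshots() {
  find "$1" -maxdepth 1 -type f -name 'etcd-*.db' | sort -r | tail -n +"$(( $2 + 1 ))" |
  while IFS= read -r old; do
    rm -f -- "$old"
  done
}

# backup_etcd <dir> <keep>: take a snapshot, then prune only if the snapshot succeeded.
# Assumes etcdctl can reach the cluster, for example via exported ETCDCTL_ENDPOINTS,
# ETCDCTL_CACERT, ETCDCTL_CERT and ETCDCTL_KEY environment variables.
backup_etcd() {
  etcdctl snapshot save "$1/$(snapshot_name)" && prune_old_snapshots "$1" "$2"
}
```

A scheduler such as cron or a systemd timer could then call something like backup_etcd /var/backups/etcd 7 (the directory and count are placeholders), followed by a copy to storage outside the etcd nodes. Running etcdutl snapshot status on the resulting file is a cheap sanity check, though it does not replace an actual restore test.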

Security note

Because etcd holds every Secret's data (stored unencrypted by default unless encryption at rest is explicitly configured; see the security topic), direct network access to etcd must be tightly restricted to the components that need it, which in practice means the API server and the etcd nodes themselves. Encryption at rest should be enabled for any cluster storing genuinely sensitive Secret data.
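
For a sense of what "explicitly configured" involves, the sketch below generates a random 32-byte key and writes a minimal EncryptionConfiguration file that encrypts Secrets with the aescbc provider. The helper name write_encryption_config, the key name key1, and the choice of aescbc are assumptions made for this example; production clusters often prefer a KMS provider so that the key does not sit on disk next to the API server.

```shell
# Hypothetical helper: write an EncryptionConfiguration for Secrets to the path in $1.
write_encryption_config() {
  # 32 random bytes, base64-encoded, as the aescbc provider expects.
  key="$(head -c 32 /dev/urandom | base64 | tr -d '\n')"
  (
    umask 077  # the file contains key material, so keep it owner-readable only
    cat > "$1" <<EOF
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
    providers:
      - aescbc:
          keys:
            - name: key1
              secret: ${key}
      - identity: {}
EOF
  )
}
```

The API server reads this file through its --encryption-provider-config flag. The trailing identity provider lets it keep reading Secrets that were written before encryption was enabled; those remain unencrypted in etcd until they are rewritten.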