What is a StatefulSet, and when do you need one instead of a Deployment?
Quick Answer
A StatefulSet manages Pods that need a stable, unique identity and stable storage across restarts. Each Pod gets a predictable, persistent name (`web-0`, `web-1`, ...) and, if configured, its own dedicated PersistentVolumeClaim that follows it even if the Pod is rescheduled. Use a StatefulSet for stateful applications whose individual instances have a distinct identity or data (databases, distributed message queues, anything where both "which specific instance am I" and "my data must survive my restart" matter), and a Deployment for stateless, interchangeable replicas.
Detailed Answer
Why Deployments are wrong for stateful applications
A Deployment's Pods are interchangeable: they get randomly suffixed names (`web-7d8f9c-x2k4p`) and no guaranteed stable identity. Storage does not fix this. If the Pod template references a PersistentVolumeClaim, every replica mounts that same single PVC; per-Pod alternatives such as `emptyDir` or generic ephemeral volumes are created fresh and empty for each Pod and deleted along with it. There is no built-in way to give each replica its own dedicated, durable, individually tracked storage that follows that specific replica across restarts.
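To make the shared-PVC case concrete, here is a minimal sketch of that Deployment pattern. The names `web-deploy` and `shared-data` and the `/data` mount path are hypothetical, and the PVC is assumed to have been created beforehand. All three replicas reference the one claim, so nothing about the storage is per-replica; with a `ReadWriteOnce` volume, replicas scheduled onto a different node than the one holding the attachment typically cannot mount it at all.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-deploy                # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-deploy
  template:
    metadata:
      labels:
        app: web-deploy
    spec:
      containers:
        - name: web
          image: myapp:1.0
          volumeMounts:
            - name: data
              mountPath: /data    # hypothetical mount path
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: shared-data  # hypothetical pre-created PVC; every replica points at this same claim
```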
What a StatefulSet provides instead
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: "web"   # must reference a headless Service (see the networking topic)
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: myapp:1.0
          volumeMounts:
            - name: data          # mounts this replica's own PVC, created from the template below
              mountPath: /data    # example path; use wherever the app keeps its state
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```
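The StatefulSet does not create the headless Service named in `serviceName`; it has to exist separately. A minimal sketch of one for the manifest above, assuming the app listens on port 80 (an assumption; the original example does not specify a port):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web            # must match the StatefulSet's spec.serviceName
spec:
  clusterIP: None      # headless: no virtual IP, DNS returns records for the individual Pods
  selector:
    app: web           # same labels as the StatefulSet's Pod template
  ports:
    - name: http
      port: 80         # assumed port; use the one the app actually listens on
```

With this Service in the `default` namespace, each Pod becomes resolvable by name, e.g. `web-0.web.default.svc.cluster.local`.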
- Stable, predictable Pod names: `web-0`, `web-1`, `web-2`, not random suffixes. If `web-1` is deleted, its replacement is created with the exact same name `web-1`, not a new random one.
- Stable network identity: combined with a headless Service, each Pod gets a predictable, individually addressable DNS name (`web-0.web.default.svc.cluster.local`). This is essential for applications where peers need to address a specific other instance by name (e.g., a database replica connecting to a specific primary).
- Per-replica persistent storage (`volumeClaimTemplates`): each replica gets its own PVC (`data-web-0`, `data-web-1`, `data-web-2`), and, critically, if `web-1`'s Pod is deleted and recreated (even on a different node, provided the underlying storage can be attached there), it is reattached to the same `data-web-1` PVC. Its data survives, tied to its identity rather than to whichever node happened to run it.
- Ordered, sequential deployment and scaling: by default, StatefulSet Pods are created one at a time in ordinal order (`web-0` before `web-1` before `web-2`), each waiting for the previous one to be Running and Ready. Scale-down terminates Pods in reverse order (`web-2` first), and rolling updates likewise replace Pods one at a time from the highest ordinal down. This matters for applications with ordering dependencies (e.g., a database's designated primary must come up before replicas that need to connect to it).
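The ordering behavior in the last bullet is controlled by two StatefulSet fields. A fragment showing their default values, to be merged under `spec` of the manifest above (the `partition` value is only illustrative):

```yaml
# Fragment: these fields belong under the StatefulSet's spec
spec:
  podManagementPolicy: OrderedReady   # default; "Parallel" starts/stops Pods without waiting (affects scaling only)
  updateStrategy:
    type: RollingUpdate               # default; replaces Pods one at a time, highest ordinal first
    rollingUpdate:
      partition: 0                    # only Pods with ordinal >= partition receive the updated template
```

`podManagementPolicy` can only be set when the StatefulSet is created. Raising `partition` is a simple way to canary a change: with 3 replicas and `partition: 2`, only `web-2` is updated until the partition is lowered.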
When you actually need this
- Databases and distributed data stores run directly on Kubernetes (PostgreSQL, MongoDB, Cassandra, Elasticsearch): each replica typically holds its own copy or shard of the data and needs a stable identity to know its role and reattach to its own data after a restart.
- Distributed coordination systems (ZooKeeper, etcd itself, when run on Kubernetes) where each member needs a stable identity to participate correctly in a consensus protocol.
- Any application where "which replica am I" is meaningful to the application's own logic, not just an interchangeable unit of horizontal scale.
When you don't
Stateless web servers, API services, or workers that don't care which specific instance handles a given request, and don't need to persist state tied to a specific replica's identity — these are the common case, and a Deployment (simpler, with more flexible rollout behavior) is the right default. Reaching for a StatefulSet when a Deployment would do adds real operational complexity (slower, ordered rollouts; PVC lifecycle management) for no corresponding benefit.
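Part of that PVC lifecycle cost: by default, PVCs created from `volumeClaimTemplates` are not deleted when the StatefulSet is deleted or scaled down, so storage has to be cleaned up deliberately. Recent Kubernetes releases expose this through `persistentVolumeClaimRetentionPolicy`; a fragment showing the default values, merged under the StatefulSet's `spec`:

```yaml
# Fragment: goes under the StatefulSet's spec
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Retain   # default: data-web-N PVCs outlive deletion of the StatefulSet
    whenScaled: Retain    # default: scaling 3 -> 2 leaves data-web-2 in place for a later scale-up
```

Setting either field to `Delete` removes the corresponding PVCs automatically, trading the safety of retained data for less manual cleanup.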
An important caveat
Running genuinely stateful, data-critical systems like production databases directly on Kubernetes (rather than using a managed cloud database service) is itself a significant operational commitment — StatefulSets solve the scheduling and identity problem, but backup, failover, and data consistency logic for the actual stateful application usually still needs to be handled by an Operator (see the extensibility topic) or the application's own clustering logic, not by the StatefulSet primitive alone.