What are liveness, readiness, and startup probes, and how do they differ?
Quick Answer
A **liveness** probe checks whether a container is still working correctly; if it fails, the kubelet kills and restarts the container. A **readiness** probe checks whether a container is currently able to serve traffic; if it fails, the Pod is removed from Service endpoints (traffic stops being routed to it) but the container is *not* restarted. A **startup** probe protects slow-starting containers by disabling liveness/readiness checks until it succeeds, preventing a liveness probe from prematurely killing a container that's still legitimately warming up.
Detailed Answer
Why three different probes exist
"Is this container healthy?" has several distinct meanings, and conflating them causes real production problems. A container that is alive but not yet ready to serve traffic (for example, still loading a large cache) shouldn't be killed, and a slow-starting container shouldn't be judged against the same timing as a fully warmed-up one.
Liveness probe — is this container still working?
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3
If this probe fails `failureThreshold` times in a row, the kubelet kills the container and restarts it (subject to the Pod's `restartPolicy`). This is meant to catch cases where the process is technically still running but has reached a broken state it can't recover from on its own, such as a deadlock or an infinite loop, where a restart is the appropriate remedy. A liveness probe should fail only for problems a restart would actually fix. Checking a downstream dependency (like a database connection) from a liveness probe is a common and dangerous misconfiguration: when the database is down, the container is restarted endlessly for a problem that restarting can't solve, when it should simply be marked not-ready.
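One way to keep dependency checks out of the liveness probe is to give them their own readiness endpoint. The fragment below is a minimal sketch, assuming the application exposes two endpoints: `/healthz`, which reports only on the process itself, and `/ready`, which also checks the database. What each endpoint actually checks is an assumption about the application; Kubernetes only sees the HTTP status code.

```yaml
# Fragment of a container spec (belongs under spec.containers[]).
livenessProbe:
  httpGet:
    path: /healthz   # assumed: checks in-process state only, no database call
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready     # assumed: also verifies the database connection
    port: 8080
  periodSeconds: 5
```

With this split, a database outage would mark the Pod not-ready and take it out of rotation, but it would not trigger restarts.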
Readiness probe — is this container currently able to serve traffic?
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
If this probe fails (for `failureThreshold` consecutive checks, which defaults to 3), the Pod is removed from the Service's endpoints (see the networking topic), so traffic stops being routed to it, but the container is left running rather than restarted. This is the correct mechanism for temporary, self-resolving unavailability: warming up a cache on startup, briefly reconnecting to a dependency, or gracefully draining in-flight requests before shutdown. It is also the mechanism that makes rolling updates safe (see the workload controllers topic): a new Pod only starts receiving traffic once its readiness probe passes.
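The rolling-update interaction can be sketched with a Deployment like the one below. The name `web`, the image reference, and the replica count are placeholders chosen for illustration; the `maxUnavailable` and `maxSurge` values are one reasonable choice, not a recommendation for every workload.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                  # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0      # no old Pod is removed until a replacement is available
      maxSurge: 1            # add at most one extra Pod at a time
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: example.com/web:1.2.3   # placeholder image
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
```

Because a new Pod only counts as available once its readiness probe passes, a rollout with these settings should stall rather than replace working Pods with ones that never become ready.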
Startup probe — protects slow-starting containers
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10   # allows up to 300 seconds (30 x 10) for startup
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
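While a startup probe is configured and hasn't yet succeeded, liveness and readiness probes are disabled entirely. This exists specifically for applications with a slow, variable startup time (a large in-memory cache warm-up, a JVM application with a long class-loading phase). Without it, a liveness probe's tighter timing, tuned for steady-state health checking, would kill the container for simply being in its legitimate startup phase, before it ever had a chance to finish starting. The startup budget is `failureThreshold` × `periodSeconds` (300 seconds in the example above); if the probe still hasn't succeeded by then, the kubelet kills the container and restarts it according to the Pod's `restartPolicy`. Once the startup probe succeeds, it doesn't run again for that container, and the liveness and readiness probes take over.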
Side-by-side summary
| Probe | Checks | On failure | Typical use |
|---|---|---|---|
| Liveness | Is the process still functioning | Kill and restart the container | Detecting deadlocks/unrecoverable internal states |
| Readiness | Can it currently serve traffic | Remove from Service endpoints (no restart) | Temporary unavailability, warm-up, graceful shutdown |
| Startup | Has the (slow) startup completed | Liveness/readiness checks stay disabled until it succeeds; if its own `failureThreshold` is exhausted, the container is killed and restarted | Slow-starting applications, avoiding premature liveness kills |
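To show how the three rows of the table fit together in one container, here is a minimal Pod sketch. The Pod name and image are placeholders, and the `/healthz` and `/ready` paths are reused from the earlier fragments on the assumption that the application serves both.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo           # hypothetical name
spec:
  containers:
    - name: app
      image: example.com/app:1.0   # placeholder image
      ports:
        - containerPort: 8080
      startupProbe:          # runs first; the other two wait until it succeeds
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 30 # up to 300 seconds (30 x 10) to finish starting
      livenessProbe:         # failing 3 checks in a row restarts the container
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:        # failure only removes the Pod from Service endpoints
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
```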
Why getting this wrong causes real incidents
The single most common probe misconfiguration is a liveness probe that is too strict or that checks the wrong thing (a downstream dependency instead of the process's own health). This produces exactly the symptom covered in the CrashLoopBackOff troubleshooting question: a container repeatedly killed and restarted for a condition that restarting can never fix. It often makes an already-degraded situation (a slow dependency) actively worse by adding restart churn on top of it.
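As an illustration of the "too strict" half of that problem, the two fragments below contrast an aggressive liveness configuration with a more tolerant one. The specific numbers are assumptions picked for the example; suitable values depend on how the application behaves under load.

```yaml
# Too strict: a single response slower than 1 second triggers a restart.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 1
  failureThreshold: 1
---
# More tolerant: a restart needs three consecutive failed checks,
# roughly 30 seconds of sustained failure (3 x 10).
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
```

The tolerant variant gives a briefly overloaded container time to recover on its own, at the cost of detecting a real deadlock somewhat later.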