What are liveness, readiness, and startup probes, and how do they differ?

Detailed Answer

Why three different probes exist

"Is this container healthy" turns out to have several distinct meanings, and conflating them causes real production problems. A container that's alive but not yet ready to serve traffic (for example, still loading a large cache) shouldn't be killed, and a slow-starting container shouldn't be judged against the same timing as a fully warmed-up one.

Liveness probe — is this container still working?

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 15
  periodSeconds: 10
  failureThreshold: 3

If this probe fails failureThreshold times in a row, the kubelet kills the container and restarts it (subject to the Pod's restartPolicy). This is meant to catch situations where a process is technically still running but has entered a broken state it can't recover from on its own (deadlocked, stuck in an infinite loop), where a restart is the appropriate remedy. A liveness probe should only fail for problems a restart would actually fix. A liveness probe that checks a downstream dependency (like a database connection) is a common and dangerous misconfiguration: when the database is down, the container is restarted over and over for a problem that restarting can't solve, when it should simply have been marked not-ready.
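
One way to apply this rule, as a minimal sketch: keep the liveness endpoint limited to the process's own health and move any dependency check to the readiness endpoint. The /healthz and /ready paths follow the examples in this answer; what each handler actually checks is an assumption about the application, not something Kubernetes enforces.

```yaml
# Fragment of a container spec (goes under spec.containers[]).
livenessProbe:
  httpGet:
    path: /healthz   # assumed to report only in-process health (no database call)
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /ready     # assumed to also check dependencies such as the database
    port: 8080
  periodSeconds: 5
```

With this split, a database outage takes the Pod out of rotation without restarting it.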

Readiness probe — is this container currently able to serve traffic?

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5

If this probe fails failureThreshold times in a row (3 by default), the Pod is removed from the Service's endpoints (see the networking topic), so traffic stops being routed to it, but the container is left running, not restarted. This is the correct mechanism for temporary, self-resolving unavailability: warming up a cache on startup, briefly reconnecting to a dependency, or gracefully draining in-flight requests before shutdown (a Pod that is being deleted is also removed from endpoints automatically, even without a readiness probe). This is also exactly the mechanism that makes rolling updates safe (see the workload controllers topic): a new Pod only starts receiving traffic once its readiness probe passes.
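
As a rough illustration of the timing, the fragment below spells out the threshold fields that the readiness example above leaves at their defaults. The values are illustrative, not recommendations.

```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5       # probe every 5 seconds
  timeoutSeconds: 1      # an attempt fails if there is no response within 1 second (the default)
  failureThreshold: 3    # 3 consecutive failures, roughly 15 seconds, mark the Pod not-ready (the default)
  successThreshold: 1    # a single success marks it ready again (the default)
```

Raising failureThreshold makes the Pod slower to leave rotation but less sensitive to a single slow response.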

Startup probe — protects slow-starting containers

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10       # allows up to 300 seconds (30 x 10) for startup
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10

While a startup probe is configured and hasn't yet succeeded, liveness and readiness probes are disabled entirely. Once it succeeds, it does not run again for that container, and the other probes take over. If it never succeeds within its budget (failureThreshold x periodSeconds), the kubelet kills the container, subject to the Pod's restartPolicy, just as with a failed liveness probe. This exists specifically for applications with a slow, variable startup time (a large in-memory cache warm-up, a JVM application with a long class-loading phase), where a liveness probe's tighter timing, tuned for steady-state health checking, would otherwise kill the container while it is still in its legitimate startup phase.

Side-by-side summary

Probe	Checks	On failure	Typical use
Liveness	Is the process still functioning	Kill and restart the container	Detecting deadlocks/unrecoverable internal states
Readiness	Can it currently serve traffic	Remove from Service endpoints (no restart)	Temporary unavailability, warm-up, graceful shutdown
Startup	Has the (slow) startup completed	Kill and restart the container once the threshold is exceeded; liveness/readiness checks stay disabled until it succeeds	Slow-starting applications, avoiding premature liveness kills
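
The three probes can be combined on a single container. The manifest below is a sketch only: the Pod name and image are hypothetical placeholders, and the paths, port, and timings simply reuse the values from the examples in this answer.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo                         # hypothetical name
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0  # hypothetical image
      ports:
        - containerPort: 8080
      startupProbe:                        # runs first; the other two wait for it to succeed
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 30               # up to 300 seconds to start
      livenessProbe:                       # restarts the container on repeated failure
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:                      # only controls whether traffic is routed here
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
```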

Why getting this wrong causes real incidents

The single most common probe misconfiguration is a liveness probe that is too strict or that checks the wrong thing (a downstream dependency instead of the process's own health). This produces exactly the symptom covered in the CrashLoopBackOff troubleshooting question: a container repeatedly killed and restarted for a condition that restarting can never fix. It often makes an already-degraded situation (a slow dependency) actively worse by adding restart churn on top of it.
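
To make "too strict" concrete, here is a hedged before-and-after sketch. The numbers are illustrative assumptions, not tuned recommendations; appropriate values depend on how the application behaves under load.

```yaml
# Too strict: one response slower than 1 second is enough to trigger a restart.
# livenessProbe:
#   httpGet:
#     path: /healthz
#     port: 8080
#   periodSeconds: 5
#   timeoutSeconds: 1
#   failureThreshold: 1

# More tolerant: the container must fail 3 checks in a row (about 30 seconds)
# before the kubelet restarts it.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 3
  failureThreshold: 3
```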