How does a rolling update work, and how do you roll back a bad deployment?

Detailed Answer

The default strategy: RollingUpdate

spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most 1 Pod below the desired count may be unavailable during the rollout
      maxSurge: 1          # at most 1 MORE than desired can exist during the rollout

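With replicas: 4, maxUnavailable: 1, maxSurge: 1, the Deployment controller keeps at least 3 (4 - 1) Pods available and at most 5 (4 + 1) Pods in total (old + new combined) at any moment during the rollout. (If the two fields are omitted, each defaults to 25%, which for 4 replicas also works out to 1.) In simplified form, the controller creates a new Pod (from the new ReplicaSet), waits for it to pass its readiness probe (see the probes question) before counting it as available, then terminates one old Pod, repeating until all old Pods are replaced. Because maxUnavailable is 1, the controller is also allowed to take one old Pod down right away instead of waiting, so the real sequence can move faster than the surge-first walkthrough below.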

Start:    [old][old][old][old]                    (4 old, 0 new)
Step 1:   [old][old][old][old][new]                (surge: 5 total, new not ready yet)
Step 2:   [old][old][old][new✓]                    (new became ready, one old terminated: 4 total)
Step 3:   [old][old][old][new✓][new]                (surge again: 5 total)
Step 4:   [old][old][new✓][new✓]                    (another old terminated: 4 total)
...continues until all 4 are the new version
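
The walkthrough above is strictly surge-first: a new Pod is always Ready before an old one goes away. A minimal sketch of a strategy that enforces exactly that ordering, assuming the cluster has spare capacity for one extra Pod, is to set maxUnavailable to 0:

```yaml
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0   # never drop below 4 available Pods
      maxSurge: 1         # allow one extra Pod (5 total) while the new one becomes Ready
```

Both fields also accept percentages (e.g., 25%). The two values cannot both be 0, since the controller would then have no room to either add or remove a Pod.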

Why readiness probes are essential to a safe rollout

The rollout terminates an old Pod only once a new Pod has been marked Ready by its readiness probe (see the probes question). If the new version has a bug that makes it fail its readiness check (e.g., it crashes on startup, or can't connect to a dependency), the rollout stalls rather than continuing to replace healthy old Pods with broken new ones. A stalled rollout is not rolled back automatically: once spec.progressDeadlineSeconds (default 600) elapses, the Deployment is only reported as failing to progress, while the remaining old Pods keep serving traffic. This is a critical safety property, and it depends entirely on the probe: a container with no readiness probe counts as Ready as soon as it is running, so a new version that starts but cannot actually serve requests would happily replace every healthy Pod.
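
As an illustration of the readiness gating described above, here is a hedged fragment of a Deployment spec (not a complete manifest: selector and labels are omitted). The container name, image, endpoint path, and port are hypothetical placeholders; the field names are standard Deployment and probe fields.

```yaml
spec:
  minReadySeconds: 10            # a new Pod must stay Ready this long before it counts as available
  progressDeadlineSeconds: 600   # the default; after this the rollout is reported as not progressing
  template:
    spec:
      containers:
        - name: web                                # hypothetical container name
          image: registry.example.com/web:2.0.0    # hypothetical image
          readinessProbe:
            httpGet:
              path: /healthz     # hypothetical health endpoint
              port: 8080         # hypothetical port
            periodSeconds: 5     # probe every 5 seconds
            failureThreshold: 3  # mark the Pod not Ready after 3 consecutive failures
```

The probe should exercise whatever the Pod needs in order to serve traffic; a probe that always succeeds gives the rollout nothing to gate on.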

Rolling back

kubectl rollout status deployment/web        # watch a rollout's progress
kubectl rollout history deployment/web        # see past revisions
kubectl rollout undo deployment/web           # roll back to the previous revision
kubectl rollout undo deployment/web --to-revision=3   # roll back to a specific revision

rollout undo works by re-pointing the Deployment at a previous ReplicaSet's Pod template. ReplicaSets from earlier rollouts are retained, scaled to zero, up to spec.revisionHistoryLimit (default 10). The controller then performs the same gradual rolling process in the opposite direction, scaling the old (soon-to-be-current-again) ReplicaSet up while scaling the currently-bad one down. Rollback therefore gets the exact same safety properties (readiness-gated, gradual) as a forward rollout. The rollback is itself recorded as a new revision in the rollout history.
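
How much history is available to roll back to is controlled on the Deployment itself. A small sketch, assuming a Deployment named web as in the commands above; the annotation text is a hypothetical example:

```yaml
metadata:
  name: web
  annotations:
    # Shown in the CHANGE-CAUSE column of `kubectl rollout history`
    kubernetes.io/change-cause: "upgrade web image to 2.0.0"   # hypothetical message
spec:
  revisionHistoryLimit: 10   # the default; old ReplicaSets beyond this count are garbage-collected
```

Setting revisionHistoryLimit to 0 removes all old ReplicaSets, which also removes the ability to roll back with rollout undo.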

Pausing a rollout mid-flight

kubectl rollout pause deployment/web    # freeze the rollout at its current state
kubectl rollout resume deployment/web   # continue it

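Pausing is useful for making several related changes to a Deployment's spec (e.g., updating both the image and a resource limit) without triggering a rollout after each individual edit: pause, make all your changes, then resume to trigger exactly one coordinated rollout. Pods that are already running are not affected by a pause; it only stops the controller from acting on changes to the Pod template.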
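
In a declarative workflow, the same effect can be sketched through the spec.paused field, which is the field that kubectl rollout pause sets. The container name, image, and limit below are hypothetical values for illustration:

```yaml
spec:
  paused: true   # template edits are recorded but not rolled out while this is true
  template:
    spec:
      containers:
        - name: web                                # hypothetical container name
          image: registry.example.com/web:2.0.0    # hypothetical: first change
          resources:
            limits:
              memory: 512Mi                        # hypothetical: second change
```

Setting paused back to false (or running kubectl rollout resume) then starts a single rollout that carries both changes.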

What rolling updates don't protect against

A rolling update only protects against a new version that fails its readiness probe. A bug that passes readiness checks but causes incorrect behavior under real production traffic (a subtle logic error, a slow memory leak) won't be caught by the rollout mechanism itself. This is exactly the gap that canary deployments and more sophisticated progressive-delivery tooling (see the production operations topic) are designed to close.