How do you debug a Pod stuck in CrashLoopBackOff?

7 min · intermediate · crashloopbackoff · troubleshooting · debugging

Quick Answer

CrashLoopBackOff means the container keeps starting and then exiting or crashing, and Kubernetes is applying an exponentially increasing delay between restart attempts. Start with `kubectl logs <pod> --previous` to see the crashed container's actual output (not the new attempt's, which may not have logged anything yet). Then run `kubectl describe pod <pod>` for exit codes and recent events. Check specifically for an application error on startup, a misconfigured liveness probe killing an otherwise-healthy container, or a missing dependency or configuration.

Detailed Answer

What CrashLoopBackOff actually means

The container is starting and then exiting (crashing or being killed) repeatedly, and Kubernetes is deliberately backing off between restart attempts. It waits progressively longer (10s, 20s, 40s, and so on, up to a cap of 5 minutes) rather than restarting instantly and indefinitely, which would hammer the node with a tight restart loop. The status is therefore a symptom, not a cause: it says the container keeps dying, not why.
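The doubling schedule described above can be sketched in a few lines of shell. This is only an illustration of the arithmetic, assuming a 10s base delay and a 300s cap; `backoff_delay` is a hypothetical helper name, not a Kubernetes or kubectl command.

```shell
# backoff_delay N: approximate wait (seconds) before restart attempt N (1-based).
# Assumes a 10s base that doubles on every attempt and is capped at 300s.
backoff_delay() {
  attempt=$1
  delay=10
  i=1
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt 300 ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  # Clamp the final doubling (160 -> 320) down to the cap.
  if [ "$delay" -gt 300 ]; then
    delay=300
  fi
  echo "$delay"
}

for n in 1 2 3 4 5 6 7; do
  echo "restart $n: wait ~$(backoff_delay "$n")s"
done
```

With these assumptions the delay reaches the cap on the sixth restart, which is why a Pod with a high restart count appears to sit idle for minutes at a time.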

kubectl get pods
# NAME        READY   STATUS             RESTARTS   AGE
# my-app-xyz  0/1     CrashLoopBackOff   7          12m

Step 1: check the previous (crashed) container's logs

kubectl logs my-app-xyz --previous

This is the single most important first command. --previous retrieves logs from the last terminated instance of the container, which usually contain the actual error explaining why it crashed (a stack trace, a missing environment variable, a failed database connection). Without --previous, kubectl logs shows the current attempt, which may still be starting and may not have logged anything yet, so its output is often empty or unhelpful.
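For Pods with more than one container, or with very long logs, it can help to wrap the command. The sketch below assumes a hypothetical helper named `crash_logs` and an arbitrary tail of 100 lines; the `--previous`, `--container` and `--tail` flags are standard kubectl logs options.

```shell
# crash_logs POD [CONTAINER]: show the end of the previous (crashed) instance's logs.
# The 100-line tail is an arbitrary choice for illustration.
crash_logs() {
  pod=$1
  container=${2:-}
  if [ -n "$container" ]; then
    # Multi-container Pod: name the container that is crash-looping.
    kubectl logs "$pod" --previous --container "$container" --tail=100
  else
    kubectl logs "$pod" --previous --tail=100
  fi
}

# Usage (not run here):
#   crash_logs my-app-xyz
#   crash_logs my-app-xyz app    # "app" is a placeholder container name
```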

Step 2: check describe for exit code and events

kubectl describe pod my-app-xyz

Look specifically at:

  • Last State: Terminated, Reason, Exit Code. A specific exit code narrows down the cause considerably: 0 (clean exit, which is odd for a container that is supposed to run forever and may mean the main process finished normally when it shouldn't have), 1 (generic application error), 137 (128 + 9, SIGKILL: often an OOMKill, see that question, or a container force-killed after a liveness probe failure), 143 (128 + 15, SIGTERM: graceful termination, possibly from a liveness probe or a manual action).
  • Events at the bottom. These often show directly whether a liveness probe is failing and killing the container (Liveness probe failed: ...), which points straight at a probe misconfiguration rather than the application itself being broken.
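The exit-code arithmetic above (128 plus the signal number) can be scripted. The following is a minimal sketch: `explain_exit_code` and `last_exit_code` are hypothetical helper names, and the jsonpath expression assumes the crash-looping container is the first entry in the Pod's containerStatuses.

```shell
# explain_exit_code CODE: translate a container exit code into a rough meaning.
explain_exit_code() {
  code=$1
  case "$code" in
    ''|*[!0-9]*) echo "no exit code recorded"; return 1 ;;
  esac
  if [ "$code" -eq 0 ]; then
    echo "clean exit (main process finished)"
  elif [ "$code" -gt 128 ]; then
    # Codes above 128 mean the process died from signal (code - 128).
    echo "killed by signal $((code - 128))"
  else
    echo "application error (exit $code)"
  fi
}

# last_exit_code POD: exit code of the last terminated instance.
# Assumption: the container of interest is containerStatuses[0].
last_exit_code() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Usage (not run here):
#   explain_exit_code "$(last_exit_code my-app-xyz)"
```

Signal 9 is SIGKILL and signal 15 is SIGTERM, so 137 and 143 map to the two cases described above.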

Common root causes, roughly in order of frequency

  1. Application error on startup — a missing environment variable, a bad configuration file, an unhandled exception during initialization. The --previous logs should show this directly.
  2. Misconfigured liveness probe — the probe is checking something that isn't actually indicative of a fatal problem (e.g., a downstream dependency being briefly unavailable), causing an otherwise-healthy container to be killed repeatedly (see the probes question).
  3. Missing dependency/resource — the application can't reach a required database, another service, or a mounted ConfigMap/Secret that doesn't exist or is misnamed.
  4. OOMKilled (see that question) — the container is repeatedly exceeding its memory limit; describe pod will show OOMKilled as the termination reason distinctly from a generic crash.
  5. Immediate exit due to incorrect container command/entrypoint — e.g., a container built to run a one-shot script rather than a long-running server process, used incorrectly in a Deployment (which expects the main process to keep running).
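To separate cause 2 (a probe killing the container) from the others without scanning the full describe output, the Pod's events can be filtered. This is a sketch under assumptions: `pod_events` and `probe_failures` are hypothetical helper names, and the grep pattern relies on the event message containing the text "probe failed", as in the Liveness probe failed example above.

```shell
# pod_events POD: events that reference the Pod, oldest first.
pod_events() {
  kubectl get events \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}

# probe_failures POD: only the events that mention a failed probe.
# Prints nothing (and returns non-zero) when no probe failure was recorded.
probe_failures() {
  pod_events "$1" | grep -i "probe failed"
}

# Usage (not run here):
#   probe_failures my-app-xyz
```

If this prints nothing, the probe is probably not the culprit, and the --previous logs or the termination reason are the better place to look.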

When logs alone aren't enough

kubectl exec -it my-app-xyz -- /bin/sh    # only works while the container is actually running

If the crash happens too fast to exec into the container, there are two options. One is to temporarily override the container's command with something that keeps it alive long enough to investigate interactively (command: ["sleep", "3600"]), in a debug copy of the manifest and never in the real production manifest. The other is kubectl debug, a newer, purpose-built command that attaches an ephemeral debug container to a running or crashing Pod, giving you a shell alongside the problematic container without modifying its spec at all.
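Both approaches can be driven through kubectl debug. The sketch below wraps two invocations in hypothetical helper functions; the busybox image, the "-debug" suffix for the copied Pod, and the assumption that the application image contains sh are illustrative choices, not requirements.

```shell
# debug_ephemeral POD CONTAINER: add an ephemeral busybox container to the Pod,
# targeting CONTAINER's process namespace. Sharing processes is only useful
# while the target container is actually up.
debug_ephemeral() {
  kubectl debug "$1" -it --image=busybox --target="$2"
}

# debug_copy POD CONTAINER: create a copy of the Pod named POD-debug in which
# CONTAINER runs a shell instead of its normal command, so it no longer crashes.
# Assumes the container image ships with sh.
debug_copy() {
  kubectl debug "$1" -it --copy-to="$1-debug" --container="$2" -- sh
}

# Usage (not run here; "app" is a placeholder container name):
#   debug_ephemeral my-app-xyz app
#   debug_copy my-app-xyz app
```

The copy leaves the original Pod untouched, so remember to delete the debug Pod once the investigation is finished.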

Naming --previous specifically (rather than just "check the logs") is a strong, concrete signal of hands-on debugging experience — it's the detail that trips up people who've only read about Kubernetes without actually having debugged a real crash loop.