How do you debug a Pod stuck in CrashLoopBackOff?
Quick Answer
CrashLoopBackOff means the container keeps starting and then exiting or crashing, and Kubernetes is applying an exponentially increasing delay between restart attempts. Start with `kubectl logs <pod> --previous` to see the crashed container's actual output (not the new attempt's, which may not have logged anything yet), then run `kubectl describe pod <pod>` for exit codes and recent events. Check specifically for an application error on startup, a misconfigured liveness probe killing an otherwise-healthy container, or a missing dependency or configuration.
Detailed Answer
What CrashLoopBackOff actually means
The container is starting, then exiting (crashing, or being killed) repeatedly, and Kubernetes is deliberately backing off between restart attempts (waiting progressively longer — 10s, 20s, 40s, up to a cap around 5 minutes) rather than restarting instantly and indefinitely, which would otherwise hammer the node with a tight restart loop.
kubectl get pods
# NAME READY STATUS RESTARTS AGE
# my-app-xyz 0/1 CrashLoopBackOff 7 12m
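To make the backoff progression concrete, here is a minimal sketch that prints the approximate wait before each restart. It assumes the figures quoted above (a 10s initial delay that doubles each time, capped at 300s); the function name `backoff_schedule` is purely illustrative, and the real timing is decided by the kubelet, so treat the numbers as approximate.

```shell
#!/bin/sh
# Illustrative only: approximate CrashLoopBackOff delays.
# Assumes 10s initial delay, doubling per restart, capped at 300s (5 minutes).
backoff_schedule() {
  restarts=$1
  delay=10
  cap=300
  i=1
  while [ "$i" -le "$restarts" ]; do
    echo "restart $i: wait ~${delay}s"
    # Double the delay for the next attempt, but never exceed the cap.
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
    i=$((i + 1))
  done
}

backoff_schedule 7
```

With 7 restarts, as in the `kubectl get pods` output above, the delay has already reached the cap, which is why a Pod in this state can appear to sit idle for minutes between attempts.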
Step 1: check the previous (crashed) container's logs
kubectl logs my-app-xyz --previous
This is the single most important first command: `--previous` retrieves logs from the last terminated instance of the container, which usually contain the actual error message explaining why it crashed (a stack trace, a missing environment variable error, a failed database connection). Without `--previous`, `kubectl logs` shows the current attempt, which may still be starting and may not have logged anything yet, so its output is often empty or unhelpful.
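For Pods with more than one container, or with very long logs, a small wrapper can help. The sketch below is one possible shape, not a standard tool: the function name `crash_logs` and the 100-line tail are assumptions chosen for illustration, while `--previous`, `--tail`, and `-c` are existing `kubectl logs` flags.

```shell
# crash_logs POD [CONTAINER]
# Print the last 100 log lines of the previous (crashed) container instance.
# CONTAINER is only needed when the Pod runs more than one container.
crash_logs() {
  pod=$1
  container=${2:-}
  if [ -n "$container" ]; then
    kubectl logs "$pod" --previous --tail=100 -c "$container"
  else
    kubectl logs "$pod" --previous --tail=100
  fi
}
```

Usage would look like `crash_logs my-app-xyz` or, for a hypothetical container named `app`, `crash_logs my-app-xyz app`.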
Step 2: check describe for exit code and events
kubectl describe pod my-app-xyz
Look specifically at:
- `Last State: Terminated`, `Reason`, `Exit Code`: a specific exit code narrows down the cause considerably: `0` (clean exit, which is odd for a container that's supposed to run forever and might indicate the main process finished and exited normally when it shouldn't have), `1` (generic application error), `137` (128 + `SIGKILL`, often an OOMKill, see that question, or a container force-killed after a liveness probe failure because it did not shut down in time), `143` (128 + `SIGTERM`, graceful termination, possibly from a liveness probe failure or a manual action).
- Events at the bottom: these often show directly whether a liveness probe is failing and killing the container (`Liveness probe failed: ...`), which points straight at a probe misconfiguration rather than the application itself being broken.
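The exit-code reading above can be captured as a small lookup. This is a sketch of the rule of thumb only: the function name `explain_exit_code` is hypothetical, and the explanations are hints rather than a diagnosis, since applications are free to choose their own exit codes.

```shell
# explain_exit_code CODE
# Map a container exit code to the usual first suspicion.
explain_exit_code() {
  case "$1" in
    0)   echo "clean exit: main process finished" ;;
    1)   echo "generic application error" ;;
    137) echo "SIGKILL (128+9): possible OOMKill or forced kill" ;;
    143) echo "SIGTERM (128+15): graceful termination requested" ;;
    *)
      # Codes above 128 conventionally mean "terminated by signal (code - 128)".
      if [ "$1" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $(( $1 - 128 ))"
      else
        echo "application-specific exit code"
      fi
      ;;
  esac
}
```

For example, `explain_exit_code 139` reports signal 11 (a segmentation fault on Linux), a case not listed above but following the same 128 + signal convention.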
Common root causes, roughly in order of frequency
- Application error on startup: a missing environment variable, a bad configuration file, an unhandled exception during initialization. The `--previous` logs should show this directly.
- Misconfigured liveness probe: the probe is checking something that isn't actually indicative of a fatal problem (e.g., a downstream dependency being briefly unavailable), causing an otherwise-healthy container to be killed repeatedly (see the probes question).
- Missing dependency/resource: the application can't reach a required database, another service, or a mounted ConfigMap/Secret that doesn't exist or is misnamed.
- OOMKilled (see that question): the container is repeatedly exceeding its memory limit; `describe pod` will show `OOMKilled` as the termination reason, distinct from a generic crash.
- Immediate exit due to incorrect container command/entrypoint: e.g., a container built to run a one-shot script rather than a long-running server process, used incorrectly in a Deployment (which expects the main process to keep running).
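To tell these causes apart without scanning the full `describe` output, the termination details can be read straight from the Pod status. The sketch below assumes the standard `status.containerStatuses` fields; the function name `termination_summary` is illustrative.

```shell
# termination_summary POD
# One line per container: name, current waiting reason,
# last termination reason, last exit code (tab-separated).
termination_summary() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.waiting.reason}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

A last termination reason of `OOMKilled` points at the memory limit, while `Error` with a non-zero exit code points back at the application or its configuration.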
When logs alone aren't enough
kubectl exec -it my-app-xyz -- /bin/sh # only works if the container is currently running long enough
If the crash happens too fast to exec into the container, there are two options. One is to temporarily override the Pod's command with something that keeps it alive long enough to investigate interactively (`command: ["sleep", "3600"]`), in a debug copy of the manifest and never in the real production manifest. The other is `kubectl debug`, a newer, purpose-built command that attaches an ephemeral debug container to a running or crashing Pod, giving you a shell alongside the problematic container without modifying its spec at all.
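As a rough sketch of both approaches using `kubectl debug`: the wrapper names, the `busybox` image, and the `-debug` suffix are assumptions for illustration. Ephemeral containers need a reasonably recent cluster, `--target` depends on container runtime support, and the copy approach needs an image that actually contains `sh`.

```shell
# debug_ephemeral POD CONTAINER
# Attach an ephemeral busybox container that shares CONTAINER's process namespace.
debug_ephemeral() {
  kubectl debug -it "$1" --image=busybox --target="$2"
}

# debug_copy POD CONTAINER
# Create a copy of the Pod named POD-debug in which CONTAINER runs a shell
# instead of its normal command, leaving the original Pod untouched.
debug_copy() {
  kubectl debug "$1" -it --copy-to="$1-debug" --container="$2" -- sh
}
```

For a hypothetical container named `app`, `debug_copy my-app-xyz app` would create `my-app-xyz-debug`; remember to delete the copy afterwards with `kubectl delete pod my-app-xyz-debug`.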
Naming `--previous` specifically (rather than just "check the logs") is a strong, concrete signal of hands-on debugging experience. It is the detail that trips up people who have only read about Kubernetes without having debugged a real crash loop.