How do you debug a Pod stuck in ImagePullBackOff or Pending?
Quick Answer
`ImagePullBackOff` means the kubelet has failed to pull the container image and is now waiting an increasing back-off interval before retrying. Check for a typo in the image name or tag, a private registry whose credentials aren't configured (`imagePullSecrets`), or a network problem reaching the registry. A Pod stuck in `Pending` with no container-level errors usually means the scheduler can't find a node that satisfies its requirements. (An `ImagePullBackOff` Pod also reports the `Pending` phase, but it has already been scheduled to a node.) `kubectl describe pod` states the specific reason in its Events section. The most common causes are insufficient resources, an unsatisfied affinity or taint rule, and a volume that can't be provisioned or attached.
Detailed Answer
ImagePullBackOff — the kubelet can't pull the image
```bash
kubectl describe pod my-app-xyz
# Events:
#   Warning  Failed   kubelet  Failed to pull image "myapp:1.0.": rpc error: ...
#   Warning  BackOff  kubelet  Back-off pulling image "myapp:1.0."
```
Common causes, in rough order of frequency:
- Typo in the image name or tag: a trailing period (as in the example above, `myapp:1.0.` instead of `myapp:1.0`), a misspelled repository name, or a tag that was never actually pushed. `kubectl describe pod` shows the exact image string the kubelet tried to pull and the exact error the registry returned. That is often enough to spot the typo immediately.
- Private registry requiring authentication: if the image is in a private registry and no credentials are configured, the pull fails with an authentication/authorization error. Fix it by creating a `docker-registry` Secret and referencing it via `imagePullSecrets` in the Pod spec. Alternatively, reference it in the associated ServiceAccount, so every Pod subsequently created with that ServiceAccount picks it up automatically:
```yaml
spec:
  imagePullSecrets:
    - name: my-registry-credentials
  containers:
    - name: app
      image: private-registry.example.com/myapp:1.0
```
- Network connectivity issue: the node genuinely can't reach the registry because of a firewall rule, a DNS resolution failure, or the registry itself being down. If the image name and credentials both check out, test connectivity directly from a node or from a debug Pod.
- Rate limiting: some public registries (notably Docker Hub) impose pull rate limits per IP address or per account. A burst of Pod creations across many nodes can hit this limit, particularly when the nodes pull anonymously or share a single egress IP. The error typically mentions `toomanyrequests`. Configuring an authenticated account raises the limit.
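The checks above can be wrapped in a few shell helpers. This is a minimal bash sketch, not an official tool. The function names (`show_images`, `image_looks_suspicious`, `create_registry_secret`, `attach_secret_to_default_sa`, `check_registry_from_cluster`), the `REGISTRY_USER`/`REGISTRY_PASSWORD` environment variables, and the `curlimages/curl` debug image are assumptions for illustration. The Secret and registry names reuse the example above. Only `image_looks_suspicious` runs without a cluster. It is a crude heuristic, not the image-reference grammar. A trailing period is valid tag syntax, which is why the kubelet attempts the pull at all instead of rejecting the name.

```bash
#!/usr/bin/env bash
# Hypothetical ImagePullBackOff triage helpers (a sketch, not an official tool).
# Secret and registry names reuse the example above; adjust for your cluster.

# Print each container's name and exact image string for Pod "$1".
# (Init containers live under .spec.initContainers and are not shown here.)
show_images() {
  kubectl get pod "$1" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
}

# Heuristic only: flag an image reference ending in '.', '-' or ':',
# which is almost always a typo. Returns 0 (true) when it looks suspicious.
image_looks_suspicious() {
  case "$1" in
    *[.:-]) return 0 ;;
    *) return 1 ;;
  esac
}

# Create a docker-registry Secret from credentials held in the (hypothetical)
# REGISTRY_USER / REGISTRY_PASSWORD environment variables.
create_registry_secret() {
  kubectl create secret docker-registry my-registry-credentials \
    --docker-server=private-registry.example.com \
    --docker-username="$REGISTRY_USER" \
    --docker-password="$REGISTRY_PASSWORD"
}

# Attach the Secret to the namespace's default ServiceAccount. Only Pods created
# after this patch pick it up: the Secret is injected at Pod creation time.
attach_secret_to_default_sa() {
  kubectl patch serviceaccount default \
    -p '{"imagePullSecrets": [{"name": "my-registry-credentials"}]}'
}

# Probe the registry's /v2/ endpoint from inside the cluster. Any HTTP status
# (401 is normal for a private registry) means the network path works; a DNS
# or connection error (curl prints 000) points at connectivity. The debug Pod
# runs on whichever node the scheduler picks, which may not be the failing one.
check_registry_from_cluster() {
  kubectl run registry-check --rm -i --restart=Never --image=curlimages/curl -- \
    curl -sS -o /dev/null -w '%{http_code}\n' https://private-registry.example.com/v2/
}
```

For example, `image_looks_suspicious 'myapp:1.0.' && echo 'check the tag'` catches the typo from the event above before anything is deployed.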
Pending — the scheduler can't place the Pod anywhere
```bash
kubectl describe pod my-app-xyz
# Events:
#   Warning  FailedScheduling  default-scheduler  0/5 nodes are available:
#   3 Insufficient memory, 2 node(s) had taint {dedicated: gpu}, that the pod didn't tolerate.
```
The Events section of `kubectl describe pod` is the essential first stop. It states in plain language how many nodes were rejected for each reason during the filtering phase of scheduling (see the scheduling question). Common reasons:
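- Insufficient resources: no node has enough unrequested allocatable CPU or memory to satisfy the Pod's requests. The scheduler compares requests against each node's allocatable capacity minus the requests of Pods already placed there. It does not look at actual usage, so a lightly loaded node can still count as full. Either the cluster genuinely needs more capacity (Cluster Autoscaler should add nodes automatically if configured; see that question), or the Pod's requests are set unrealistically high.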
- Unsatisfied taints/tolerations or node affinity: the Pod requires something no current node provides. This may be a specific node label (via `nodeSelector` or node affinity), or the Pod may lack a toleration for a taint that the candidate nodes carry.
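- Volume/topology issues: a PersistentVolumeClaim can't be bound or provisioned. Typical causes are a StorageClass misconfiguration or a zone-topology mismatch between where the volume was created and where the Pod could be scheduled (see the StorageClass question). The scheduling event typically reads like `pod has unbound immediate PersistentVolumeClaims` or `volume node affinity conflict`, and `kubectl describe pvc <claim-name>` usually shows the underlying provisioning error.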
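- PodDisruptionBudget or admission webhook rejection: these rarely explain a Pod that exists and sits in `Pending`. A PodDisruptionBudget only limits voluntary evictions. An admission controller that rejects a request usually prevents the Pod from being created at all. If expected Pods are missing rather than Pending, check the events of the owning controller (for example `kubectl describe replicaset <name>`) for an admission or quota error.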
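A sketch of the corresponding checks, again as bash helpers with hypothetical names (`split_scheduling_reasons`, `show_pod_requests`, `show_node_capacity`, `show_node_taints`, `show_node_labels`, `show_pvcs`). `split_scheduling_reasons` is a text-only convenience. It assumes the comma-separated message format shown above, where each reason starts with a node count. Newer Kubernetes versions may append further text (for example about preemption), which it does not try to parse. The remaining helpers wrap ordinary `kubectl` queries.

```bash
#!/usr/bin/env bash
# Hypothetical Pending-Pod triage helpers (a sketch, not an official tool).

# Print one rejection reason per line from a FailedScheduling message such as
# "0/5 nodes are available: 3 Insufficient memory, 2 node(s) had taint ...".
# Assumption: a new reason starts at each ", " that is followed by a digit.
split_scheduling_reasons() {
  local msg="${1#*available: }"   # drop the "0/N nodes are available: " prefix
  printf '%s\n' "$msg" | awk '{
    n = split($0, parts, ", ")
    out = parts[1]
    for (i = 2; i <= n; i++) {
      if (parts[i] ~ /^[0-9]/) { print out; out = parts[i] }
      else { out = out ", " parts[i] }   # comma inside one reason: rejoin
    }
    print out
  }'
}

# Per-container resource requests of Pod "$1" (what the scheduler compares).
show_pod_requests() {
  kubectl get pod "$1" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests}{"\n"}{end}'
}

# Allocatable capacity of node "$1" and the requests already placed on it
# (see the "Allocated resources" section of the output).
show_node_capacity() {
  kubectl describe node "$1"
}

# Taints and labels on every node, to compare with the Pod's tolerations,
# nodeSelector and affinity rules.
show_node_taints() {
  kubectl get nodes -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints'
}
show_node_labels() {
  kubectl get nodes --show-labels
}

# Binding status of the namespace's PVCs; describe an unbound claim next.
show_pvcs() {
  kubectl get pvc
}
```

Splitting the message this way makes it easier to read when a large cluster reports many distinct rejection reasons in one long event line.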
The universal first diagnostic command
```bash
kubectl describe pod <pod-name>
```
For both `ImagePullBackOff` and `Pending`, the Events section of this one command almost always contains the specific, human-readable reason. The debugging instinct should be to read the events before guessing, not to jump straight to speculating about what might be wrong.
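When the `describe` output is long, or the Pod has been recreated under a new name, the events can also be queried directly. This is a minimal sketch with hypothetical function names (`pod_events`, `warning_events`, `pending_pods`). The API server keeps events only for a limited time (one hour by default, set by the kube-apiserver `--event-ttl` flag), so an old failure may no longer have any.

```bash
#!/usr/bin/env bash
# Hypothetical event-reading helpers (a sketch). Pass a Pod name as $1.

# Events whose involved object is named "$1", sorted by creation time.
pod_events() {
  kubectl get events \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.metadata.creationTimestamp
}

# Only Warning events (e.g. Failed, BackOff, FailedScheduling) in the
# current namespace.
warning_events() {
  kubectl get events --field-selector type=Warning
}

# Every Pod still in the Pending phase across all namespaces, as a starting
# list for kubectl describe pod.
pending_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
}
```

For example, `pod_events my-app-xyz` lists the same Failed/BackOff events shown earlier. Recent kubectl releases also include a `kubectl events --for pod/<pod-name>` subcommand that does similar filtering.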