How do you debug a Pod stuck in ImagePullBackOff or Pending?

Detailed Answer

ImagePullBackOff — the kubelet can't pull the image

kubectl describe pod my-app-xyz
# Events:
#   Warning  Failed     kubelet  Failed to pull image "myapp:1.0.": rpc error: ...
#   Warning  BackOff    kubelet  Back-off pulling image "myapp:1.0."

Common causes, in rough order of frequency:

Typo in the image name or tag — a trailing period (as in the example above — myapp:1.0. instead of myapp:1.0), a misspelled repository name, or a tag that was never actually pushed. kubectl describe pod shows the exact image string the kubelet tried to pull, and the exact error the registry returned — often enough to spot the typo immediately.
Private registry requiring authentication — if the image is in a private registry and no credentials are configured, the pull fails with an authentication/authorization error. Fix by creating a docker-registry Secret and referencing it via imagePullSecrets in the Pod spec (or the associated ServiceAccount, so every Pod using that ServiceAccount picks it up automatically).

spec:
  imagePullSecrets:
    - name: my-registry-credentials
  containers:
    - name: app
      image: private-registry.example.com/myapp:1.0
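
The Secret referenced above has to exist before the Pod can use it. A minimal sketch of creating it and, optionally, attaching it to a ServiceAccount might look like the following. The Secret name and registry host are taken from the example above; the function names and the REGISTRY_USER / REGISTRY_PASSWORD variables are hypothetical placeholders, not anything the cluster defines.

```shell
# Sketch only: function names and REGISTRY_* variables are illustrative assumptions.

# Create the docker-registry Secret that the Pod spec references.
create_pull_secret() {
  # $1 = namespace the Pod runs in
  kubectl create secret docker-registry my-registry-credentials \
    --namespace "$1" \
    --docker-server=private-registry.example.com \
    --docker-username="$REGISTRY_USER" \
    --docker-password="$REGISTRY_PASSWORD"
}

# Attach the Secret to a ServiceAccount so Pods using that account inherit it.
attach_pull_secret() {
  # $1 = namespace, $2 = ServiceAccount name (for example: default)
  kubectl patch serviceaccount "$2" --namespace "$1" \
    -p '{"imagePullSecrets": [{"name": "my-registry-credentials"}]}'
}

# Usage:
#   REGISTRY_USER=alice REGISTRY_PASSWORD=... create_pull_secret default
#   attach_pull_secret default default
```

The Secret must live in the same namespace as the Pod. Pods that already exist do not pick up a ServiceAccount change, so they need to be recreated after the patch.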

Network connectivity issue — the node genuinely can't reach the registry because of a firewall rule, a DNS resolution problem, or a registry outage. If the credentials and image name both check out, test connectivity directly from a node or a debug Pod.
Rate limiting — some public registries (notably Docker Hub) impose pull rate limits per IP/account; a burst of Pod creations across many nodes can occasionally hit this, especially without an authenticated account configured for higher limits.
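
Two of the checks above lend themselves to a quick command-line sketch: printing the exact image string from the Pod spec, and testing whether a node can resolve the registry host. The function names are hypothetical, and the second helper assumes the busybox image can itself be pulled on that node, which may not hold if the node has no registry access at all.

```shell
# Sketch only: function names are illustrative assumptions.

# Print each container image wrapped in brackets so that stray characters
# (such as a trailing period) are easy to spot.
show_images() {
  # $1 = Pod name
  kubectl get pod "$1" \
    -o jsonpath='{range .spec.containers[*]}[{.image}]{"\n"}{end}'
}

# Run a debug container on the node and try to resolve the registry hostname.
check_registry_dns() {
  # $1 = node name, $2 = registry hostname
  kubectl debug "node/$1" -it --image=busybox -- nslookup "$2"
}

# Usage:
#   show_images my-app-xyz
#   check_registry_dns <node-name> private-registry.example.com
```

The node a Pod was assigned to appears in the NODE column of kubectl get pod my-app-xyz -o wide.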

Pending — the scheduler can't place the Pod anywhere

kubectl describe pod my-app-xyz
# Events:
#   Warning  FailedScheduling  default-scheduler  0/5 nodes are available:
#     3 Insufficient memory, 2 node(s) had taint {dedicated: gpu}, that the pod didn't tolerate.

The Events section of describe pod is the essential first stop — it states, in plain language, exactly why every node was rejected during scheduling's filtering phase (see the scheduling question). Common reasons:

Insufficient resources — no node has enough unreserved CPU or memory to satisfy the Pod's requests. Either the cluster needs more capacity (the Cluster Autoscaler, if configured, should add nodes automatically; see the Cluster Autoscaler question), or the Pod's requests are set unrealistically high.
Unsatisfied taints/tolerations or node affinity — the Pod needs a node label that no current node carries, or every otherwise suitable node has a taint the Pod does not tolerate.
Volume/topology issues — a PersistentVolumeClaim can't be bound or provisioned (e.g., a StorageClass misconfiguration, or a zone-topology mismatch between where the volume was created and where the Pod could be scheduled — see the StorageClass question).
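PodDisruptionBudget or admission webhook rejection — rarely the cause of a Pending Pod. A PodDisruptionBudget only limits voluntary evictions and does not affect scheduling, and a Pod rejected by an admission webhook is never created, so the error appears in the Events of the owning controller (for example, the ReplicaSet) rather than on a Pending Pod. Check these when the Pod you expect is missing altogether.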
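
Once the Events name a reason, a few follow-up queries can confirm it. The sketch below compares what the Pod requests with what the nodes offer; the function names are hypothetical, and the output columns are just one reasonable choice.

```shell
# Sketch only: function names are illustrative assumptions.

# What the Pod asks for: resource requests per container.
show_requests() {
  # $1 = Pod name
  kubectl get pod "$1" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.requests}{"\n"}{end}'
}

# What a node has left: the "Allocated resources" section of describe node
# shows how much CPU and memory is already requested.
show_node_allocation() {
  # $1 = node name
  kubectl describe node "$1"
}

# Which taints each node carries (keys only).
show_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Whether the Pod's PersistentVolumeClaims are Bound or still Pending.
show_pvcs() {
  # $1 = namespace
  kubectl get pvc --namespace "$1"
}

# Usage:
#   show_requests my-app-xyz
#   show_taints
#   show_pvcs default
```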

The universal first diagnostic command

kubectl describe pod <pod-name>

For both ImagePullBackOff and Pending, the Events section of this single command is almost always where the specific, human-readable reason lives. The debugging instinct should be to read the events before guessing at what might be wrong.
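
When the describe output is long, the same events can be queried directly. This is a small sketch with hypothetical function names. Note that a Pod waiting on an image pull also reports phase Pending, so the second helper lists both kinds of stuck Pod.

```shell
# Sketch only: function names are illustrative assumptions.

# List events for a single Pod, oldest first.
pod_events() {
  # $1 = Pod name, $2 = namespace
  kubectl get events --namespace "$2" \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}

# Find every Pod in the cluster whose phase is still Pending.
pending_pods() {
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending
}

# Usage:
#   pod_events my-app-xyz default
#   pending_pods
```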
