What tools and commands do you use to investigate a failing pod?
Quick Answer
The core toolkit: `kubectl describe pod` (events, conditions, and container states; the first stop for almost any problem), `kubectl logs` (plus `--previous` for a crashed container's last output), `kubectl exec -it <pod> -- sh` (an interactive shell inside a running container), `kubectl debug` (an ephemeral debug container, for images that ship no shell), `kubectl get events --sort-by=.lastTimestamp` (recent events across a namespace or the whole cluster, useful when you're not sure which object is actually at fault), and `kubectl top pod`/`kubectl top node` (current CPU/memory usage, when metrics-server is installed).
Detailed Answer
kubectl describe — the essential first command
kubectl describe pod my-app-xyz
Shows a human-readable summary of the Pod's spec, its current status/conditions (see the pod-lifecycle question), each container's current and last state (with exit codes/reasons), and, critically, the Events section at the bottom: a chronological log of what has recently happened to this specific object (scheduling decisions, probe failures, image pull attempts). Events expire after about an hour by default, so look soon after the failure. This should almost always be the first command you run when investigating a Pod problem.
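When only the container states matter, the same fields that describe prints can be pulled out with a JSONPath query. The following is a minimal sketch, not a canonical recipe: the helper name `last_state` is hypothetical, and it assumes the Pod has already reported container statuses.

```shell
# Hypothetical helper: for each container, print its name, restart count, and
# the reason/exit code of its last terminated instance (blank if none).
# Usage: last_state my-app-xyz production
last_state() {
  pod="$1"
  ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

A reason such as OOMKilled or a non-zero exit code here usually tells you whether to look at resource limits or at the application's own logs next.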
kubectl logs — application-level output
kubectl logs my-app-xyz # current container's stdout/stderr
kubectl logs my-app-xyz --previous # the LAST TERMINATED instance's logs (essential for crash loops)
kubectl logs my-app-xyz -c sidecar-container # a specific container, for multi-container Pods
kubectl logs my-app-xyz --since=10m # only recent logs, useful on a noisy long-running app
kubectl logs -f my-app-xyz # follow/stream logs live
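For crash loops, the choice between current and `--previous` logs can be made from the container's restart count. This is a rough sketch under assumptions: the helper name `crash_logs` is hypothetical, and the 50-line tail is an arbitrary choice.

```shell
# Hypothetical helper: if the container has restarted, show the previous
# instance's logs (the one that crashed); otherwise show the current logs.
# Usage: crash_logs my-app-xyz sidecar-container
crash_logs() {
  pod="$1"
  container="$2"
  # Look up the restart count for the named container only.
  restarts=$(kubectl get pod "$pod" -o jsonpath="{.status.containerStatuses[?(@.name==\"$container\")].restartCount}")
  if [ "${restarts:-0}" -gt 0 ]; then
    kubectl logs "$pod" -c "$container" --previous --tail=50
  else
    kubectl logs "$pod" -c "$container" --tail=50
  fi
}
```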
kubectl exec — an interactive shell inside the container
kubectl exec -it my-app-xyz -- /bin/sh
Lets you poke around inside a currently running container directly — check environment variables, test network connectivity to a dependency, inspect mounted config files. Only useful if the container stays up long enough to attach to (not helpful for a container that crashes within milliseconds of starting).
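The same checks can also be run as one-off, non-interactive commands. The sketch below is illustrative only: the helper name `exec_checks` and its arguments are hypothetical, and it assumes the image ships `env`, `nslookup`, and `cat`, which many minimal images do not.

```shell
# Hypothetical helper: run a few one-off checks inside a running container.
# Usage: exec_checks my-app-xyz <dependency-hostname> <path-to-mounted-config>
exec_checks() {
  pod="$1"
  dependency_host="$2"   # placeholder: hostname of a dependency to resolve
  config_path="$3"       # placeholder: path of a mounted config file
  kubectl exec "$pod" -- env                          # environment variables
  kubectl exec "$pod" -- nslookup "$dependency_host"  # DNS resolution of the dependency
  kubectl exec "$pod" -- cat "$config_path"           # contents of the mounted config
}
```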
kubectl debug — attaching to a Pod without modifying it
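kubectl debug -it my-app-xyz --image=busybox --target=my-app-container # --target takes a container name (placeholder here), not the Pod name
A more modern alternative for cases where exec isn't sufficient, most commonly when the container image has no shell at all (typical of minimal/distroless production images). This adds an ephemeral debug container to the existing Pod. The debug container shares the Pod's network namespace and, with --target, the named container's process namespace (if the container runtime supports it), so you can investigate with a full-featured debug image without restarting the Pod or changing its existing containers. For a container that crashes too fast to attach to, kubectl debug can instead create a modified copy of the Pod via --copy-to.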
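For a crash-looping container, one option is to debug a copy of the Pod with the container's command replaced by a shell. A minimal sketch, assuming the image includes `sh`; the helper name `debug_copy` and the `-debug` suffix for the copy are hypothetical.

```shell
# Hypothetical helper: create a copy of the Pod in which the named container
# runs `sh` instead of its normal entrypoint, and attach to it.
# Usage: debug_copy my-app-xyz <container-name>
debug_copy() {
  pod="$1"
  container="$2"
  kubectl debug "$pod" -it --copy-to="${pod}-debug" --container="$container" -- sh
}
```

The copy is a separate Pod, so delete it when you are done (for example, `kubectl delete pod my-app-xyz-debug`).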
kubectl get events — cluster/namespace-wide recent activity
kubectl get events --sort-by=.lastTimestamp -n production
Useful when you're not yet sure which specific object is actually at fault — shows recent events across the whole namespace (or cluster, with -A), which can reveal a problem at a different layer than the one you started investigating (e.g., you're looking at a Pod, but the real root cause event was a failed PVC provisioning or a node becoming NotReady).
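Once a suspect object turns up, the event list can be narrowed with a field selector. A small sketch; the helper name `warnings_for` is hypothetical.

```shell
# Hypothetical helper: list only Warning events that reference one object,
# oldest first.
# Usage: warnings_for my-app-xyz production
warnings_for() {
  name="$1"
  ns="${2:-default}"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=${name},type=Warning" \
    --sort-by=.lastTimestamp
}
```

The name does not have to be a Pod; passing a node or PVC name filters events for that object in the same way.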
kubectl top — current resource usage
kubectl top pod my-app-xyz
kubectl top node
Requires metrics-server (or an equivalent) to be running in the cluster (see the HPA question). Shows recent CPU/memory usage (metrics-server samples periodically, so the figures lag slightly behind real time), which is useful for confirming whether a Pod is approaching its resource limits (a lead-in to investigating OOMKills or CPU throttling; see those questions) as a quick check that doesn't need a full metrics/monitoring stack.
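Usage figures mean little without the limits they are measured against, so it can help to print both together. A rough sketch; the helper name `usage_vs_limits` is hypothetical, and the limits columns are blank for containers that set none.

```shell
# Hypothetical helper: show per-container usage, then the configured limits.
# Usage: usage_vs_limits my-app-xyz production
usage_vs_limits() {
  pod="$1"
  ns="${2:-default}"
  echo "== current usage (requires metrics-server) =="
  kubectl top pod "$pod" -n "$ns" --containers
  echo "== configured limits (name, cpu, memory) =="
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.limits.cpu}{"\t"}{.resources.limits.memory}{"\n"}{end}'
}
```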
The general debugging workflow this toolkit supports
1. `kubectl get pods`: spot which Pod(s) are unhealthy and their current status/phase.
2. `kubectl describe pod`: get the specific reason (events, conditions, container states).
3. `kubectl logs` (with `--previous` if relevant): get the application's own explanation, if it logged one.
4. `kubectl exec` / `kubectl debug`: interactively investigate further if logs/describe aren't sufficient.
5. `kubectl get events` / `kubectl top`: widen the investigation if the root cause seems to be somewhere other than the Pod itself.
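The first three steps are non-interactive, so they can be chained into a single first-pass report. This is a sketch under assumptions, not a standard tool: the helper name `triage_pod` is hypothetical and the 50-line tail is arbitrary.

```shell
# Hypothetical helper: run the status, describe, and logs steps in order.
# Usage: triage_pod my-app-xyz production
triage_pod() {
  pod="$1"
  ns="${2:-default}"
  echo "--- 1. status ---"
  kubectl get pod "$pod" -n "$ns" -o wide
  echo "--- 2. describe ---"
  kubectl describe pod "$pod" -n "$ns"
  echo "--- 3. logs (current, then previous instance if any) ---"
  kubectl logs "$pod" -n "$ns" --all-containers --tail=50
  # --previous fails when no container has restarted; that is expected, so ignore it.
  kubectl logs "$pod" -n "$ns" --all-containers --previous --tail=50 2>/dev/null || true
}
```

The interactive steps (exec, debug) are left out on purpose, since what to run there depends on what this report shows.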
Being fluent with this sequence — not just knowing the commands exist, but knowing the order and reason to reach for each — is what separates real hands-on troubleshooting experience from surface familiarity with kubectl's command list.