What tools and commands do you use to investigate a failing pod?

Detailed Answer

kubectl describe — the essential first command

kubectl describe pod my-app-xyz

Shows a human-readable summary of the Pod's spec, its current status and conditions (see the pod-lifecycle question), each container's current and last state (with exit codes and reasons), and, critically, the Events section at the bottom: a chronological record of what has recently happened to this specific object (scheduling decisions, probe failures, image pull attempts). Events are retained only for a limited time (about one hour by default), so an older failure may no longer appear there. This should almost always be the first command you run when investigating a Pod problem.
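
When only the container states matter, the same fields that describe prints can be read straight from the Pod's status. The following is a minimal sketch, not a definitive recipe: the function name last_exit is hypothetical, and it assumes the container has terminated at least once (otherwise the reason and exit-code columns come back empty).

```shell
# Hypothetical helper: for each container in the Pod, print its name,
# the reason its last instance terminated, and that instance's exit code.
# Reads the standard .status.containerStatuses fields of the Pod.
last_exit() {
  kubectl get pod "$1" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

Usage: last_exit my-app-xyz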

kubectl logs — application-level output

kubectl logs my-app-xyz                      # current container's stdout/stderr
kubectl logs my-app-xyz --previous            # the LAST TERMINATED instance's logs (essential for crash loops)
kubectl logs my-app-xyz -c sidecar-container   # a specific container, for multi-container Pods
kubectl logs my-app-xyz --since=10m            # only recent logs, useful on a noisy long-running app
kubectl logs -f my-app-xyz                     # follow/stream logs live
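
These flags combine. For a crash loop, a reasonable starting point is the tail of the previous instance's output with timestamps, so the lines can be matched against the Pod's events. A small sketch, assuming the Pod and container names used above; the function name crash_logs is hypothetical.

```shell
# Hypothetical helper: last 50 log lines of the previous (terminated) instance
# of one container, with timestamps for correlating against event times.
crash_logs() {
  kubectl logs "$1" -c "$2" --previous --tail=50 --timestamps
}
```

Usage: crash_logs my-app-xyz sidecar-container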

kubectl exec — an interactive shell inside the container

kubectl exec -it my-app-xyz -- /bin/sh

Lets you poke around inside a currently running container directly: check environment variables, test network connectivity to a dependency, inspect mounted config files. It only works if the image actually contains a shell (here /bin/sh) and the container stays up long enough to attach to, so it is no help for a container that crashes within milliseconds of starting.
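
An interactive shell is not always necessary; exec can also run a single command and return. The sketch below assumes the image ships env and cat, and that the caller supplies the path of the mounted config file. The function name quick_checks is hypothetical.

```shell
# Hypothetical helper: run two read-only checks inside a running container
# without opening an interactive shell.
#   $1 = Pod name, $2 = path of a mounted config file inside the container
quick_checks() {
  kubectl exec "$1" -- env          # environment variables as the process sees them
  kubectl exec "$1" -- cat "$2"     # contents of the mounted config file
}
```

Usage: quick_checks my-app-xyz /path/to/mounted/config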

kubectl debug — attaching to a Pod without modifying it

kubectl debug -it my-app-xyz --image=busybox --target=my-app-container   # --target names a container inside the Pod (name here is illustrative), not the Pod itself

A more modern alternative for cases where exec isn't sufficient, e.g. the container image has no shell at all (common for minimal or distroless production images), or the container is crashing too fast to exec into. This attaches an ephemeral debug container to the existing Pod. The ephemeral container shares the Pod's network namespace, and with --target it also shares the named container's process namespace (if the container runtime supports it), so you can investigate with a full-featured debug image without restarting the Pod or changing its existing containers.
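
When the original Pod should not be touched at all, kubectl debug can instead work on a copy of it. A minimal sketch using the --copy-to and --share-processes flags; the function name debug_copy and the copy's name are hypothetical and chosen by the caller.

```shell
# Hypothetical helper: create a copy of the Pod with a busybox debug container
# added and process-namespace sharing enabled, then attach to it.
#   $1 = original Pod name, $2 = name to give the copy
debug_copy() {
  kubectl debug "$1" -it --image=busybox --share-processes --copy-to="$2"
}
```

Usage: debug_copy my-app-xyz my-app-xyz-debug. The copy is an ordinary Pod, so it should be deleted once the investigation is done (kubectl delete pod my-app-xyz-debug).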

kubectl get events — cluster/namespace-wide recent activity

kubectl get events --sort-by=.lastTimestamp -n production

Useful when you're not yet sure which object is actually at fault. It shows recent events across the whole namespace (or the whole cluster, with -A instead of -n), which can reveal a problem at a different layer than the one you started investigating: you're looking at a Pod, but the real root cause was a failed PVC provisioning or a node becoming NotReady.
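
Namespace-wide event output can be noisy, and field selectors help cut it down. The sketch below shows two filters that are often useful; the function names are hypothetical, and the namespace and Pod name are the ones used in the examples above.

```shell
# Hypothetical helpers for filtering event output with field selectors.

# Only Warning-type events in namespace $1, sorted by last occurrence.
warning_events() {
  kubectl get events -n "$1" --field-selector type=Warning --sort-by=.lastTimestamp
}

# Only events in namespace $1 whose involved object is named $2.
pod_events() {
  kubectl get events -n "$1" --field-selector involvedObject.name="$2" --sort-by=.lastTimestamp
}
```

Usage: warning_events production, or pod_events production my-app-xyz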

kubectl top — current resource usage

kubectl top pod my-app-xyz
kubectl top node

Requires metrics-server (or an equivalent) to be running in the cluster (see the HPA question). Shows recent CPU and memory usage, sampled at short intervals rather than truly real-time. Compared against the Pod's configured limits, this gives a quick check of whether it is approaching them (a lead-in to investigating OOMKills or CPU throttling; see those questions) without needing a full monitoring stack.
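
kubectl top reports usage only, so the limits have to come from the Pod spec. One way to put the two side by side is sketched below, assuming metrics-server is available; the function name usage_vs_limits is hypothetical, and the limit columns are empty for containers that set none.

```shell
# Hypothetical helper: print per-container usage, then each container's
# configured CPU and memory limits from the Pod spec.
usage_vs_limits() {
  kubectl top pod "$1" --containers
  kubectl get pod "$1" \
    -o jsonpath='{range .spec.containers[*]}{.name}{"\t"}{.resources.limits.cpu}{"\t"}{.resources.limits.memory}{"\n"}{end}'
}
```

Usage: usage_vs_limits my-app-xyz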

The general debugging workflow this toolkit supports

1. kubectl get pods: spot which Pod(s) are unhealthy and their current status/phase.
2. kubectl describe pod: get the specific reason (events, conditions, container states).
3. kubectl logs (with --previous if relevant): get the application's own explanation, if it logged one.
4. kubectl exec or kubectl debug: interactively investigate further if logs and describe aren't sufficient.
5. kubectl get events and kubectl top: widen the investigation if the root cause seems to be somewhere other than the Pod itself.
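
The non-interactive parts of this sequence can be collected into a single first-pass helper. This is a sketch under stated assumptions, not a standard tool: the function name triage is hypothetical, the exec/debug step is left out because it is interactive, and the two commands that commonly fail for benign reasons (no previous instance, no metrics-server) are allowed to fail without stopping the run.

```shell
# Hypothetical helper: run the read-only steps of the workflow in order.
#   $1 = Pod name, $2 = namespace
triage() {
  kubectl get pod "$1" -n "$2" -o wide                                      # status, restarts, node
  kubectl describe pod "$1" -n "$2"                                         # events, conditions, container states
  kubectl logs "$1" -n "$2" --all-containers=true --tail=50                 # current instance's logs
  kubectl logs "$1" -n "$2" --all-containers=true --previous --tail=50 || true   # previous instance, if there was one
  kubectl get events -n "$2" --sort-by=.lastTimestamp                       # namespace-wide activity
  kubectl top pod "$1" -n "$2" || true                                      # usage; needs metrics-server
}
```

Usage: triage my-app-xyz production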

Being fluent with this sequence — not just knowing the commands exist, but knowing the order and reason to reach for each — is what separates real hands-on troubleshooting experience from surface familiarity with kubectl's command list.