How do you approach debugging a production issue in a Java application you didn't write?

Detailed Answer

A structured approach, regardless of who wrote the original code:

Gather evidence before touching code. Logs, error messages, stack traces, metrics/dashboards, and recent deploy/config-change history usually narrow the problem space considerably before you write a single line; many production incidents trace back to a recent change.
Reproduce, or narrow the conditions, if at all possible. A reliably reproducible bug is far easier to fix than one you can only observe in production; even a partial repro (same input shape, same load pattern) helps.
Match the tool to the symptom:
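- A hang or high CPU: take a thread dump (jstack <pid>, or kill -3 <pid>, which makes the JVM print the dump to its own stdout) and look for threads stuck in the same call, or an explicit deadlock report. For high CPU, a few dumps taken several seconds apart show which threads stay busy in the same frames.
- Memory growth / OutOfMemoryError: take a heap dump (jmap -dump:live,format=b,file=<file> <pid>, or jcmd <pid> GC.heap_dump <file>) and analyze it in a tool like Eclipse MAT, looking at what retains the most memory and why it is still reachable.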
- A specific logic bug: attach a remote debugger if the environment allows it, or add targeted logging around the suspected code path and redeploy to a staging/canary environment.
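When attaching jstack is not possible, roughly the same information as the thread-dump step can be collected from inside the JVM through the standard java.lang.management API. The following is a minimal sketch, not a replacement for jstack: the class name ThreadDiagnostics and its method names are illustrative assumptions, and how you would expose it (an admin endpoint, a scheduled health check) depends on the application.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name and methods are illustrative only.
public class ThreadDiagnostics {

    /** Returns one description per deadlocked thread, or an empty list if none are found. */
    public static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null when no deadlock is detected
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        // Request locked monitors and synchronizers so the report shows who holds what.
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info != null) { // a thread may have terminated in the meantime
                report.add(info.toString());
            }
        }
        return report;
    }

    /** Captures the state of all live threads, similar in spirit to jstack output. */
    public static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder dump = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            dump.append(info); // ThreadInfo.toString() prints only the first few frames
        }
        return dump.toString();
    }

    public static void main(String[] args) {
        List<String> deadlocks = describeDeadlocks();
        System.out.println("Deadlocked threads: " + deadlocks.size());
        deadlocks.forEach(System.out::println);
        System.out.println(dumpAllThreads());
    }
}
```

Because ThreadInfo.toString() truncates deep stacks, a real diagnostic would probably iterate over info.getStackTrace() instead; the sketch keeps the short form for brevity.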
Form a hypothesis, then verify it minimally — resist the urge to change several things at once; change one variable, observe, and confirm before moving to a fix.
Fix the root cause, not just the symptom, and add a regression test (or at least a monitoring alert) so the same failure mode is caught automatically next time.
Communicate clearly. Especially in an unfamiliar codebase, documenting what you learned about the system along the way (in comments, a wiki, or a postmortem) pays off for the next person who touches it.
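To illustrate the regression-test step, here is a small sketch built around an invented incident: a configuration parser that failed on a missing or blank timeout value. The class TimeoutParserRegression, the method parseTimeout, and the failing inputs are all hypothetical; in a real project the checks would normally live in the existing test framework (for example JUnit) rather than in a main method.

```java
import java.time.Duration;

// Hypothetical example: the parser, its name, and the incident inputs are invented.
public class TimeoutParserRegression {

    /** Fixed version: missing or blank values fall back to a default instead of throwing. */
    public static Duration parseTimeout(String raw, Duration fallback) {
        if (raw == null || raw.trim().isEmpty()) {
            return fallback;
        }
        return Duration.ofMillis(Long.parseLong(raw.trim()));
    }

    // Plain-Java check so the sketch runs without a test framework or the -ea flag.
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        Duration fallback = Duration.ofSeconds(5);

        // The exact inputs assumed to have triggered the incident are pinned here.
        check(parseTimeout(null, fallback).equals(fallback), "null should use the fallback");
        check(parseTimeout("   ", fallback).equals(fallback), "blank should use the fallback");

        // Normal behavior must still hold after the fix.
        check(parseTimeout(" 250 ", fallback).equals(Duration.ofMillis(250)),
              "padded number should parse");

        System.out.println("regression checks passed");
    }
}
```

Pinning the inputs observed in production, rather than only testing the general case, is what makes this a regression test: if the fallback handling is ever removed, the same failure shows up in the build instead of in production.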
