How do you profile a Python program's performance?
Quick Answer
Use `cProfile` (built-in, deterministic profiler) to find which functions consume the most cumulative/total time across a whole program run; use `timeit` for precise micro-benchmarks of a small code snippet; use a memory profiler (`memory_profiler`, `tracemalloc`) to find where memory is actually being allocated. Always profile before optimizing — intuition about where time goes is frequently wrong.
Detailed Answer
cProfile: whole-program, function-level profiling
import cProfile
import pstats

def slow_function():
    return sum(i * i for i in range(10**6))

# Profile the call and write the raw stats to output.prof
cProfile.run("slow_function()", "output.prof")

stats = pstats.Stats("output.prof")
stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time

python -m cProfile -s cumulative my_script.py  # from the shell: profile a whole script
cProfile instruments every function call and reports ncalls (call
count), tottime (time in the function itself, excluding sub-calls), and
cumtime (time including everything it called). It is the standard first
step for answering "where is my program actually spending time," and the
answer often differs from where you assumed the bottleneck was.
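To make the tottime/cumtime distinction concrete, here is a minimal sketch that profiles a block of code in-process and captures the report as a string. The functions `inner_work` and `outer_work` are hypothetical stand-ins, not part of any real program.

```python
import cProfile
import io
import pstats


def inner_work(n):
    # Hypothetical hot spot: the generator expression here does the real work.
    return sum(i * i for i in range(n))


def outer_work():
    # Hypothetical caller: high cumtime, but almost no tottime of its own.
    return [inner_work(50_000) for _ in range(5)]


profiler = cProfile.Profile()
profiler.enable()
outer_work()
profiler.disable()

# Send the report to a string buffer instead of stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("tottime").print_stats(10)
report = buffer.getvalue()
print(report)
```

In the report, `outer_work` should show a large cumtime and a small tottime, while the generator expression inside `inner_work` carries most of the tottime. Sorting by `"cumulative"` instead shows the call chain from the top down.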
timeit: precise micro-benchmarks
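import timeit

# Each call returns the total seconds taken by `number` executions
print(timeit.timeit("[x**2 for x in range(1000)]", number=10000))
print(timeit.timeit("list(map(lambda x: x**2, range(1000)))", number=10000))

python -m timeit -s "data = list(range(1000))" "sorted(data)"  # from the shell; -s is untimed setup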
timeit runs a snippet many times in a controlled environment (disabling
the garbage collector during timing by default, to avoid GC pauses
skewing results), giving a reliable comparison between two small
alternative implementations — the right tool for "is approach A or B
faster," as opposed to cProfile's "where does the whole program's time
go."
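For an A-versus-B comparison, one common pattern is to run several rounds with `timeit.repeat` and keep the minimum, since the fastest round is the one least disturbed by other activity on the machine. The sketch below assumes a shared setup string and two illustrative candidates; the labels and the `results` dictionary are names chosen for this example.

```python
import timeit

# Setup runs once per round and is not included in the timing.
setup = "data = list(range(1000))"

candidates = {
    "list comprehension": "[x * x for x in data]",
    "map + lambda": "list(map(lambda x: x * x, data))",
}

number = 1000  # executions per round
results = {}
for label, stmt in candidates.items():
    # repeat() returns one total time per round; take the least noisy one.
    timings = timeit.repeat(stmt, setup=setup, repeat=5, number=number)
    results[label] = min(timings) / number  # seconds per single execution

for label, per_call in results.items():
    print(f"{label}: {per_call * 1e6:.1f} microseconds per call")
```

The absolute numbers depend on the machine and Python version, so only the relative difference between the two candidates is meaningful.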
Memory profiling: tracemalloc (built-in)
import tracemalloc

tracemalloc.start()
run_program()  # placeholder for the code under investigation
snapshot = tracemalloc.take_snapshot()

for stat in snapshot.statistics("lineno")[:10]:
    print(stat)  # top 10 lines by memory allocated, with file/line info
tracemalloc (built into the standard library since 3.4) tracks memory
allocations by source location, letting you find exactly which lines are
responsible for the most memory use — invaluable for tracking down
unexpected memory growth or leaks in long-running processes.
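For growth over time, comparing two snapshots is often more useful than inspecting a single one. The following is a minimal sketch, assuming a hypothetical `leaky_step` function that retains data in a module-level list; both names are invented for illustration.

```python
import tracemalloc

retained = []  # hypothetical module-level list that keeps growing


def leaky_step():
    # Stand-in for a workload that accidentally holds on to its data.
    retained.append(bytearray(100_000))


tracemalloc.start()
before = tracemalloc.take_snapshot()

for _ in range(50):
    leaky_step()

after = tracemalloc.take_snapshot()

# Entries are ordered by the size of the change, largest first.
# A positive size_diff means more memory is held at that line than before.
top_growth = after.compare_to(before, "lineno")
tracemalloc.stop()

for stat in top_growth[:3]:
    print(stat)
```

The first entry should point at the `retained.append(...)` line, with a size difference of roughly 5 MB for the 50 retained buffers.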
Line-level profiling: line_profiler (third-party)
# pip install line_profiler
@profile  # no import needed: kernprof injects `profile` when run as `kernprof -l script.py`
def slow_function():
    ...
When cProfile identifies a hot function but you need to know which
specific line inside it is slow, line_profiler gives per-line timing
— more granular than cProfile's per-function view, at the cost of
needing a separate tool and higher overhead while running.
The discipline: measure before optimizing
The universal rule across all of this: profile first, then optimize the actual bottleneck. Intuitions about "the slow part" are frequently wrong: a startup-time import, an accidentally quadratic loop, or excessive small allocations often dominate over the code a developer assumed was slow. Optimizing the wrong part wastes effort and adds complexity for no measured benefit.
Interview-ready summary: cProfile finds which functions dominate a
whole program's runtime; timeit precisely compares small snippets;
tracemalloc/memory_profiler locate where memory is actually
allocated. The overarching principle is to profile first — intuition
about bottlenecks is unreliable, and optimization effort should follow
measured data, not guesses.