How do you profile a Python program's performance?

7 min | intermediate | performance, profiling, cprofile, timeit

Quick Answer

Use `cProfile` (built-in, deterministic profiler) to find which functions consume the most cumulative/total time across a whole program run; use `timeit` for precise micro-benchmarks of a small code snippet; use a memory profiler (`memory_profiler`, `tracemalloc`) to find where memory is actually being allocated. Always profile before optimizing — intuition about where time goes is frequently wrong.

Detailed Answer

cProfile: whole-program, function-level profiling

import cProfile
import pstats

def slow_function():
    return sum(i * i for i in range(10**6))

cProfile.run("slow_function()", "output.prof")

stats = pstats.Stats("output.prof")
stats.sort_stats("cumulative").print_stats(10)   # top 10 by cumulative time

Or, from the command line, to profile a whole script:

python -m cProfile -s cumulative my_script.py

`cProfile` instruments every function call and reports, for each function, `ncalls` (call count), `tottime` (time spent in the function itself, excluding sub-calls), and `cumtime` (time including everything it called). It is the standard first step for answering "where is my program actually spending time?", and the answer often differs from where you assumed the bottleneck was.
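To make the difference between `tottime` and `cumtime` concrete, here is a minimal sketch that profiles in-process with `cProfile.Profile` instead of writing a file. The functions `outer` and `inner` are hypothetical stand-ins, and `get_stats_profile()` assumes Python 3.9 or later.

```python
import cProfile
import io
import pstats


def inner(n):
    # Nearly all of the real work happens here.
    return sum(i * i for i in range(n))


def outer():
    # Does little work itself; it mostly delegates to inner().
    return [inner(50_000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
outer()
profiler.disable()

# Send the report to a string buffer rather than directly to stdout.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.strip_dirs().sort_stats("cumulative").print_stats(10)
print(buffer.getvalue())

# get_stats_profile() (Python 3.9+) exposes the same numbers as objects.
profile_data = stats.get_stats_profile()
outer_stats = profile_data.func_profiles["outer"]
inner_stats = profile_data.func_profiles["inner"]
print(f"outer: tottime={outer_stats.tottime:.3f}s cumtime={outer_stats.cumtime:.3f}s")
print(f"inner: ncalls={inner_stats.ncalls} cumtime={inner_stats.cumtime:.3f}s")
```

In a run like this, `outer` should show a small `tottime` but a large `cumtime`, because its time is spent inside the 20 calls to `inner`. That pattern suggests looking further down the call tree rather than at `outer` itself.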

timeit: precise micro-benchmarks

import timeit

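# Each call returns the total time in seconds for `number` executions.
print(timeit.timeit("[x**2 for x in range(1000)]", number=10000))
print(timeit.timeit("list(map(lambda x: x**2, range(1000)))", number=10000))

From the command line, with setup code passed via -s:

python -m timeit -s "data = list(range(1000))" "sorted(data)"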

`timeit` runs a snippet many times in a controlled environment (by default it disables the garbage collector during timing, so GC pauses do not skew the results), which gives a reasonably reliable comparison between two small alternative implementations. It is the right tool for "is approach A or B faster?", whereas `cProfile` answers "where does the whole program's time go?".
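A single `timeit.timeit` call can still be disturbed by other activity on the machine. One common way to reduce that noise, sketched below, is to repeat the measurement and keep the minimum. The helper `best_time` and both candidate functions are hypothetical names chosen for illustration.

```python
import timeit


def squares_comprehension(n):
    return [x ** 2 for x in range(n)]


def squares_map(n):
    return list(map(lambda x: x ** 2, range(n)))


def best_time(func, n=1000, number=200, repeat=5):
    """Return the best observed time per call, in seconds."""
    # The lambda wrapper adds a small, equal overhead to every candidate.
    timer = timeit.Timer(lambda: func(n))
    # repeat() returns one total per round; the minimum is the round
    # least affected by unrelated system activity.
    return min(timer.repeat(repeat=repeat, number=number)) / number


for candidate in (squares_comprehension, squares_map):
    print(f"{candidate.__name__}: {best_time(candidate) * 1e6:.1f} us per call")
```

The absolute numbers depend on the machine and Python version, so treat them as a relative comparison between the two candidates, not as fixed costs.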

Memory profiling: tracemalloc (built-in)

import tracemalloc

tracemalloc.start()
run_program()   # placeholder: call the code you want to measure
snapshot = tracemalloc.take_snapshot()

for stat in snapshot.statistics("lineno")[:10]:
    print(stat)   # top 10 lines by memory allocated, with file/line info

`tracemalloc` (in the standard library since Python 3.4) tracks memory allocations by source location, so you can see which lines are responsible for the most memory use. Only allocations made after `tracemalloc.start()` are traced. This is invaluable for tracking down unexpected memory growth or leaks in long-running processes.
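For memory growth in particular, comparing two snapshots is often more informative than reading one. The sketch below simulates a leak with a hypothetical `handle_request` function that appends to a module-level list, then uses `Snapshot.compare_to` to rank source lines by how much they grew.

```python
import tracemalloc

retained = []  # hypothetical stand-in for a cache that is never cleared


def handle_request(i):
    # Keeps a reference to every payload, so the memory is never released.
    retained.append(bytes(10_000))


tracemalloc.start()
before = tracemalloc.take_snapshot()

for i in range(1_000):
    handle_request(i)

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# compare_to() returns StatisticDiff entries, largest change first.
growth = after.compare_to(before, "lineno")
for diff in growth[:3]:
    print(diff)
```

The first entry should point at the `retained.append(...)` line with roughly 10 MB of growth, which is the kind of signal to look for when a long-running process keeps getting bigger.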

Line-level profiling: line_profiler (third-party)

# pip install line_profiler
@profile   # not imported: kernprof injects this name when you run `kernprof -l script.py`
def slow_function():
    ...

When `cProfile` identifies a hot function but you need to know which specific line inside it is slow, `line_profiler` gives per-line timing. That is more granular than `cProfile`'s per-function view, at the cost of a separate tool and higher overhead while running.

The discipline: measure before optimizing

The universal rule across all of this: profile first, then optimize the actual bottleneck. Intuitions about "the slow part" are frequently wrong: a startup-time import, an accidentally quadratic loop, or excessive small allocations often dominate over the code a developer assumed was slow. Optimizing the wrong part wastes effort and adds complexity for no measured benefit.

Interview-ready summary: `cProfile` finds which functions dominate a whole program's runtime; `timeit` precisely compares small snippets; `tracemalloc` and `memory_profiler` locate where memory is actually allocated. The overarching principle is to profile first, because intuition about bottlenecks is unreliable and optimization effort should follow measured data, not guesses.