Reference counting: the primary mechanism
import sys
a = [1, 2, 3]
sys.getrefcount(a) # 2 -- one for `a`, one for the getrefcount() argument itself
b = a # refcount incremented
sys.getrefcount(a) # 3
del b # refcount decremented
sys.getrefcount(a) # back to 2
del a # refcount hits 0 (ignoring the getrefcount call itself) -- freed IMMEDIATELY
Every object has a counter tracking how many references point to it.
Every new reference (assignment, passing as an argument, appending to a
container) increments it; every reference going away (del, reassignment,
falling out of scope) decrements it. The moment it hits zero, CPython
frees the object's memory immediately — deterministically, unlike
generational-only garbage collectors in some other languages.
The gap: reference cycles
class Node:
def __init__(self):
self.parent = None
self.child = None
a = Node()
b = Node()
a.child = b
b.parent = a # a references b, b references a -- a cycle
del a
del b
# a and b's refcounts never reach 0! each still holds one reference
# from the other -- refcounting ALONE can never free this cycle.
Even after both a and b go out of scope from the program's
perspective, they still reference each other, so neither refcount ever
reaches zero through refcounting alone — this is exactly the gap the
cyclic garbage collector exists to close.
The generational cyclic GC: the secondary mechanism
import gc
gc.collect() # force a collection cycle
gc.get_stats() # per-generation collection stats
CPython periodically runs a generational mark-and-sweep-style collector (three generations: 0, 1, 2) that specifically looks for groups of objects that reference each other but are unreachable from anything else in the program, and frees them together. New objects start in generation 0; objects that survive a collection are promoted to older generations, which are scanned less frequently (since long-lived objects are statistically less likely to become garbage soon) — this generational strategy keeps the overhead of cycle detection low for typical workloads.
Why this two-tier design
Reference counting alone is simple and gives instant, deterministic cleanup for the overwhelming majority of objects (no cycles involved), but can't handle cycles. A purely generational/tracing collector (like many other managed-memory languages use) can handle cycles but gives up deterministic, immediate cleanup. CPython's design gets the best of both: immediate cleanup for the common case, periodic cycle detection as a backstop for the rest.
Interview-ready summary: CPython frees most objects immediately via reference counting the moment their refcount hits zero; a supplementary generational garbage collector runs periodically to detect and free reference cycles that counting alone can never resolve, since a cycle's member objects each keep the others' refcounts above zero indefinitely.
Related Resources
A classic real-world cycle: parent/child back-references
class TreeNode:
def __init__(self, value):
self.value = value
self.children = []
self.parent = None
def add_child(self, child):
child.parent = self # child references parent
self.children.append(child) # parent references child -- cycle!
root = TreeNode("root")
child = TreeNode("child")
root.add_child(child)
del root
del child
# neither node's refcount reaches 0 -- each still holds a reference to the other
# via .parent / .children, even though nothing external references either anymore
This pattern — a container holding items that reference back to the container — appears constantly in real code (trees, doubly-linked lists, observer patterns, cached objects referencing a registry that references them back) and is the main source of reference cycles in practice.
How the collector finds cycles: reachability from roots
The GC periodically walks the object graph starting from roots
(global variables, active stack frames' local variables, and other
objects the interpreter directly holds), marking everything reachable.
Anything not reachable from a root — even if its internal refcount is
nonzero because of cycle members referencing each other — is garbage and
gets collected. This is conceptually a standard "trial deletion" /
mark-and-sweep-style algorithm applied only to container objects (the GC
only needs to track objects that could participate in cycles — lists,
dicts, class instances — not simple immutable values like int/str).
__del__ and cycles: a historical gotcha
Before Python 3.4, objects with a __del__ method involved in a cycle
could not be collected at all (the collector didn't know a safe order
to call __del__ on cyclic objects, so it left them as permanent
"uncollectable garbage" in gc.garbage). Since Python 3.4 (PEP 442),
the collector can safely collect cycles containing __del__ methods too,
finalizing them in a safe order — this was a real, previously-documented
memory leak source in older Python code that's no longer a concern on
modern versions.
Practical mitigations
import weakref
class TreeNode:
def add_child(self, child):
child.parent = weakref.ref(self) # doesn't keep self alive; breaks the cycle
self.children.append(child)
Using weakref for back-references (parent pointers, cache entries)
avoids creating a cycle in the first place, letting reference counting
alone reclaim the objects immediately rather than waiting for a periodic
GC pass — useful for large numbers of short-lived cyclic structures where
GC pause overhead matters.
Interview-ready summary: Reference cycles are groups of objects that
keep each other's refcounts alive even when unreachable from the rest of
the program — common in parent/child and cache-style back-references. The
generational GC detects them by tracing reachability from program roots,
and (since Python 3.4) can safely collect cycles even when __del__
methods are involved; using weakref for back-references avoids creating
the cycle at all.
Related Resources
Basic usage: a reference that doesn't keep the object alive
import weakref
class Resource:
def __init__(self, name):
self.name = name
r = Resource("db-connection")
weak = weakref.ref(r)
weak() # <Resource object> -- still alive, call it to get the real object (or None)
del r # the only STRONG reference is gone
weak() # None -- the object was actually collected; weakref doesn't keep it alive
Unlike a normal reference (weak = r, which increments the refcount),
weakref.ref(r) doesn't — so it has no say in whether r stays alive.
Calling weak() returns the live object while it exists, or None once
it's actually been collected.
Use case 1: caches that don't prevent eviction
import weakref
_cache = weakref.WeakValueDictionary()
def get_resource(key):
resource = _cache.get(key)
if resource is None:
resource = load_expensive_resource(key)
_cache[key] = resource
return resource
WeakValueDictionary holds weak references to its values — entries are
automatically removed once nothing else references the value. This gives
you caching ("reuse the object if something else is still using it")
without the cache itself becoming the reason large objects never get
freed (a plain dict-based cache would keep every entry alive forever,
a common source of memory leaks in long-running processes).
Use case 2: breaking reference cycles (parent/child back-pointers)
class TreeNode:
def __init__(self, value):
self.value = value
self.children = []
self._parent_ref = None
@property
def parent(self):
return self._parent_ref() if self._parent_ref else None
def add_child(self, child):
child._parent_ref = weakref.ref(self)
self.children.append(child)
The child's reference to its parent doesn't keep the parent alive — only
the parent's children list (a strong reference, appropriately, since a
parent should keep its children alive) does. This avoids creating a cycle
at all, so refcounting alone can free the tree immediately once the root
goes out of scope, rather than waiting for a periodic GC cycle pass.
Use case 3: observer patterns
class EventBus:
def __init__(self):
self._listeners = weakref.WeakSet()
def subscribe(self, listener):
self._listeners.add(listener)
Listeners registered with the bus don't have their lifetime extended just
because they subscribed — if a listener object is otherwise no longer
needed, it can still be garbage collected, and the WeakSet
automatically drops the now-dead reference rather than leaking it
indefinitely.
The limitation: not every object supports weak references
weakref.ref(42) # TypeError: cannot create weak reference to 'int' object
Some built-in types (int, str, tuple in some cases) don't support
weak references directly without wrapping — mainly relevant for custom
classes (which support it by default unless __slots__ excludes
__weakref__) and specific container types designed for this purpose
(WeakValueDictionary, WeakKeyDictionary, WeakSet).
Interview-ready summary: weakref creates references that don't
extend an object's lifetime, used for caches that shouldn't prevent
eviction (WeakValueDictionary) and for breaking reference cycles in
back-pointer relationships (parent/child, observer patterns) — letting
reference counting reclaim memory immediately instead of relying on the
periodic cyclic garbage collector.
Related Resources
cProfile: whole-program, function-level profiling
import cProfile
import pstats
def slow_function():
return sum(i * i for i in range(10**6))
cProfile.run("slow_function()", "output.prof")
stats = pstats.Stats("output.prof")
stats.sort_stats("cumulative").print_stats(10) # top 10 by cumulative time
python -m cProfile -s cumulative my_script.py # profile a whole script from the CLI
cProfile instruments every function call, reporting ncalls (call
count), tottime (time in the function itself, excluding sub-calls), and
cumtime (time including everything it called) — the standard first
step for "where is my program actually spending time," which is often
surprising compared to where you assumed the bottleneck was.
timeit: precise micro-benchmarks
import timeit
timeit.timeit("[x**2 for x in range(1000)]", number=10000)
timeit.timeit("list(map(lambda x: x**2, range(1000)))", number=10000)
python -m timeit -s "data = list(range(1000))" "sorted(data)"
timeit runs a snippet many times in a controlled environment (disabling
the garbage collector during timing by default, to avoid GC pauses
skewing results), giving a reliable comparison between two small
alternative implementations — the right tool for "is approach A or B
faster," as opposed to cProfile's "where does the whole program's time
go."
Memory profiling: tracemalloc (built-in)
import tracemalloc
tracemalloc.start()
run_program()
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
print(stat) # top 10 lines by memory allocated, with file/line info
tracemalloc (built into the standard library since 3.4) tracks memory
allocations by source location, letting you find exactly which lines are
responsible for the most memory use — invaluable for tracking down
unexpected memory growth or leaks in long-running processes.
Line-level profiling: line_profiler (third-party)
# pip install line_profiler
@profile # applied via `kernprof -l script.py`, not a normal decorator
def slow_function():
...
When cProfile identifies a hot function but you need to know which
specific line inside it is slow, line_profiler gives per-line timing
— more granular than cProfile's per-function view, at the cost of
needing a separate tool and higher overhead while running.
The discipline: measure before optimizing
The universal rule across all of this: profile first, then optimize the actual bottleneck — intuitions about "the slow part" are frequently wrong (a startup-time import, an accidentally-quadratic loop, or excessive small allocations often dominate over the code a developer assumed was slow), and optimizing the wrong part wastes effort while adding complexity for no measured benefit.
Interview-ready summary: cProfile finds which functions dominate a
whole program's runtime; timeit precisely compares small snippets;
tracemalloc/memory_profiler locate where memory is actually
allocated. The overarching principle is to profile first — intuition
about bottlenecks is unreliable, and optimization effort should follow
measured data, not guesses.
Related Resources
Cause 1: unbounded caches and module-level collections
_cache = {}
def get_data(key):
if key not in _cache:
_cache[key] = expensive_computation(key)
return _cache[key] # _cache grows forever -- every unique key leaks memory permanently
The single most common "leak" in Python is entirely mundane: a global
dict/list that accumulates entries with no eviction policy. This isn't a
GC bug at all — the objects are genuinely still reachable (via _cache),
so it's working exactly as designed; the fix is a bounded cache
(functools.lru_cache(maxsize=...), a WeakValueDictionary, or explicit
TTL/eviction logic).
Cause 2: registered callbacks/listeners never unregistered
class EventBus:
def __init__(self):
self.listeners = []
def subscribe(self, callback):
self.listeners.append(callback) # strong reference -- keeps callback's owner alive!
bus = EventBus()
class Widget:
def __init__(self, bus):
bus.subscribe(self.on_event) # bus now holds a reference to this Widget forever
def on_event(self, event):
...
Every Widget that subscribes is kept alive by the bus indefinitely,
even after the code that created it no longer needs it — since bound
methods (self.on_event) carry a reference to self. Fixes: explicit
unsubscribe() calls at the end of an object's lifecycle, or storing
listeners as weakref.WeakMethod/in a WeakSet so subscribing doesn't
extend the subscriber's lifetime.
Cause 3: reference cycles with C extension objects
Pure-Python reference cycles are eventually collected by the cyclic GC (see the reference-cycles question), but cycles that include objects managed partly by a C extension that doesn't fully cooperate with Python's GC protocol (rare with well-behaved extensions, but a known historical source of leaks in some older/poorly-written bindings) can sometimes never be collected — worth knowing as a "when all else fails, suspect the C extension" debugging lead.
Cause 4: closures/threads unintentionally keeping large objects alive
def process_large_dataset():
data = load_huge_dataset() # large object
def callback():
return len(data) # closure captures `data`, keeping it alive
register_callback(callback) # callback (and therefore `data`) outlives this function
return "done" # caller assumes `data` is now freeable -- it isn't!
A closure captures whatever variables it references from its enclosing scope — if that closure is stored somewhere long-lived (a callback registry, a class attribute), it keeps everything it captured alive too, even data the closure barely uses.
Diagnosing a suspected leak
import gc, tracemalloc
tracemalloc.start()
# ... run workload, take snapshots at intervals ...
snapshot1 = tracemalloc.take_snapshot()
# ... more workload ...
snapshot2 = tracemalloc.take_snapshot()
for stat in snapshot2.compare_to(snapshot1, "lineno")[:10]:
print(stat) # which lines allocated the most NEW memory between snapshots
gc.collect()
len(gc.garbage) # non-empty -- objects the GC found uncollectable (rare on modern Python)
Comparing tracemalloc snapshots over time pinpoints exactly which
allocation sites are growing unboundedly, which is almost always more
productive than guessing.
Interview-ready summary: Most "memory leaks" in Python aren't GC
failures at all — they're objects that are still genuinely reachable:
unbounded caches, forgotten event-listener registrations, and closures
capturing large objects longer than intended. Diagnose with tracemalloc
snapshot comparisons rather than assuming the garbage collector itself is
at fault.