Memory Management & Performance

Difficulty

Reference counting: the primary mechanism

import sys

a = [1, 2, 3]
sys.getrefcount(a)   # 2 -- one for `a`, one for the getrefcount() argument itself

b = a                 # refcount incremented
sys.getrefcount(a)    # 3

del b                  # refcount decremented
sys.getrefcount(a)     # back to 2

del a                   # refcount hits 0 (ignoring the getrefcount call itself) -- freed IMMEDIATELY

Every object has a counter tracking how many references point to it. Every new reference (assignment, passing as an argument, appending to a container) increments it; every reference going away (del, reassignment, falling out of scope) decrements it. The moment it hits zero, CPython frees the object's memory immediately — deterministically, unlike generational-only garbage collectors in some other languages.

The gap: reference cycles

class Node:
    def __init__(self):
        self.parent = None
        self.child = None

a = Node()
b = Node()
a.child = b
b.parent = a          # a references b, b references a -- a cycle

del a
del b
# a and b's refcounts never reach 0! each still holds one reference
# from the other -- refcounting ALONE can never free this cycle.

Even after both a and b go out of scope from the program's perspective, they still reference each other, so neither refcount ever reaches zero through refcounting alone — this is exactly the gap the cyclic garbage collector exists to close.

The generational cyclic GC: the secondary mechanism

import gc

gc.collect()          # force a collection cycle
gc.get_stats()         # per-generation collection stats

CPython periodically runs a generational mark-and-sweep-style collector (three generations: 0, 1, 2) that specifically looks for groups of objects that reference each other but are unreachable from anything else in the program, and frees them together. New objects start in generation 0; objects that survive a collection are promoted to older generations, which are scanned less frequently (since long-lived objects are statistically less likely to become garbage soon) — this generational strategy keeps the overhead of cycle detection low for typical workloads.

Why this two-tier design

Reference counting alone is simple and gives instant, deterministic cleanup for the overwhelming majority of objects (no cycles involved), but can't handle cycles. A purely generational/tracing collector (like many other managed-memory languages use) can handle cycles but gives up deterministic, immediate cleanup. CPython's design gets the best of both: immediate cleanup for the common case, periodic cycle detection as a backstop for the rest.

Interview-ready summary: CPython frees most objects immediately via reference counting the moment their refcount hits zero; a supplementary generational garbage collector runs periodically to detect and free reference cycles that counting alone can never resolve, since a cycle's member objects each keep the others' refcounts above zero indefinitely.

A classic real-world cycle: parent/child back-references

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []
        self.parent = None

    def add_child(self, child):
        child.parent = self       # child references parent
        self.children.append(child)  # parent references child -- cycle!

root = TreeNode("root")
child = TreeNode("child")
root.add_child(child)

del root
del child
# neither node's refcount reaches 0 -- each still holds a reference to the other
# via .parent / .children, even though nothing external references either anymore

This pattern — a container holding items that reference back to the container — appears constantly in real code (trees, doubly-linked lists, observer patterns, cached objects referencing a registry that references them back) and is the main source of reference cycles in practice.

How the collector finds cycles: reachability from roots

The GC periodically walks the object graph starting from roots (global variables, active stack frames' local variables, and other objects the interpreter directly holds), marking everything reachable. Anything not reachable from a root — even if its internal refcount is nonzero because of cycle members referencing each other — is garbage and gets collected. This is conceptually a standard "trial deletion" / mark-and-sweep-style algorithm applied only to container objects (the GC only needs to track objects that could participate in cycles — lists, dicts, class instances — not simple immutable values like int/str).

__del__ and cycles: a historical gotcha

Before Python 3.4, objects with a __del__ method involved in a cycle could not be collected at all (the collector didn't know a safe order to call __del__ on cyclic objects, so it left them as permanent "uncollectable garbage" in gc.garbage). Since Python 3.4 (PEP 442), the collector can safely collect cycles containing __del__ methods too, finalizing them in a safe order — this was a real, previously-documented memory leak source in older Python code that's no longer a concern on modern versions.

Practical mitigations

import weakref

class TreeNode:
    def add_child(self, child):
        child.parent = weakref.ref(self)   # doesn't keep self alive; breaks the cycle
        self.children.append(child)

Using weakref for back-references (parent pointers, cache entries) avoids creating a cycle in the first place, letting reference counting alone reclaim the objects immediately rather than waiting for a periodic GC pass — useful for large numbers of short-lived cyclic structures where GC pause overhead matters.

Interview-ready summary: Reference cycles are groups of objects that keep each other's refcounts alive even when unreachable from the rest of the program — common in parent/child and cache-style back-references. The generational GC detects them by tracing reachability from program roots, and (since Python 3.4) can safely collect cycles even when __del__ methods are involved; using weakref for back-references avoids creating the cycle at all.

Basic usage: a reference that doesn't keep the object alive

import weakref

class Resource:
    def __init__(self, name):
        self.name = name

r = Resource("db-connection")
weak = weakref.ref(r)

weak()          # <Resource object> -- still alive, call it to get the real object (or None)

del r            # the only STRONG reference is gone
weak()            # None -- the object was actually collected; weakref doesn't keep it alive

Unlike a normal reference (weak = r, which increments the refcount), weakref.ref(r) doesn't — so it has no say in whether r stays alive. Calling weak() returns the live object while it exists, or None once it's actually been collected.

Use case 1: caches that don't prevent eviction

import weakref

_cache = weakref.WeakValueDictionary()

def get_resource(key):
    resource = _cache.get(key)
    if resource is None:
        resource = load_expensive_resource(key)
        _cache[key] = resource
    return resource

WeakValueDictionary holds weak references to its values — entries are automatically removed once nothing else references the value. This gives you caching ("reuse the object if something else is still using it") without the cache itself becoming the reason large objects never get freed (a plain dict-based cache would keep every entry alive forever, a common source of memory leaks in long-running processes).

Use case 2: breaking reference cycles (parent/child back-pointers)

class TreeNode:
    def __init__(self, value):
        self.value = value
        self.children = []
        self._parent_ref = None

    @property
    def parent(self):
        return self._parent_ref() if self._parent_ref else None

    def add_child(self, child):
        child._parent_ref = weakref.ref(self)
        self.children.append(child)

The child's reference to its parent doesn't keep the parent alive — only the parent's children list (a strong reference, appropriately, since a parent should keep its children alive) does. This avoids creating a cycle at all, so refcounting alone can free the tree immediately once the root goes out of scope, rather than waiting for a periodic GC cycle pass.

Use case 3: observer patterns

class EventBus:
    def __init__(self):
        self._listeners = weakref.WeakSet()

    def subscribe(self, listener):
        self._listeners.add(listener)

Listeners registered with the bus don't have their lifetime extended just because they subscribed — if a listener object is otherwise no longer needed, it can still be garbage collected, and the WeakSet automatically drops the now-dead reference rather than leaking it indefinitely.

The limitation: not every object supports weak references

weakref.ref(42)   # TypeError: cannot create weak reference to 'int' object

Some built-in types (int, str, tuple in some cases) don't support weak references directly without wrapping — mainly relevant for custom classes (which support it by default unless __slots__ excludes __weakref__) and specific container types designed for this purpose (WeakValueDictionary, WeakKeyDictionary, WeakSet).

Interview-ready summary: weakref creates references that don't extend an object's lifetime, used for caches that shouldn't prevent eviction (WeakValueDictionary) and for breaking reference cycles in back-pointer relationships (parent/child, observer patterns) — letting reference counting reclaim memory immediately instead of relying on the periodic cyclic garbage collector.

Related Resources

cProfile: whole-program, function-level profiling

import cProfile
import pstats

def slow_function():
    return sum(i * i for i in range(10**6))

cProfile.run("slow_function()", "output.prof")

stats = pstats.Stats("output.prof")
stats.sort_stats("cumulative").print_stats(10)   # top 10 by cumulative time
python -m cProfile -s cumulative my_script.py   # profile a whole script from the CLI

cProfile instruments every function call, reporting ncalls (call count), tottime (time in the function itself, excluding sub-calls), and cumtime (time including everything it called) — the standard first step for "where is my program actually spending time," which is often surprising compared to where you assumed the bottleneck was.

timeit: precise micro-benchmarks

import timeit

timeit.timeit("[x**2 for x in range(1000)]", number=10000)
timeit.timeit("list(map(lambda x: x**2, range(1000)))", number=10000)
python -m timeit -s "data = list(range(1000))" "sorted(data)"

timeit runs a snippet many times in a controlled environment (disabling the garbage collector during timing by default, to avoid GC pauses skewing results), giving a reliable comparison between two small alternative implementations — the right tool for "is approach A or B faster," as opposed to cProfile's "where does the whole program's time go."

Memory profiling: tracemalloc (built-in)

import tracemalloc

tracemalloc.start()
run_program()
snapshot = tracemalloc.take_snapshot()

for stat in snapshot.statistics("lineno")[:10]:
    print(stat)   # top 10 lines by memory allocated, with file/line info

tracemalloc (built into the standard library since 3.4) tracks memory allocations by source location, letting you find exactly which lines are responsible for the most memory use — invaluable for tracking down unexpected memory growth or leaks in long-running processes.

Line-level profiling: line_profiler (third-party)

# pip install line_profiler
@profile   # applied via `kernprof -l script.py`, not a normal decorator
def slow_function():
    ...

When cProfile identifies a hot function but you need to know which specific line inside it is slow, line_profiler gives per-line timing — more granular than cProfile's per-function view, at the cost of needing a separate tool and higher overhead while running.

The discipline: measure before optimizing

The universal rule across all of this: profile first, then optimize the actual bottleneck — intuitions about "the slow part" are frequently wrong (a startup-time import, an accidentally-quadratic loop, or excessive small allocations often dominate over the code a developer assumed was slow), and optimizing the wrong part wastes effort while adding complexity for no measured benefit.

Interview-ready summary: cProfile finds which functions dominate a whole program's runtime; timeit precisely compares small snippets; tracemalloc/memory_profiler locate where memory is actually allocated. The overarching principle is to profile first — intuition about bottlenecks is unreliable, and optimization effort should follow measured data, not guesses.

Cause 1: unbounded caches and module-level collections

_cache = {}

def get_data(key):
    if key not in _cache:
        _cache[key] = expensive_computation(key)
    return _cache[key]     # _cache grows forever -- every unique key leaks memory permanently

The single most common "leak" in Python is entirely mundane: a global dict/list that accumulates entries with no eviction policy. This isn't a GC bug at all — the objects are genuinely still reachable (via _cache), so it's working exactly as designed; the fix is a bounded cache (functools.lru_cache(maxsize=...), a WeakValueDictionary, or explicit TTL/eviction logic).

Cause 2: registered callbacks/listeners never unregistered

class EventBus:
    def __init__(self):
        self.listeners = []

    def subscribe(self, callback):
        self.listeners.append(callback)   # strong reference -- keeps callback's owner alive!

bus = EventBus()

class Widget:
    def __init__(self, bus):
        bus.subscribe(self.on_event)   # bus now holds a reference to this Widget forever

    def on_event(self, event):
        ...

Every Widget that subscribes is kept alive by the bus indefinitely, even after the code that created it no longer needs it — since bound methods (self.on_event) carry a reference to self. Fixes: explicit unsubscribe() calls at the end of an object's lifecycle, or storing listeners as weakref.WeakMethod/in a WeakSet so subscribing doesn't extend the subscriber's lifetime.

Cause 3: reference cycles with C extension objects

Pure-Python reference cycles are eventually collected by the cyclic GC (see the reference-cycles question), but cycles that include objects managed partly by a C extension that doesn't fully cooperate with Python's GC protocol (rare with well-behaved extensions, but a known historical source of leaks in some older/poorly-written bindings) can sometimes never be collected — worth knowing as a "when all else fails, suspect the C extension" debugging lead.

Cause 4: closures/threads unintentionally keeping large objects alive

def process_large_dataset():
    data = load_huge_dataset()          # large object
    def callback():
        return len(data)                  # closure captures `data`, keeping it alive
    register_callback(callback)             # callback (and therefore `data`) outlives this function
    return "done"                           # caller assumes `data` is now freeable -- it isn't!

A closure captures whatever variables it references from its enclosing scope — if that closure is stored somewhere long-lived (a callback registry, a class attribute), it keeps everything it captured alive too, even data the closure barely uses.

Diagnosing a suspected leak

import gc, tracemalloc

tracemalloc.start()
# ... run workload, take snapshots at intervals ...
snapshot1 = tracemalloc.take_snapshot()
# ... more workload ...
snapshot2 = tracemalloc.take_snapshot()
for stat in snapshot2.compare_to(snapshot1, "lineno")[:10]:
    print(stat)   # which lines allocated the most NEW memory between snapshots

gc.collect()
len(gc.garbage)   # non-empty -- objects the GC found uncollectable (rare on modern Python)

Comparing tracemalloc snapshots over time pinpoints exactly which allocation sites are growing unboundedly, which is almost always more productive than guessing.

Interview-ready summary: Most "memory leaks" in Python aren't GC failures at all — they're objects that are still genuinely reachable: unbounded caches, forgotten event-listener registrations, and closures capturing large objects longer than intended. Diagnose with tracemalloc snapshot comparisons rather than assuming the garbage collector itself is at fault.