What are common causes of memory leaks in long-running Python processes?

Quick Answer

Despite having garbage collection, Python processes can still leak memory. The usual causes are: unbounded caches and global collections that grow forever; event listeners or callbacks that are registered but never unregistered; large objects held longer than needed (for example in a closure or a module-level list); reference cycles that pass through C-extension objects that don't fully participate in Python's cyclic GC; and, on Python versions before 3.4, reference cycles involving objects with `__del__` methods, which the collector left in `gc.garbage` instead of freeing (since PEP 442 in Python 3.4, such cycles are collected normally).

Detailed Answer

Cause 1: unbounded caches and module-level collections

_cache = {}

def get_data(key):
    if key not in _cache:
        _cache[key] = expensive_computation(key)
    return _cache[key]     # _cache grows forever -- every unique key leaks memory permanently

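The single most common "leak" in Python is entirely mundane: a global dict or list that accumulates entries with no eviction policy. This isn't a GC bug at all: the objects are still reachable via _cache, so the collector is working exactly as designed. The fix is to bound the cache with functools.lru_cache(maxsize=...) (note that maxsize=None, like functools.cache, is unbounded) or with explicit TTL/eviction logic. When entries only need to live as long as something else is using the value, a weakref.WeakValueDictionary is another option, but it is not size-bounded: it drops an entry once no other strong reference to the value remains, and it only works for values that support weak references (instances of most classes, but not int, str, tuple, or plain list/dict objects).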
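As a minimal sketch of the bounded-cache fix, the version below wraps get_data in functools.lru_cache. The body of expensive_computation is a hypothetical stand-in, the maxsize of 1024 is an arbitrary choice, and the approach assumes keys are hashable and the computation depends only on the key.

```python
from functools import lru_cache


def expensive_computation(key):
    # Hypothetical stand-in for the real work; assumed to depend only on `key`.
    return key * 2


@lru_cache(maxsize=1024)  # keeps at most 1024 results; least recently used entries are evicted
def get_data(key):
    return expensive_computation(key)


# Simulate a long-running process that sees many distinct keys.
for k in range(5000):
    get_data(k)

info = get_data.cache_info()
print(info.currsize, info.maxsize)  # currsize never exceeds maxsize
```

The cost of bounding the cache is recomputation for evicted keys, so maxsize is a trade-off between memory and hit rate rather than a value with one right answer.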

Cause 2: registered callbacks/listeners never unregistered

class EventBus:
    def __init__(self):
        self.listeners = []

    def subscribe(self, callback):
        self.listeners.append(callback)   # strong reference -- keeps callback's owner alive!

bus = EventBus()

class Widget:
    def __init__(self, bus):
        bus.subscribe(self.on_event)   # bus now holds a reference to this Widget forever

    def on_event(self, event):
        ...

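Every Widget that subscribes is kept alive by the bus indefinitely, even after the code that created it no longer needs it, because a bound method (self.on_event) carries a strong reference to self. Fixes: call an explicit unsubscribe() at the end of the object's lifecycle, or store each listener as a weakref.WeakMethod so that subscribing doesn't extend the subscriber's lifetime. Putting bound methods directly in a WeakSet (or wrapping them in a plain weakref.ref) does not work: self.on_event creates a new bound-method object on every access, so the weak reference dies immediately. A WeakSet is only suitable if the bus stores the subscriber objects themselves.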
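One way the weak-reference approach could look is sketched below. It assumes every listener is a bound method (weakref.WeakMethod raises TypeError for plain functions); the publish method and the seen attribute are illustrative additions, not part of the original EventBus.

```python
import gc
import weakref


class EventBus:
    def __init__(self):
        self._listeners = []  # list of weakref.WeakMethod objects

    def subscribe(self, bound_method):
        # WeakMethod does not keep the method's owner alive.
        self._listeners.append(weakref.WeakMethod(bound_method))

    def publish(self, event):
        live = []
        for ref in self._listeners:
            callback = ref()  # None once the owner has been collected
            if callback is not None:
                callback(event)
                live.append(ref)
        self._listeners = live  # prune references whose owners are gone


class Widget:
    def __init__(self, bus):
        self.seen = []
        bus.subscribe(self.on_event)

    def on_event(self, event):
        self.seen.append(event)


bus = EventBus()
widget = Widget(bus)
bus.publish("first")   # delivered to the widget
del widget             # the bus holds no strong reference, so the widget can be freed
gc.collect()           # CPython frees it on `del`; this call covers other implementations
bus.publish("second")  # no live listeners remain; the dead entry is pruned
```

The trade-off is that a subscriber with no other owner disappears silently, so this pattern suits listeners whose lifetime is managed elsewhere.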

Cause 3: reference cycles with C extension objects

Pure-Python reference cycles are eventually collected by the cyclic GC (see the reference-cycles question). A cycle that passes through an object from a C extension may never be collected if the extension type holds references to Python objects but does not implement the GC protocol (the Py_TPFLAGS_HAVE_GC flag together with tp_traverse and tp_clear), because the collector cannot see those references. This is rare with well-behaved extensions but a known historical source of leaks in some older or poorly written bindings, so "when all else fails, suspect the C extension" is a useful debugging lead.

Cause 4: closures/threads unintentionally keeping large objects alive

def process_large_dataset():
    data = load_huge_dataset()          # large object
    def callback():
        return len(data)                  # closure captures `data`, keeping it alive
    register_callback(callback)             # callback (and therefore `data`) outlives this function
    return "done"                           # caller assumes `data` is now freeable -- it isn't!

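A closure captures every variable it references from its enclosing scope. If that closure is stored somewhere long-lived (a callback registry, a class attribute), it keeps everything it captured alive too, even data the closure barely uses. The same applies to a long-running thread: its target, arguments, and local variables stay alive for as long as the thread runs.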

Diagnosing a suspected leak

import gc, tracemalloc

tracemalloc.start()
# ... run workload, take snapshots at intervals ...
snapshot1 = tracemalloc.take_snapshot()
# ... more workload ...
snapshot2 = tracemalloc.take_snapshot()
for stat in snapshot2.compare_to(snapshot1, "lineno")[:10]:
    print(stat)   # which lines allocated the most NEW memory between snapshots

gc.collect()
print(len(gc.garbage))   # non-zero means the GC found objects it could not free (rare on modern Python)

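Comparing tracemalloc snapshots over time shows which allocation sites keep growing, which is almost always more productive than guessing. Two caveats: tracemalloc reports where memory was allocated, not what is still holding a reference to it (gc.get_referrers() can help with that), and it only sees allocations made through Python's memory allocators, so memory a C extension obtains directly from malloc does not appear.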

Interview-ready summary: Most "memory leaks" in Python aren't GC failures at all — they're objects that are still genuinely reachable: unbounded caches, forgotten event-listener registrations, and closures capturing large objects longer than intended. Diagnose with tracemalloc snapshot comparisons rather than assuming the garbage collector itself is at fault.