How does Python manage memory (reference counting and the garbage collector)?

7 min · advanced · memory · garbage-collection · reference-counting

Quick Answer

CPython's primary memory management is **reference counting**: every object tracks how many references point to it, and is freed immediately when that count hits zero. This can't reclaim **reference cycles** (objects referencing each other in a loop), so CPython adds a secondary **generational cyclic garbage collector** that periodically scans for and collects unreachable cycles the refcounter alone would miss.

Detailed Answer

Reference counting: the primary mechanism

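import sys

# Exact counts are a CPython implementation detail and can differ between versions.
a = [1, 2, 3]
print(sys.getrefcount(a))  # typically 2: one for `a`, one for the temporary getrefcount() argument

b = a                      # second name bound to the same list: refcount incremented
print(sys.getrefcount(a))  # typically 3

del b                      # one reference removed: refcount decremented
print(sys.getrefcount(a))  # typically back to 2

del a                      # last reference removed: refcount hits 0 and the list is freed IMMEDIATELY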

Every object has a counter tracking how many references point to it. Every new reference (assignment, passing as an argument, appending to a container) increments it; every reference that goes away (del, reassignment, a name falling out of scope) decrements it. The moment the count hits zero, CPython frees the object's memory immediately and deterministically, unlike the tracing garbage collectors in some other languages, which reclaim memory at some later, unspecified point.
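One way to observe this immediate cleanup is with a weak reference, which lets you check whether an object is still alive without keeping it alive. The following is a minimal sketch, assuming CPython; the `Resource` class and `demo_immediate_free` function are hypothetical names introduced only for illustration, and the cyclic collector is switched off during the demo so that only reference counting is at work.

```python
import gc
import weakref


class Resource:
    """Hypothetical class used only to observe when an instance is freed."""


def demo_immediate_free():
    was_enabled = gc.isenabled()
    gc.disable()  # rule out the cyclic collector for the duration of the demo
    try:
        obj = Resource()
        probe = weakref.ref(obj)  # a weak reference does not keep the object alive
        alias = obj               # a second strong reference

        del obj
        alive_with_alias = probe() is not None      # `alias` still holds the object

        del alias
        alive_after_last_del = probe() is not None  # last strong reference is gone
        return alive_with_alias, alive_after_last_del
    finally:
        if was_enabled:
            gc.enable()


print(demo_immediate_free())
```

On CPython this is expected to print (True, False): the object survives while any strong reference remains and disappears as soon as the last one is deleted. Other Python implementations that do not use reference counting may behave differently.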

The gap: reference cycles

class Node:
    def __init__(self):
        self.parent = None
        self.child = None

a = Node()
b = Node()
a.child = b
b.parent = a          # a references b, b references a -- a cycle

del a
del b
# Neither Node's refcount reaches 0: each object is still referenced by
# the other, so refcounting ALONE can never free this cycle.

Once the names a and b are deleted, the program can no longer reach either object, yet the two objects still reference each other, so neither refcount ever reaches zero through refcounting alone. This is exactly the gap the cyclic garbage collector exists to close.
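The cycle described here can be made visible with a weak reference. This is a rough sketch, assuming CPython; `demo_cycle` is a hypothetical helper, and automatic collection is disabled during the demo so that the only collection is the explicit `gc.collect()` call.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.parent = None
        self.child = None


def demo_cycle():
    was_enabled = gc.isenabled()
    gc.disable()  # keep an automatic collection from running mid-demo
    try:
        a = Node()
        b = Node()
        a.child = b
        b.parent = a              # a <-> b now form a reference cycle
        probe = weakref.ref(a)    # lets us check liveness without a strong reference

        del a
        del b
        alive_after_del = probe() is not None      # the cycle keeps both objects alive

        gc.collect()                               # explicit full collection
        alive_after_collect = probe() is not None  # the cycle has been reclaimed
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()


print(demo_cycle())
```

On CPython this is expected to print (True, False): deleting both names leaves the objects alive, and only the cyclic collector frees them.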

The generational cyclic GC: the secondary mechanism

import gc

gc.collect()          # force a full collection; returns the number of unreachable objects found
gc.get_stats()        # list of per-generation collection statistics

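CPython's cyclic collector tracks container objects (objects that can hold references to other objects, such as lists, dicts, and class instances) and looks specifically for groups of them that reference each other but are unreachable from anything else in the program, then frees each group together. It is often described as mark-and-sweep-style. It does not run on a timer: a collection is triggered automatically when allocations of tracked objects outpace deallocations by a configurable threshold (see gc.get_threshold()). The collector is generational, classically with three generations (0, 1, 2), though the exact scheme is an implementation detail that has changed between CPython versions. New objects start in generation 0; objects that survive a collection are promoted to older generations, which are scanned less frequently, since long-lived objects are statistically less likely to become garbage soon. This generational strategy keeps the overhead of cycle detection low for typical workloads.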
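The collector's configuration and bookkeeping can be inspected through the standard `gc` module. The sketch below assumes CPython; `tracked_by_gc` and `collect_self_referencing_list` are hypothetical helper names, and the threshold and count values printed will vary by version and by what the interpreter has allocated so far.

```python
import gc


def tracked_by_gc():
    # Atomic objects such as ints and strings cannot form cycles, so the
    # cyclic collector does not track them; a list can, so it is tracked.
    return {
        "int": gc.is_tracked(42),
        "str": gc.is_tracked("text"),
        "list": gc.is_tracked([]),
    }


def collect_self_referencing_list():
    was_enabled = gc.isenabled()
    gc.disable()            # avoid an automatic collection during the demo
    try:
        gc.collect()        # clear out any garbage that already exists
        cyc = []
        cyc.append(cyc)     # the list references itself: a one-object cycle
        del cyc
        return gc.collect() # number of unreachable objects found
    finally:
        if was_enabled:
            gc.enable()


print(gc.get_threshold())   # collection thresholds (values differ across versions)
print(gc.get_count())       # current collection counts
print(tracked_by_gc())
print(collect_self_referencing_list())
```

The last call should report at least one unreachable object, the self-referencing list, which reference counting alone could not have freed.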

Why this two-tier design

Reference counting alone is simple and gives instant, deterministic cleanup for the overwhelming majority of objects (those not involved in cycles), but it cannot reclaim cycles. A purely tracing collector, as used by many other managed-memory runtimes, handles cycles but gives up deterministic, immediate cleanup. CPython's design gets the best of both: immediate cleanup for the common case, with periodic cycle detection as a backstop for the rest.

Interview-ready summary: CPython frees most objects immediately via reference counting the moment their refcount hits zero; a supplementary generational garbage collector runs periodically to detect and free reference cycles that counting alone can never resolve, since a cycle's member objects each keep the others' refcounts above zero indefinitely.