What are reference cycles, and how does the garbage collector detect and break them?
Quick Answer
A reference cycle is a group of objects referencing each other such that no object in the group has a refcount of zero, even though the group as a whole is unreachable from the rest of the program (e.g., a doubly-linked list, or a parent-child relationship with back-references). The generational GC detects cycles by tracing reachability from known "roots" (global/stack references) and collects any group of objects reachable only from within itself, running `__del__` where present before freeing them.
Detailed Answer
A classic real-world cycle: parent/child back-references
class TreeNode:
def __init__(self, value):
self.value = value
self.children = []
self.parent = None
def add_child(self, child):
child.parent = self # child references parent
self.children.append(child) # parent references child -- cycle!
root = TreeNode("root")
child = TreeNode("child")
root.add_child(child)
del root
del child
# neither node's refcount reaches 0 -- each still holds a reference to the other
# via .parent / .children, even though nothing external references either anymore
This pattern — a container holding items that reference back to the container — appears constantly in real code (trees, doubly-linked lists, observer patterns, cached objects referencing a registry that references them back) and is the main source of reference cycles in practice.
How the collector finds cycles: reachability from roots
The GC periodically walks the object graph starting from roots
(global variables, active stack frames' local variables, and other
objects the interpreter directly holds), marking everything reachable.
Anything not reachable from a root — even if its internal refcount is
nonzero because of cycle members referencing each other — is garbage and
gets collected. This is conceptually a standard "trial deletion" /
mark-and-sweep-style algorithm applied only to container objects (the GC
only needs to track objects that could participate in cycles — lists,
dicts, class instances — not simple immutable values like int/str).
__del__ and cycles: a historical gotcha
Before Python 3.4, objects with a __del__ method involved in a cycle
could not be collected at all (the collector didn't know a safe order
to call __del__ on cyclic objects, so it left them as permanent
"uncollectable garbage" in gc.garbage). Since Python 3.4 (PEP 442),
the collector can safely collect cycles containing __del__ methods too,
finalizing them in a safe order — this was a real, previously-documented
memory leak source in older Python code that's no longer a concern on
modern versions.
Practical mitigations
import weakref
class TreeNode:
def add_child(self, child):
child.parent = weakref.ref(self) # doesn't keep self alive; breaks the cycle
self.children.append(child)
Using weakref for back-references (parent pointers, cache entries)
avoids creating a cycle in the first place, letting reference counting
alone reclaim the objects immediately rather than waiting for a periodic
GC pass — useful for large numbers of short-lived cyclic structures where
GC pause overhead matters.
Interview-ready summary: Reference cycles are groups of objects that
keep each other's refcounts alive even when unreachable from the rest of
the program — common in parent/child and cache-style back-references. The
generational GC detects them by tracing reachability from program roots,
and (since Python 3.4) can safely collect cycles even when __del__
methods are involved; using weakref for back-references avoids creating
the cycle at all.