What is a recursive CTE, and when would you use one?
Quick Answer
A recursive CTE (`WITH RECURSIVE`) lets a query reference itself, repeatedly building on prior results until no new rows are produced — the standard way to traverse hierarchical or graph-like data (org charts, category trees, bill-of-materials, dependency graphs) that a fixed number of joins can't handle because the depth is unknown or variable.
Detailed Answer
The problem: unknown-depth hierarchies
-- employees: id, name, manager_id
Finding "all of Alice's direct reports" is a simple join. Finding "everyone in Alice's entire management chain below her, at any depth" can't be expressed with a fixed number of joins, because the org chart's depth varies and isn't known in advance.
Anatomy of a recursive CTE
WITH RECURSIVE subordinates AS (
-- Base case (anchor member): the starting row(s)
SELECT id, name, manager_id, 1 AS depth
FROM employees
WHERE id = 1 -- Alice's id
UNION ALL
-- Recursive member: references the CTE itself
SELECT e.id, e.name, e.manager_id, s.depth + 1
FROM employees e
JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates;
How it executes conceptually:
- Run the base case (anchor) once — this seeds the initial working set (Alice herself).
- Run the recursive member using only the newest rows added in the previous iteration, producing a new batch of rows (Alice's direct reports).
- Repeat step 2, each time joining against only the previous iteration's new rows, until an iteration produces zero new rows — then stop.
UNION ALLall iterations' results together as the final output.
Termination
The recursion must eventually stop producing new rows, or it runs forever (or until an engine-enforced recursion limit is hit — PostgreSQL doesn't have a hard limit by default and will happily loop indefinitely on a cyclic graph without one; SQL Server defaults to a 100-level MAXRECURSION limit specifically to guard against this). For genuinely cyclic data (e.g., a graph where cycles are possible, unlike a strict tree), you must explicitly track visited nodes and exclude them to avoid infinite recursion:
WITH RECURSIVE paths AS (
SELECT start_node, end_node, ARRAY[start_node] AS visited
FROM edges WHERE start_node = 'A'
UNION ALL
SELECT e.start_node, e.end_node, p.visited || e.end_node
FROM edges e
JOIN paths p ON e.start_node = p.end_node
WHERE NOT e.end_node = ANY(p.visited) -- prevents revisiting a node, avoiding infinite loops
)
SELECT * FROM paths;
Common use cases
- Org charts / management chains (as above).
- Category/product hierarchies (find all subcategories under "Electronics," arbitrarily nested).
- Bill-of-materials explosions (a product made of sub-assemblies, made of sub-sub-assemblies...).
- Graph traversal — shortest/all paths between two nodes, dependency resolution.
Performance note
Recursive CTEs can be slow on deep or wide hierarchies since each level is a fresh join pass; for read-heavy, rarely-changing hierarchies, a materialized path or closure table (precomputed ancestor/descendant pairs, maintained on write) is a common denormalization that trades write complexity for much faster read queries — worth mentioning as the production alternative when a recursive CTE becomes a bottleneck.