What is a recursive CTE, and when would you use one?

7 minadvancedrecursive-ctehierarchical-datacte

Quick Answer

A recursive CTE (`WITH RECURSIVE`) lets a query reference itself, repeatedly building on prior results until no new rows are produced — the standard way to traverse hierarchical or graph-like data (org charts, category trees, bill-of-materials, dependency graphs) that a fixed number of joins can't handle because the depth is unknown or variable.

Detailed Answer

The problem: unknown-depth hierarchies

-- employees: id, name, manager_id

Finding "all of Alice's direct reports" is a simple join. Finding "everyone in Alice's entire management chain below her, at any depth" can't be expressed with a fixed number of joins, because the org chart's depth varies and isn't known in advance.

Anatomy of a recursive CTE

WITH RECURSIVE subordinates AS (
    -- Base case (anchor member): the starting row(s)
    SELECT id, name, manager_id, 1 AS depth
    FROM employees
    WHERE id = 1   -- Alice's id

    UNION ALL

    -- Recursive member: references the CTE itself
    SELECT e.id, e.name, e.manager_id, s.depth + 1
    FROM employees e
    JOIN subordinates s ON e.manager_id = s.id
)
SELECT * FROM subordinates;

How it executes conceptually:

  1. Run the base case (anchor) once — this seeds the initial working set (Alice herself).
  2. Run the recursive member using only the newest rows added in the previous iteration, producing a new batch of rows (Alice's direct reports).
  3. Repeat step 2, each time joining against only the previous iteration's new rows, until an iteration produces zero new rows — then stop.
  4. UNION ALL all iterations' results together as the final output.

Termination

The recursion must eventually stop producing new rows, or it runs forever (or until an engine-enforced recursion limit is hit — PostgreSQL doesn't have a hard limit by default and will happily loop indefinitely on a cyclic graph without one; SQL Server defaults to a 100-level MAXRECURSION limit specifically to guard against this). For genuinely cyclic data (e.g., a graph where cycles are possible, unlike a strict tree), you must explicitly track visited nodes and exclude them to avoid infinite recursion:

WITH RECURSIVE paths AS (
    SELECT start_node, end_node, ARRAY[start_node] AS visited
    FROM edges WHERE start_node = 'A'

    UNION ALL

    SELECT e.start_node, e.end_node, p.visited || e.end_node
    FROM edges e
    JOIN paths p ON e.start_node = p.end_node
    WHERE NOT e.end_node = ANY(p.visited)   -- prevents revisiting a node, avoiding infinite loops
)
SELECT * FROM paths;

Common use cases

  • Org charts / management chains (as above).
  • Category/product hierarchies (find all subcategories under "Electronics," arbitrarily nested).
  • Bill-of-materials explosions (a product made of sub-assemblies, made of sub-sub-assemblies...).
  • Graph traversal — shortest/all paths between two nodes, dependency resolution.

Performance note

Recursive CTEs can be slow on deep or wide hierarchies since each level is a fresh join pass; for read-heavy, rarely-changing hierarchies, a materialized path or closure table (precomputed ancestor/descendant pairs, maintained on write) is a common denormalization that trades write complexity for much faster read queries — worth mentioning as the production alternative when a recursive CTE becomes a bottleneck.