What is an index, and how does a B-tree index work under the hood?

7 minintermediateindexingb-treeinternals

Quick Answer

An index is a separate, ordered data structure that lets the database find rows without scanning the whole table. A **B-tree** (balanced tree) index stores keys in sorted order across a shallow, wide tree of pages: internal nodes hold routing keys, and leaf nodes hold the indexed value plus a pointer to the actual row (or the row itself, for a clustered index). Lookups, range scans, and sorted retrieval are all O(log n) against the tree's height, which stays small (typically 3-4 levels) even for tables with millions of rows.

Detailed Answer

Without an index, finding a row requires a full table scan — reading every page of the table to check each row against the condition, which is O(n).

What an index actually is

An index is a redundant, auxiliary copy of one or more columns' worth of data, stored in a structure optimized for searching — most commonly a B-tree (or its variant, the B+tree, which most real databases actually use). Maintaining it costs extra storage and slows down writes slightly (every INSERT/UPDATE/DELETE must also update every affected index), in exchange for much faster reads on the indexed column(s).

B-tree structure

                    [ 50 ]
                 /          \
          [ 20, 35 ]      [ 70, 90 ]
          /   |   \        /   |   \
      leaf  leaf  leaf  leaf  leaf  leaf   <- sorted, linked leaf pages
       (actual key values + row pointers)
  • Internal (non-leaf) nodes hold routing keys used purely to decide which child to descend into — they don't necessarily correspond to real rows.
  • Leaf nodes hold the actual indexed key values in sorted order, each paired with a pointer to the row's physical location (a "row ID"/"ctid"/"RID," depending on engine) — or, for a clustered index, the leaf is the row itself.
  • Leaf pages are typically linked together in a doubly-linked list, so once you've found your starting point, a range scan (BETWEEN, >, <, ORDER BY) can walk forward/backward through sorted leaves without re-descending the tree each time.

Why B-trees specifically (not a plain binary tree)

A B-tree is wide and shallow rather than narrow and deep — each node holds many keys (often hundreds, sized to match a disk page), so even a huge table needs only 3-4 levels of tree traversal to reach any leaf. This matters because each level touched is potentially a disk I/O (or at least a cache-line/page fetch); minimizing tree height directly minimizes I/O for a lookup. A plain binary tree, by contrast, would need log₂(n) levels — vastly more for large n — because each node holds only one key.

What a B-tree index is good at, and not good at

  • Excellent for: equality lookups (WHERE id = 5), range queries (WHERE price BETWEEN 10 AND 50), sorted retrieval (ORDER BY indexed_col), and prefix matching (WHERE name LIKE 'Smith%').
  • Not useful for: conditions that don't preserve the sorted-order relationship, like WHERE name LIKE '%Smith' (leading wildcard) or applying a function to the indexed column (WHERE UPPER(email) = ...) unless a matching functional/expression index exists — see the sargability question.

Alternatives worth knowing exist

Hash indexes (equality-only, no range support, no sort order — see the hash-vs-B-tree question), GiST/GIN indexes (full-text search, geometric data, JSONB containment in PostgreSQL), and bitmap indexes (low-cardinality columns in analytical/OLAP systems) all trade the B-tree's general-purpose versatility for better performance on a narrower class of queries.