What is an index, and how does a B-tree index work under the hood?
Quick Answer
An index is a separate, ordered data structure that lets the database find rows without scanning the whole table. A **B-tree** (balanced tree) index stores keys in sorted order across a shallow, wide tree of pages: internal nodes hold routing keys, and leaf nodes hold the indexed value plus a pointer to the actual row (or the row itself, for a clustered index). Lookups, range scans, and sorted retrieval are all O(log n) against the tree's height, which stays small (typically 3-4 levels) even for tables with millions of rows.
Detailed Answer
Without an index, finding a row requires a full table scan — reading every page of the table to check each row against the condition, which is O(n).
What an index actually is
An index is a redundant, auxiliary copy of one or more columns' worth of data, stored in a structure optimized for searching — most commonly a B-tree (or its variant, the B+tree, which most real databases actually use). Maintaining it costs extra storage and slows down writes slightly (every INSERT/UPDATE/DELETE must also update every affected index), in exchange for much faster reads on the indexed column(s).
B-tree structure
[ 50 ]
/ \
[ 20, 35 ] [ 70, 90 ]
/ | \ / | \
leaf leaf leaf leaf leaf leaf <- sorted, linked leaf pages
(actual key values + row pointers)
- Internal (non-leaf) nodes hold routing keys used purely to decide which child to descend into — they don't necessarily correspond to real rows.
- Leaf nodes hold the actual indexed key values in sorted order, each paired with a pointer to the row's physical location (a "row ID"/"ctid"/"RID," depending on engine) — or, for a clustered index, the leaf is the row itself.
- Leaf pages are typically linked together in a doubly-linked list, so once you've found your starting point, a range scan (
BETWEEN,>,<,ORDER BY) can walk forward/backward through sorted leaves without re-descending the tree each time.
Why B-trees specifically (not a plain binary tree)
A B-tree is wide and shallow rather than narrow and deep — each node holds many keys (often hundreds, sized to match a disk page), so even a huge table needs only 3-4 levels of tree traversal to reach any leaf. This matters because each level touched is potentially a disk I/O (or at least a cache-line/page fetch); minimizing tree height directly minimizes I/O for a lookup. A plain binary tree, by contrast, would need log₂(n) levels — vastly more for large n — because each node holds only one key.
What a B-tree index is good at, and not good at
- Excellent for: equality lookups (
WHERE id = 5), range queries (WHERE price BETWEEN 10 AND 50), sorted retrieval (ORDER BY indexed_col), and prefix matching (WHERE name LIKE 'Smith%'). - Not useful for: conditions that don't preserve the sorted-order relationship, like
WHERE name LIKE '%Smith'(leading wildcard) or applying a function to the indexed column (WHERE UPPER(email) = ...) unless a matching functional/expression index exists — see the sargability question.
Alternatives worth knowing exist
Hash indexes (equality-only, no range support, no sort order — see the hash-vs-B-tree question), GiST/GIN indexes (full-text search, geometric data, JSONB containment in PostgreSQL), and bitmap indexes (low-cardinality columns in analytical/OLAP systems) all trade the B-tree's general-purpose versatility for better performance on a narrower class of queries.