What is a covering index, and how does it avoid a key lookup?

Detailed Answer

The problem a covering index solves

CREATE INDEX ix_orders_customer_id ON orders(customer_id);

SELECT id, customer_id, total FROM orders WHERE customer_id = 42;

The index on customer_id quickly finds matching rows, but total isn't in that index — so for every matching row, the engine must do an extra key/bookmark lookup back to the full table (or clustered index) just to fetch total. On a query returning many rows, that's many extra random I/Os.

Making the index cover the query

-- Option A: add 'total' as a regular index column
CREATE INDEX ix_orders_customer_covering ON orders(customer_id, total);

-- Option B (SQL Server): INCLUDE non-key columns at the leaf level only
CREATE INDEX ix_orders_customer_covering ON orders(customer_id) INCLUDE (total);

-- Option B (PostgreSQL): the equivalent is INCLUDE in CREATE INDEX (v11+)
CREATE INDEX ix_orders_customer_covering ON orders(customer_id) INCLUDE (total);

Now the same query can be answered entirely from the index's leaf pages — an index-only scan — with no trip to the base table at all, because every column the query needs (customer_id to filter, total to return; id is implicitly available via the clustering key) is present in the index.

Key columns vs INCLUDE(d)/STORING columns

Putting total directly in the index key (Option A) makes it part of the sort order too, which enlarges the key, affects ORDER BY/range-scan usefulness, and duplicates it into every level of the B-tree. An INCLUDE/STORING clause (Option B) stores the extra column only at the leaf level, not in internal nodes, and doesn't affect sort order — generally the more efficient choice when the extra column is only needed for the SELECT list, not for filtering or sorting.

Caveats

Covering indexes trade write cost and storage for read speed — every additional column stored means more data duplicated and updated on every write. Don't cover every possible query; reserve this for genuinely hot, high-value queries.
SELECT * defeats covering indexes almost by definition — the engine can't predict which columns you'll need in the future, and a covering index for "all columns" is just... the whole table. Covering indexes work best paired with narrow, deliberate SELECT lists.
PostgreSQL's index-only scans additionally require the visibility map to confirm a page's rows are all visible to the current transaction — under heavy write/vacuum churn, PostgreSQL can silently fall back to a regular index scan (with key lookups) even when the index technically covers the query.

What is a covering index, and how does it avoid a key lookup?

Quick Answer

Detailed Answer

The problem a covering index solves

Making the index cover the query

Key columns vs INCLUDE(d)/STORING columns

Caveats

Related Resources

Related Questions