What's the difference between a clustered and a non-clustered index?

6 minintermediateindexingclustered-indexinternals

Quick Answer

A **clustered index** determines the actual physical storage order of table rows — the leaf level of the index *is* the data itself, so a table can have at most one. A **non-clustered (secondary) index** is a separate structure whose leaves store a pointer back to the row's location (or, in engines like InnoDB, the clustering key), rather than the row's data itself — a table can have many.

Detailed Answer

Clustered index — defines physical row order

The table's rows are physically stored sorted by the clustered index's key. There can be only one per table, because rows can only be physically ordered one way.

-- SQL Server: explicit clustered index, often on the primary key by default
CREATE CLUSTERED INDEX ix_orders_id ON orders(id);

When you look up by the clustered key, the engine finds the leaf page and that page is the full row — no second lookup needed.

MySQL/InnoDB specifics: every InnoDB table always has a clustered index — if you declare a PRIMARY KEY, that becomes the clustered index; if you don't, InnoDB creates a hidden 6-byte row ID and clusters on that internally. This is unlike SQL Server, where a table can be a "heap" with no clustered index at all.

PostgreSQL specifics: PostgreSQL doesn't maintain clustering automatically — CLUSTER table USING index physically reorders the table once, but subsequent inserts don't preserve that order unless you re-run CLUSTER. So PostgreSQL tables are effectively heap-organized by default, and "clustered index" in the SQL Server/MySQL sense doesn't map directly.

Non-clustered (secondary) index — a separate lookup structure

CREATE INDEX ix_orders_customer_id ON orders(customer_id);

This index's leaf nodes store customer_id values in sorted order, each paired with a pointer to the actual row — in SQL Server, a row locator (physical page/slot); in InnoDB, the value of the clustered key (id), since InnoDB's secondary indexes always store the primary key rather than a physical address (so that the clustered index can be reorganized without invalidating every secondary index).

The "bookmark lookup" / "key lookup" cost

Querying by a non-clustered index column, but selecting other columns not in that index, requires two steps: (1) traverse the secondary index to find the pointer/key, then (2) go to the clustered index (or heap) to fetch the full row. This second step — a key lookup or bookmark lookup — is where a covering index (see next question) helps by avoiding it entirely.

-- Needs a key lookup: ix_orders_customer_id doesn't contain 'total'
SELECT total FROM orders WHERE customer_id = 42;

Choose the clustered index key (often the primary key) based on the most common range-scan/sequential access pattern for that table — e.g., an auto-incrementing id or a time-ordered column, since sequential inserts into a clustered index avoid the page-splitting overhead that inserting into the middle of a random-order clustered key causes. Add non-clustered indexes for other frequently-filtered/joined columns.