Surrogate key vs natural key — what are the tradeoffs?
Quick Answer
A **natural key** is a column that has real-world business meaning (email, SSN, ISBN, order number). A **surrogate key** is an artificial, meaningless identifier generated purely for the database (auto-increment integer, UUID). Surrogate keys are immune to business-rule changes and are typically smaller/faster to index and join; natural keys are self-documenting and avoid an extra join to look up meaning, but they're at risk if the "unique, never-changes" assumption about them turns out to be wrong.
Detailed Answer
-- Surrogate key: meaningless, generated
CREATE TABLE products (
id SERIAL PRIMARY KEY, -- surrogate
sku VARCHAR(50) UNIQUE -- natural key, still enforced unique
);
-- Using a natural key directly as primary key
CREATE TABLE countries (
iso_code CHAR(2) PRIMARY KEY -- natural key: 'US', 'GB', 'DE'
);
Natural key
Pros:
- Self-documenting —
iso_code = 'US'is meaningful without a join. - No extra lookup needed if the business already has the value on hand.
Cons:
- Business "invariants" break more often than expected — social security numbers get reissued in rare cases, email addresses get reused after account deletion, product SKUs get renumbered during a re-branding. Once a natural key is used as a foreign key target in many other tables, correcting it later means cascading updates everywhere.
- Often wider/composite (e.g., a natural key might need multiple columns), which makes every foreign key referencing it wider too, increasing index size and join cost.
Surrogate key
Pros:
- Guaranteed stable — it's meaningless, so there's never a business reason to change it.
- Usually a single, small, fixed-width column (integer or UUID), which keeps foreign keys and their indexes compact and fast to join.
- Decouples the database's internal identity from business rules, which can (and do) change.
Cons:
- Meaningless in isolation — reading a raw
id = 8271tells you nothing without a lookup. - Still need a
UNIQUEconstraint on the natural key anyway if it must remain unique for business reasons (as in theskuexample above) — the surrogate key doesn't eliminate the need to validate real-world uniqueness, it just avoids using that value as the relational identifier.
Auto-increment integer vs UUID as a surrogate
- Auto-increment integer: compact (4–8 bytes), sequential, fast for
B-treeindex inserts (append-mostly), but reveals row count/creation order and doesn't work well for merging data generated across multiple independent systems (collisions). - UUID: globally unique without coordination (good for distributed/offline-generated IDs, merging data from multiple sources), but larger (16 bytes), and random UUIDs (v4) cause index fragmentation because inserts land at random points in a B-tree rather than appending at the end — UUIDv7 (time-ordered) mitigates this by keeping inserts roughly sequential while remaining globally unique.
Default to a surrogate primary key for internal relational integrity, and add a UNIQUE constraint on any natural-key column that truly must be unique for business reasons. This gives you the stability of a surrogate key without giving up validation of real-world uniqueness.