Surrogate key vs natural key — what are the tradeoffs? | SQL & Databases Interview Question

Detailed Answer

-- Surrogate key: meaningless, generated
CREATE TABLE products (
    id SERIAL PRIMARY KEY,        -- surrogate
    sku VARCHAR(50) UNIQUE        -- natural key, still enforced unique
);

-- Using a natural key directly as primary key
CREATE TABLE countries (
    iso_code CHAR(2) PRIMARY KEY  -- natural key: 'US', 'GB', 'DE'
);

Natural key

Pros:

Self-documenting — iso_code = 'US' is meaningful without a join.
No extra lookup needed if the business already has the value on hand.

Cons:

Business "invariants" break more often than expected — social security numbers get reissued in rare cases, email addresses get reused after account deletion, product SKUs get renumbered during a re-branding. Once a natural key is used as a foreign key target in many other tables, correcting it later means cascading updates everywhere.
Often wider/composite (e.g., a natural key might need multiple columns), which makes every foreign key referencing it wider too, increasing index size and join cost.

Surrogate key

Pros:

Guaranteed stable — it's meaningless, so there's never a business reason to change it.
Usually a single, small, fixed-width column (integer or UUID), which keeps foreign keys and their indexes compact and fast to join.
Decouples the database's internal identity from business rules, which can (and do) change.

Cons:

Meaningless in isolation — reading a raw id = 8271 tells you nothing without a lookup.
Still need a UNIQUE constraint on the natural key anyway if it must remain unique for business reasons (as in the sku example above) — the surrogate key doesn't eliminate the need to validate real-world uniqueness, it just avoids using that value as the relational identifier.

Auto-increment integer vs UUID as a surrogate

Auto-increment integer: compact (4–8 bytes), sequential, fast for B-tree index inserts (append-mostly), but reveals row count/creation order and doesn't work well for merging data generated across multiple independent systems (collisions).
UUID: globally unique without coordination (good for distributed/offline-generated IDs, merging data from multiple sources), but larger (16 bytes), and random UUIDs (v4) cause index fragmentation because inserts land at random points in a B-tree rather than appending at the end — UUIDv7 (time-ordered) mitigates this by keeping inserts roughly sequential while remaining globally unique.

Default to a surrogate primary key for internal relational integrity, and add a UNIQUE constraint on any natural-key column that truly must be unique for business reasons. This gives you the stability of a surrogate key without giving up validation of real-world uniqueness.

Surrogate key vs natural key — what are the tradeoffs?

Quick Answer

Detailed Answer

Natural key

Surrogate key

Auto-increment integer vs UUID as a surrogate

Related Resources

Related Questions