What is a phantom read, and which isolation level prevents it?

5 minadvancedphantom-readisolation-levelstransactions

Quick Answer

A phantom read happens when a transaction re-runs the same range-based query twice and gets a different set of rows the second time, because another transaction inserted or deleted rows matching that range and committed in between — the "phantom" rows weren't there, then were. Per the SQL standard, only `Serializable` isolation is guaranteed to prevent phantom reads; some engines (notably PostgreSQL, via snapshot isolation) also prevent them at Repeatable Read, which is stricter than the standard technically requires.

Detailed Answer

A phantom read example

-- Transaction A                                -- Transaction B
BEGIN;
SELECT COUNT(*) FROM orders
  WHERE status = 'pending';  -- returns 5
                                                 BEGIN;
                                                 INSERT INTO orders (status, ...)
                                                   VALUES ('pending', ...);
                                                 COMMIT;
SELECT COUNT(*) FROM orders
  WHERE status = 'pending';  -- returns 6 !!
COMMIT;

Transaction A ran the identical query twice within the same transaction and got two different row counts, because Transaction B committed a new row matching the filter in between. The new row is the "phantom" — it wasn't part of the original result set, then suddenly was.

Why this differs from a non-repeatable read

A non-repeatable read is about a specific, already-fetched row changing value; a phantom read is about the set of rows matching a condition changing, specifically due to inserts or deletes (not updates to existing rows already in the result). This distinction is why some isolation levels can prevent one but not the other — locking a specific set of already-read rows (preventing non-repeatable reads) doesn't automatically prevent a new row from being inserted into that range.

Which levels prevent it

Per the ANSI SQL standard: only Serializable is required to fully prevent phantom reads — Repeatable Read is standard-permitted to still allow them, because traditional row-locking implementations lock the rows you've already read, not a "gap" covering rows that don't exist yet.

However, engine implementation varies: PostgreSQL implements Repeatable Read via snapshot isolation (MVCC), which takes a full snapshot at transaction start — since the "new" row in the example above wasn't part of that snapshot at all, PostgreSQL's Repeatable Read actually does prevent this specific phantom scenario, stricter than the standard's minimum requirement for that level. MySQL/InnoDB's Repeatable Read additionally uses gap locks and next-key locks specifically to prevent phantom inserts within a locked range for locking reads.

Don't assume "Repeatable Read" means the same guarantee across every database — verify your specific engine's actual behavior rather than relying on the SQL standard's minimum bar, since several major engines (PostgreSQL, MySQL/InnoDB) exceed it. If phantom reads matter to your business logic and you're not certain of your engine's exact behavior at Repeatable Read, Serializable is the only level the standard guarantees will prevent them everywhere.