How do you read and interpret a query execution plan (EXPLAIN / EXPLAIN ANALYZE)?

Detailed Answer

EXPLAIN ANALYZE
SELECT c.name, o.total
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.region = 'EU'
ORDER BY o.total DESC
LIMIT 10;

Example PostgreSQL output (abridged):

Limit  (cost=1520.44..1520.46 rows=10) (actual time=45.2..45.3 rows=10 loops=1)
  ->  Sort  (cost=1520.44..1545.10 rows=9865) (actual time=45.2..45.2 rows=10 loops=1)
        Sort Key: o.total DESC
        ->  Hash Join  (cost=245.00..1290.32 rows=9865) (actual time=3.1..38.7 rows=9910 loops=1)
              Hash Cond: (o.customer_id = c.id)
              ->  Seq Scan on orders o  (cost=0.00..820.00 rows=50000) (actual rows=50000 loops=1)
              ->  Hash  (cost=200.00..200.00 rows=3600) (actual rows=3550 loops=1)
                    ->  Seq Scan on customers c  (cost=0.00..200.00 rows=3600)
                          Filter: (region = 'EU')

How to read it

Indentation = nesting. Innermost/deepest operations execute first; their output feeds the operation above them. Read from the bottom/innermost outward.
cost=startup..total — the optimizer's estimated cost (an arbitrary unit, not milliseconds) to produce the first row and all rows, respectively. Only meaningful for comparing plans against each other on the same engine/config, not as an absolute number.
actual time=... rows=... loops=... — only present with ANALYZE: real measured time, real row counts, and how many times this node executed (relevant inside a nested loop, where an inner node runs once per outer row).

What to look for

Scan type on each table — Seq Scan (full table scan) on a large table you expected to hit an index is the first thing to investigate. Not always wrong (see the selectivity question — sometimes a scan genuinely is cheaper), but worth questioning.
Estimated vs. actual row counts — a huge gap (e.g., estimated 10 rows, actual 50,000) means the optimizer's statistics are stale or a predicate is hard to estimate (like a correlated condition across columns), and it likely picked a suboptimal plan as a result. This is one of the most valuable things ANALYZE gives you that plain EXPLAIN can't.
Nested loops with a high loops count on an expensive inner operation — if the inner side of a nested loop isn't using an index and it runs once per outer row, cost multiplies fast.
Sorts on large row counts — an expensive Sort node (as in the example above) sometimes disappears entirely if an index already provides the required order, avoiding the sort altogether.
The overall top-level total cost/time relative to what you expect for the query's importance — but always validate against the actual time under ANALYZE, not just the estimate.

Engine differences

Every major engine has some form of this: PostgreSQL/MySQL use EXPLAIN/EXPLAIN ANALYZE; SQL Server has both a text/XML plan and the graphical "Actual Execution Plan" in SSMS; Oracle has EXPLAIN PLAN FOR plus DBMS_XPLAN. The concepts (scan types, join algorithms, cost estimates, actual vs. estimated rows) transfer across engines even though the exact syntax and terminology differ.

Caution: EXPLAIN ANALYZE actually executes the query, including any INSERT/UPDATE/DELETE — never run it carelessly against a write statement in production without wrapping it in a transaction you intend to roll back, unless you specifically want the write to happen.

How do you read and interpret a query execution plan (EXPLAIN / EXPLAIN ANALYZE)?

Quick Answer

Detailed Answer

How to read it

What to look for

Engine differences

Related Resources

Related Questions