How do you read and interpret a query execution plan (EXPLAIN / EXPLAIN ANALYZE)?

7 minadvancedindexingexplainquery-planperformance

Quick Answer

`EXPLAIN` shows the plan the optimizer *would* use — join order, access methods (scan vs. index), and estimated cost/row counts — without running the query. `EXPLAIN ANALYZE` actually executes the query and adds real timing and actual row counts alongside the estimates, which is essential for spotting cases where the optimizer's estimate was wrong (a major cause of bad plans). Read plans from the innermost/deepest operations outward, and look for scan types, large estimate-vs-actual row-count gaps, and expensive operations like sorts or nested loops over large row counts.

Detailed Answer

EXPLAIN ANALYZE
SELECT c.name, o.total
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE c.region = 'EU'
ORDER BY o.total DESC
LIMIT 10;

Example PostgreSQL output (abridged):

Limit  (cost=1520.44..1520.46 rows=10) (actual time=45.2..45.3 rows=10 loops=1)
  ->  Sort  (cost=1520.44..1545.10 rows=9865) (actual time=45.2..45.2 rows=10 loops=1)
        Sort Key: o.total DESC
        ->  Hash Join  (cost=245.00..1290.32 rows=9865) (actual time=3.1..38.7 rows=9910 loops=1)
              Hash Cond: (o.customer_id = c.id)
              ->  Seq Scan on orders o  (cost=0.00..820.00 rows=50000) (actual rows=50000 loops=1)
              ->  Hash  (cost=200.00..200.00 rows=3600) (actual rows=3550 loops=1)
                    ->  Seq Scan on customers c  (cost=0.00..200.00 rows=3600)
                          Filter: (region = 'EU')

How to read it

  • Indentation = nesting. Innermost/deepest operations execute first; their output feeds the operation above them. Read from the bottom/innermost outward.
  • cost=startup..total — the optimizer's estimated cost (an arbitrary unit, not milliseconds) to produce the first row and all rows, respectively. Only meaningful for comparing plans against each other on the same engine/config, not as an absolute number.
  • actual time=... rows=... loops=... — only present with ANALYZE: real measured time, real row counts, and how many times this node executed (relevant inside a nested loop, where an inner node runs once per outer row).

What to look for

  1. Scan type on each tableSeq Scan (full table scan) on a large table you expected to hit an index is the first thing to investigate. Not always wrong (see the selectivity question — sometimes a scan genuinely is cheaper), but worth questioning.
  2. Estimated vs. actual row counts — a huge gap (e.g., estimated 10 rows, actual 50,000) means the optimizer's statistics are stale or a predicate is hard to estimate (like a correlated condition across columns), and it likely picked a suboptimal plan as a result. This is one of the most valuable things ANALYZE gives you that plain EXPLAIN can't.
  3. Nested loops with a high loops count on an expensive inner operation — if the inner side of a nested loop isn't using an index and it runs once per outer row, cost multiplies fast.
  4. Sorts on large row counts — an expensive Sort node (as in the example above) sometimes disappears entirely if an index already provides the required order, avoiding the sort altogether.
  5. The overall top-level total cost/time relative to what you expect for the query's importance — but always validate against the actual time under ANALYZE, not just the estimate.

Engine differences

Every major engine has some form of this: PostgreSQL/MySQL use EXPLAIN/EXPLAIN ANALYZE; SQL Server has both a text/XML plan and the graphical "Actual Execution Plan" in SSMS; Oracle has EXPLAIN PLAN FOR plus DBMS_XPLAN. The concepts (scan types, join algorithms, cost estimates, actual vs. estimated rows) transfer across engines even though the exact syntax and terminology differ.

Caution: EXPLAIN ANALYZE actually executes the query, including any INSERT/UPDATE/DELETE — never run it carelessly against a write statement in production without wrapping it in a transaction you intend to roll back, unless you specifically want the write to happen.