What is a correlated subquery, and how does it differ from a non-correlated one?

Detailed Answer

Non-correlated subquery

Self-contained — it doesn't reference anything from the outer query, so it can be evaluated exactly once:

SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

The inner SELECT AVG(salary) FROM employees computes one number, once, regardless of how many outer rows there are.

Correlated subquery

References a column from the outer query, so its result depends on which outer row is currently being evaluated:

SELECT e.name, e.salary, e.department
FROM employees e
WHERE e.salary > (
    SELECT AVG(salary)
    FROM employees e2
    WHERE e2.department = e.department   -- <-- correlated: references outer e.department
);

Here, the inner query's result (average salary) is different for every department, so conceptually it must be recomputed for each outer row.

Common use case: "latest/top row per group"

SELECT o.*
FROM orders o
WHERE o.order_date = (
    SELECT MAX(o2.order_date)
    FROM orders o2
    WHERE o2.customer_id = o.customer_id   -- correlated
);

This finds each customer's most recent order — a pattern that's hard to express without either a correlated subquery, a window function (ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC)), or a GROUP BY + join.

Performance implications

A naive mental model — "the subquery runs once per outer row" — describes correctness, not necessarily execution. Modern optimizers frequently rewrite correlated subqueries into semi-joins or anti-joins (especially ones using EXISTS/IN/NOT EXISTS), avoiding literal row-by-row re-execution. But not all correlated subqueries can be rewritten this way, and on an engine/query shape where the optimizer can't flatten it, you do get an actual nested-loop-style re-execution per outer row — which is O(n×m) and can be dramatically slower than an equivalent join or window function on large tables.

Practical guidance: correlated subqueries are fine and often the clearest way to express "compare each row to something computed from its own group." But when you hit a performance problem with one on a large table, check the execution plan first — a rewrite to a JOIN with GROUP BY, or a window function, is often available and can be an order of magnitude faster.

What is a correlated subquery, and how does it differ from a non-correlated one?

Quick Answer

Detailed Answer

Non-correlated subquery

Correlated subquery

Common use case: "latest/top row per group"

Performance implications

Related Resources

Related Questions