What is a correlated subquery, and how does it differ from a non-correlated one?
Quick Answer
A **non-correlated subquery** runs once, independently, and its result is reused for every row of the outer query. A **correlated subquery** references a column from the outer query, so it conceptually re-executes once per outer row (though optimizers often rewrite it into a join to avoid literally running it row by row). Correlated subqueries are essential for per-row comparisons ("find the latest order for each customer") but can be a performance trap if the optimizer can't rewrite them.
Detailed Answer
Non-correlated subquery
Self-contained — it doesn't reference anything from the outer query, so it can be evaluated exactly once:
SELECT name, salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
The inner SELECT AVG(salary) FROM employees computes one number, once, regardless of how many outer rows there are.
Correlated subquery
References a column from the outer query, so its result depends on which outer row is currently being evaluated:
SELECT e.name, e.salary, e.department
FROM employees e
WHERE e.salary > (
SELECT AVG(salary)
FROM employees e2
WHERE e2.department = e.department -- <-- correlated: references outer e.department
);
Here, the inner query's result (average salary) is different for every department, so conceptually it must be recomputed for each outer row.
Common use case: "latest/top row per group"
SELECT o.*
FROM orders o
WHERE o.order_date = (
SELECT MAX(o2.order_date)
FROM orders o2
WHERE o2.customer_id = o.customer_id -- correlated
);
This finds each customer's most recent order — a pattern that's hard to express without either a correlated subquery, a window function (ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC)), or a GROUP BY + join.
Performance implications
A naive mental model — "the subquery runs once per outer row" — describes correctness, not necessarily execution. Modern optimizers frequently rewrite correlated subqueries into semi-joins or anti-joins (especially ones using EXISTS/IN/NOT EXISTS), avoiding literal row-by-row re-execution. But not all correlated subqueries can be rewritten this way, and on an engine/query shape where the optimizer can't flatten it, you do get an actual nested-loop-style re-execution per outer row — which is O(n×m) and can be dramatically slower than an equivalent join or window function on large tables.
Practical guidance: correlated subqueries are fine and often the clearest way to express "compare each row to something computed from its own group." But when you hit a performance problem with one on a large table, check the execution plan first — a rewrite to a JOIN with GROUP BY, or a window function, is often available and can be an order of magnitude faster.