What are window functions, and how do they differ from GROUP BY aggregation?
Quick Answer
A window function computes a value across a set of related rows (a "window") *without collapsing them into a single output row* — unlike `GROUP BY`, which reduces N rows to one row per group. Syntax uses `OVER (...)` after an aggregate or ranking function, optionally with `PARTITION BY` (grouping without collapsing) and `ORDER BY` (defining row order within the window, enabling running calculations).
Detailed Answer
The key distinction from GROUP BY
-- GROUP BY: collapses many rows into one row per department
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
-- Output: one row PER department -- individual employee rows are gone.
-- Window function: keeps every employee row, adds a computed column
SELECT name, department, salary,
AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM employees;
-- Output: one row PER EMPLOYEE, each annotated with their department's average.
This is the core value proposition: you can compare an individual row to an aggregate of its group (salary vs. dept_avg_salary) in the same row, something plain GROUP BY can't do without a self-join or subquery.
Anatomy of the OVER clause
function_name(...) OVER (
[PARTITION BY column1, column2, ...] -- groups rows, like GROUP BY, but doesn't collapse them
[ORDER BY column3, ...] -- defines row order within each partition
[ROWS/RANGE BETWEEN ... AND ...] -- defines the "frame" -- which rows within the partition to include
)
PARTITION BY— optional; without it, the whole result set is one partition. Divides rows into groups for the function to operate over, analogous toGROUP BYbut without reducing row count.ORDER BY(insideOVER) — defines the order used for ranking functions (ROW_NUMBER,RANK) and for frame-relative functions (LAG, running totals). This is independent of the query's outerORDER BY.- Frame clause (
ROWS BETWEEN ...) — defines exactly which rows within the partition the function sees relative to the current row (e.g., "from the start of the partition to the current row," for a running total).
Common window functions
ROW_NUMBER() OVER (...) -- sequential number, no ties
RANK() OVER (...) -- rank with gaps after ties
DENSE_RANK() OVER (...) -- rank without gaps after ties
LAG(col, n) OVER (...) -- value from n rows before the current row
LEAD(col, n) OVER (...) -- value from n rows after the current row
SUM(col) OVER (...) -- running/partitioned sum (not collapsed)
AVG(col) OVER (...) -- running/partitioned average
FIRST_VALUE(col) OVER (...) -- first value in the frame
Why this matters
Window functions are the standard, efficient way to express "rank within group," "running total," "percent of group total," or "compare to previous row" — all queries that, before window functions existed, required awkward self-joins or correlated subqueries (see the correlated subquery and self-join questions) that were both harder to read and often slower. They execute logically after WHERE/GROUP BY/HAVING but before ORDER BY/LIMIT in the query's execution order (see the execution order question), which is why you generally can't filter directly on a window function's result in the same query's WHERE clause without wrapping it in a subquery or CTE.