What are window functions, and how do they differ from GROUP BY aggregation?

Detailed Answer

The key distinction from GROUP BY

-- GROUP BY: collapses many rows into one row per department
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
-- Output: one row PER department -- individual employee rows are gone.

-- Window function: keeps every employee row, adds a computed column
SELECT name, department, salary,
       AVG(salary) OVER (PARTITION BY department) AS dept_avg_salary
FROM employees;
-- Output: one row PER EMPLOYEE, each annotated with their department's average.

This is the core value proposition: you can compare an individual row to an aggregate of its group (salary vs. dept_avg_salary) in the same row, something plain GROUP BY can't do without a self-join or subquery.

Anatomy of the OVER clause

function_name(...) OVER (
    [PARTITION BY column1, column2, ...]   -- groups rows, like GROUP BY, but doesn't collapse them
    [ORDER BY column3, ...]                -- defines row order within each partition
    [ROWS/RANGE BETWEEN ... AND ...]       -- defines the "frame" -- which rows within the partition to include
)

PARTITION BY — optional; without it, the whole result set is one partition. Divides rows into groups for the function to operate over, analogous to GROUP BY but without reducing row count.
ORDER BY (inside OVER) — defines the order used for ranking functions (ROW_NUMBER, RANK) and for frame-relative functions (LAG, running totals). This is independent of the query's outer ORDER BY.
Frame clause (ROWS BETWEEN ...) — defines exactly which rows within the partition the function sees relative to the current row (e.g., "from the start of the partition to the current row," for a running total).

Common window functions

ROW_NUMBER() OVER (...)         -- sequential number, no ties
RANK() OVER (...)               -- rank with gaps after ties
DENSE_RANK() OVER (...)         -- rank without gaps after ties
LAG(col, n) OVER (...)          -- value from n rows before the current row
LEAD(col, n) OVER (...)         -- value from n rows after the current row
SUM(col) OVER (...)             -- running/partitioned sum (not collapsed)
AVG(col) OVER (...)             -- running/partitioned average
FIRST_VALUE(col) OVER (...)     -- first value in the frame

Why this matters

Window functions are the standard, efficient way to express "rank within group," "running total," "percent of group total," or "compare to previous row" — all queries that, before window functions existed, required awkward self-joins or correlated subqueries (see the correlated subquery and self-join questions) that were both harder to read and often slower. They execute logically after WHERE/GROUP BY/HAVING but before ORDER BY/LIMIT in the query's execution order (see the execution order question), which is why you generally can't filter directly on a window function's result in the same query's WHERE clause without wrapping it in a subquery or CTE.

What are window functions, and how do they differ from GROUP BY aggregation?

Quick Answer

Detailed Answer

The key distinction from GROUP BY

Anatomy of the OVER clause

Common window functions

Why this matters

Related Resources

Related Questions