What's the difference between DISTINCT and GROUP BY?

Detailed Answer

-- DISTINCT: unique combinations of the selected columns
SELECT DISTINCT department FROM employees;

-- GROUP BY: same result here, but built for aggregation
SELECT department FROM employees GROUP BY department;

-- GROUP BY's real purpose: per-group aggregates
SELECT department, COUNT(*), AVG(salary)
FROM employees
GROUP BY department;

Where they diverge

DISTINCT applies to every selected column together. It only removes duplicate rows; it cannot compute a value that summarizes the rows it collapsed:

-- Returns one row per DISTINCT (department, job_title) combination
SELECT DISTINCT department, job_title FROM employees;

GROUP BY lets you select the grouping column(s) plus arbitrary aggregate expressions over each group, which DISTINCT cannot do:

SELECT department, job_title, COUNT(*) AS headcount, MAX(salary) AS top_salary
FROM employees
GROUP BY department, job_title;
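
To make the contrast concrete, here is a minimal sketch that runs the GROUP BY query above against a few inline rows. The sample_employees name and its values are made up for illustration, and the UNION ALL form of the CTE is an assumption about your engine (it should work in PostgreSQL, MySQL 8+, and SQLite; other engines may need a different way to supply literal rows).

```sql
-- Hypothetical inline data; sample_employees is illustrative, not a real table
WITH sample_employees AS (
    SELECT 'Engineering' AS department, 'Developer' AS job_title, 95000 AS salary
    UNION ALL SELECT 'Engineering', 'Developer', 105000
    UNION ALL SELECT 'Engineering', 'Manager',   130000
    UNION ALL SELECT 'Sales',       'Manager',    90000
)
SELECT department, job_title, COUNT(*) AS headcount, MAX(salary) AS top_salary
FROM sample_employees
GROUP BY department, job_title;
-- Result (row order is not guaranteed without ORDER BY):
--   Engineering | Developer | 2 | 105000
--   Engineering | Manager   | 1 | 130000
--   Sales       | Manager   | 1 |  90000
-- SELECT DISTINCT department, job_title over the same data returns the same
-- three (department, job_title) pairs, but has no way to add the last two columns.
```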

Selecting a column that is neither aggregated, listed in GROUP BY, nor functionally dependent on the grouping columns is rejected by most engines, including PostgreSQL and MySQL with ONLY_FULL_GROUP_BY enabled. In older or lenient MySQL modes the query runs anyway and silently returns an arbitrary value from each group, a frequent source of subtly wrong reports.
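
A sketch of the problem and one way to fix it, again over hypothetical inline rows (sample_employees and its values are assumptions for illustration). The failing query is left commented out so the block runs as written.

```sql
-- Hypothetical inline data; sample_employees is illustrative, not a real table
WITH sample_employees AS (
    SELECT 'Engineering' AS department, 'Developer' AS job_title, 95000 AS salary
    UNION ALL SELECT 'Engineering', 'Manager', 130000
    UNION ALL SELECT 'Sales',       'Manager',  90000
)
-- Problem query (commented out): job_title is neither grouped nor aggregated.
--   SELECT department, job_title, MAX(salary)
--   FROM sample_employees
--   GROUP BY department;
--
-- Fix: say explicitly which job_title you want by aggregating it.
-- (The other option is to add job_title to the GROUP BY list.)
SELECT department,
       MIN(job_title) AS first_job_title,  -- alphabetically first title in the group
       MAX(salary)    AS top_salary
FROM sample_employees
GROUP BY department;
-- Result (row order is not guaranteed without ORDER BY):
--   Engineering | Developer | 130000
--   Sales       | Manager   |  90000
```

Note that each aggregate is computed independently: in the Engineering row, 'Developer' and 130000 come from different source rows. If you need the title of the highest-paid person, that takes a different query shape (for example a window function or a join back to the table).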

Performance

When there's no aggregate function involved, SELECT DISTINCT col FROM t and SELECT col FROM t GROUP BY col typically produce identical execution plans: both need some form of sort-based or hash-based deduplication, and query optimizers usually recognize the equivalence. Don't assume one is inherently faster than the other without checking EXPLAIN on your specific engine and data.
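
One way to check is to ask the engine for both plans and compare them. The sketch below assumes the employees table from the earlier examples and uses the plain EXPLAIN prefix accepted by PostgreSQL and MySQL; the syntax and output format differ elsewhere (SQLite, for instance, uses EXPLAIN QUERY PLAN).

```sql
-- Compare the plan for the DISTINCT form...
EXPLAIN
SELECT DISTINCT department FROM employees;

-- ...with the plan for the equivalent GROUP BY form.
EXPLAIN
SELECT department FROM employees GROUP BY department;
```

If the two plans show the same deduplication step, the choice between the two forms is about readability rather than speed for that query.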
