SQL is split into sub-languages by what kind of change a statement makes, and that split matters in practice because it determines things like transaction behavior, required privileges, and whether an operation can be rolled back.
The four categories
| Category | Full name | Example statements | What it affects |
|---|---|---|---|
| DDL | Data Definition Language | CREATE TABLE, ALTER TABLE, DROP TABLE, TRUNCATE | Schema/structure (tables, indexes, constraints) |
| DML | Data Manipulation Language | SELECT, INSERT, UPDATE, DELETE | Row-level data |
| DCL | Data Control Language | GRANT, REVOKE | Permissions and access control |
| TCL | Transaction Control Language | COMMIT, ROLLBACK, SAVEPOINT, SET TRANSACTION | Transaction boundaries |
-- DDL: defines structure
CREATE TABLE accounts (
id SERIAL PRIMARY KEY,
balance NUMERIC(12,2) NOT NULL DEFAULT 0
);
-- DML: manipulates rows
INSERT INTO accounts (balance) VALUES (100.00);
UPDATE accounts SET balance = balance - 50 WHERE id = 1;
-- DCL: controls access
GRANT SELECT, INSERT ON accounts TO app_user;
-- TCL: controls the transaction
BEGIN;
UPDATE accounts SET balance = balance - 50 WHERE id = 1;
UPDATE accounts SET balance = balance + 50 WHERE id = 2;
COMMIT;
Why the distinction matters
Most production databases auto-commit DDL (or even implicitly commit any open transaction before running it) — running ALTER TABLE mid-transaction in MySQL, for example, causes an implicit commit, so you can't roll a schema change back the way you can an UPDATE. PostgreSQL is a notable exception: it supports transactional DDL, so a CREATE TABLE inside a BEGIN...ROLLBACK block really does disappear.
DCL statements are also typically not transactional in the same sense — permission changes often take effect immediately and aren't undone by ROLLBACK in many engines. Knowing which bucket a statement falls into tells you whether you can safely wrap it in a transaction for an atomic migration, or whether you need a different rollback strategy (e.g., a paired "down" migration script).
Related Resources
You write a SELECT statement top to bottom, but the database evaluates it in a different logical order. Understanding that order explains several rules that otherwise look arbitrary.
The logical order
1. FROM / JOIN -- build the base row set
2. WHERE -- filter individual rows
3. GROUP BY -- bucket remaining rows into groups
4. HAVING -- filter groups
5. SELECT -- compute output expressions (incl. window functions)
6. DISTINCT -- remove duplicate output rows
7. ORDER BY -- sort the result
8. LIMIT / OFFSET -- take a slice
Note this is the logical order — real optimizers reorder physical execution (e.g., pushing a WHERE predicate down before a join, or picking an index that satisfies ORDER BY for free) as long as the observable result is identical.
Why this explains common gotchas
You can't use a SELECT alias in WHERE:
-- Fails: "total" doesn't exist yet when WHERE is evaluated
SELECT price * quantity AS total FROM orders WHERE total > 100;
-- Works: recompute the expression, or move the filter to HAVING with a subquery/CTE
SELECT * FROM (
SELECT price * quantity AS total FROM orders
) t WHERE total > 100;
But you can use it in ORDER BY — by the time ORDER BY runs, SELECT has already executed and the alias exists:
SELECT price * quantity AS total FROM orders ORDER BY total DESC;
WHERE can't filter on aggregates, HAVING can:
-- Fails: WHERE runs before GROUP BY, so COUNT(*) doesn't exist yet
SELECT customer_id, COUNT(*) FROM orders WHERE COUNT(*) > 5 GROUP BY customer_id;
-- Correct: HAVING runs after grouping
SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id HAVING COUNT(*) > 5;
Window functions see SELECT-time data, not raw rows — they run after WHERE/GROUP BY/HAVING but before DISTINCT/ORDER BY/LIMIT, which is why you generally can't reference a window function's result directly in the same SELECT's WHERE clause (you need to wrap it in a subquery or CTE and filter in the outer query instead).
Related Resources
Both clauses filter rows, but at different stages of query processing (see the logical execution order: FROM → WHERE → GROUP BY → HAVING → SELECT).
WHERE: filters rows before grouping
SELECT department, AVG(salary) AS avg_salary
FROM employees
WHERE hire_date >= '2020-01-01' -- filters individual employee rows first
GROUP BY department;
HAVING: filters groups after aggregation
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING AVG(salary) > 75000; -- filters the resulting department groups
Combining both
SELECT department, COUNT(*) AS headcount
FROM employees
WHERE status = 'active' -- row-level filter first
GROUP BY department
HAVING COUNT(*) >= 10; -- group-level filter second
The rule of thumb
- Use
WHEREfor conditions on raw column values that exist before grouping. - Use
HAVINGfor conditions on the result of an aggregate function. - Trying to put an aggregate condition in
WHERE(WHERE COUNT(*) > 5) fails, becauseCOUNT(*)doesn't exist untilGROUP BYhas executed. - Filtering earlier with
WHEREis almost always more efficient than filtering later withHAVING, becauseWHEREreduces the row set before the (often expensive) grouping and aggregation work happens. Never useHAVINGfor a condition that could be expressed inWHERE.
Related Resources
NULL represents "unknown" or "missing," not a value like zero or an empty string — and that distinction drives SQL's three-valued logic.
The three truth values
Any predicate evaluates to TRUE, FALSE, or UNKNOWN. NULL compared to anything — including another NULL — produces UNKNOWN:
SELECT NULL = NULL; -- UNKNOWN (not TRUE!)
SELECT NULL <> NULL; -- UNKNOWN
SELECT 5 = NULL; -- UNKNOWN
SELECT 5 <> NULL; -- UNKNOWN
WHERE and HAVING only keep rows where the condition is TRUE — UNKNOWN is treated the same as FALSE for filtering purposes (but differently for NOT, see below).
Testing for NULL correctly
-- Wrong: always UNKNOWN, matches nothing
SELECT * FROM users WHERE middle_name = NULL;
-- Correct
SELECT * FROM users WHERE middle_name IS NULL;
SELECT * FROM users WHERE middle_name IS NOT NULL;
The classic NOT IN trap
This is the single most common NULL bug in production SQL:
-- If banned_ids contains even one NULL, this returns ZERO rows,
-- because "x NOT IN (1, 2, NULL)" expands to
-- "x <> 1 AND x <> 2 AND x <> NULL", and the last comparison is UNKNOWN,
-- which poisons the whole AND chain to UNKNOWN.
SELECT * FROM users WHERE id NOT IN (SELECT banned_id FROM bans);
Fix it with NOT EXISTS (which handles NULLs correctly) or by filtering NULLs out of the subquery explicitly:
SELECT * FROM users u
WHERE NOT EXISTS (SELECT 1 FROM bans b WHERE b.banned_id = u.id);
Useful NULL-handling functions
COALESCE(a, b, c) -- returns the first non-NULL argument
NULLIF(a, b) -- returns NULL if a = b, otherwise returns a
-- Avoid division by zero producing an error, return NULL instead
SELECT revenue / NULLIF(units_sold, 0) AS avg_price FROM sales;
-- Provide a default for a possibly-NULL column
SELECT COALESCE(nickname, first_name) AS display_name FROM users;
Also remember that aggregate functions like SUM, AVG, COUNT(column) all ignore NULLs silently — COUNT(*) counts rows, but COUNT(column) counts only non-NULL values in that column, which is a frequent source of off-by-some-amount bugs.
Related Resources
All three remove data or structure, but at very different granularities and costs.
DELETE | TRUNCATE | DROP | |
|---|---|---|---|
| Removes | Matching rows (or all rows) | All rows | The whole table object |
Can filter with WHERE | Yes | No | N/A |
| Speed | Slow for large tables (row-by-row logging) | Fast (deallocates pages) | Fast |
| Fires triggers | Yes | Usually no | No |
| Resets identity/auto-increment | No | Usually yes | N/A (table gone) |
| Transactional / rollback-able | Yes, fully | Depends on engine (PostgreSQL: yes; MySQL/InnoDB: implicit commit) | Depends on engine (same caveat) |
| Table structure afterward | Unchanged, empty or filtered | Unchanged, empty | Table no longer exists |
| Category | DML | DDL (in most engines) | DDL |
-- DELETE: row-level, filterable, fully logged
DELETE FROM orders WHERE status = 'cancelled';
-- TRUNCATE: removes everything, resets the table's storage
TRUNCATE TABLE orders;
-- DROP: the table itself is gone
DROP TABLE orders;
Why TRUNCATE is faster
DELETE scans and removes rows individually, writing an entry to the transaction/redo log per row (or per page) so it can be rolled back and so triggers can fire per row. TRUNCATE instead deallocates the data pages that back the table wholesale — it's closer to a DDL operation than a DML one, which is why many engines treat it as non-transactional or auto-committing.
- Need to remove a subset of rows, want triggers to fire, or need it fully rollback-able mid-transaction →
DELETE. - Need to empty a whole table fast (e.g., clearing a staging table between ETL runs) and don't need row-level rollback →
TRUNCATE. - Need to remove the table definition entirely, including its indexes and constraints →
DROP. - Be careful with
TRUNCATEon tables referenced by foreign keys — most engines refuse to truncate a table that's the target of an active foreign key from another table unless you cascade or disable the constraint first.