EXISTS vs IN vs JOIN — when do you use each?
Quick Answer
`JOIN` combines and returns columns from both tables and is right when you need data from both sides. `IN` checks membership against a subquery's result list — simple, but historically weaker with NULLs and large lists in some optimizers. `EXISTS` checks only for the *presence* of at least one matching row and stops as soon as it finds one, making it well-suited (and NULL-safe) for filtering existence without needing any columns from the other table. Modern optimizers often produce the same plan for all three when used equivalently, but the NULL-safety and clarity differences still matter.
Detailed Answer
All three can express "give me rows in A related to rows in B," but they answer different underlying questions.
JOIN — when you need columns from both tables
SELECT c.name, o.total
FROM customers c
JOIN orders o ON o.customer_id = c.id;
If the goal is to display or aggregate data from both tables, JOIN is the natural tool — IN/EXISTS can't return columns from the subquery side.
IN — membership against a list
SELECT * FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE total > 1000);
Simple and readable for "customers who have at least one qualifying order," when you don't need any order columns in the output. Danger: if the subquery can return a NULL (e.g., customer_id is nullable and some order rows have it unset) and you're using NOT IN, the entire result silently becomes empty due to three-valued logic (see the NULL question) — this is one of the most common real-world SQL correctness bugs.
EXISTS — pure existence check, NULL-safe
SELECT * FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.total > 1000
);
EXISTS only cares whether the subquery returns any row — the SELECT 1 is idiomatic because the actual selected value is irrelevant. It's NULL-safe by construction and, critically, the safe choice for negation:
-- Safe even if orders.customer_id can be NULL
SELECT * FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);
-- Unsafe: silently returns zero rows if any orders.customer_id is NULL
SELECT * FROM customers c
WHERE c.id NOT IN (SELECT customer_id FROM orders);
Duplication behavior
A JOIN can multiply output rows if a customer has multiple matching orders (one output row per match), which then requires DISTINCT or aggregation to get back to one row per customer. IN/EXISTS never duplicate the outer row regardless of how many subquery rows match, since they're boolean filters, not row-multiplying joins — this is often the deciding factor when you just need a yes/no filter, not order details.
- Need columns from the related table →
JOIN. - Need a plain existence/membership filter and the subquery result can't contain
NULL, or you're doing a positive check →INis fine and often reads well for short static lists. - Need a negative existence check (
NOT ...), or the subquery column might containNULL→ always useNOT EXISTS, neverNOT IN.