EXISTS vs IN vs JOIN — when do you use each?

6 minintermediatejoinsexistsinsubquery

Quick Answer

`JOIN` combines and returns columns from both tables and is right when you need data from both sides. `IN` checks membership against a subquery's result list — simple, but historically weaker with NULLs and large lists in some optimizers. `EXISTS` checks only for the *presence* of at least one matching row and stops as soon as it finds one, making it well-suited (and NULL-safe) for filtering existence without needing any columns from the other table. Modern optimizers often produce the same plan for all three when used equivalently, but the NULL-safety and clarity differences still matter.

Detailed Answer

All three can express "give me rows in A related to rows in B," but they answer different underlying questions.

JOIN — when you need columns from both tables

SELECT c.name, o.total
FROM customers c
JOIN orders o ON o.customer_id = c.id;

If the goal is to display or aggregate data from both tables, JOIN is the natural tool — IN/EXISTS can't return columns from the subquery side.

IN — membership against a list

SELECT * FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE total > 1000);

Simple and readable for "customers who have at least one qualifying order," when you don't need any order columns in the output. Danger: if the subquery can return a NULL (e.g., customer_id is nullable and some order rows have it unset) and you're using NOT IN, the entire result silently becomes empty due to three-valued logic (see the NULL question) — this is one of the most common real-world SQL correctness bugs.

EXISTS — pure existence check, NULL-safe

SELECT * FROM customers c
WHERE EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.total > 1000
);

EXISTS only cares whether the subquery returns any row — the SELECT 1 is idiomatic because the actual selected value is irrelevant. It's NULL-safe by construction and, critically, the safe choice for negation:

-- Safe even if orders.customer_id can be NULL
SELECT * FROM customers c
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id);

-- Unsafe: silently returns zero rows if any orders.customer_id is NULL
SELECT * FROM customers c
WHERE c.id NOT IN (SELECT customer_id FROM orders);

Duplication behavior

A JOIN can multiply output rows if a customer has multiple matching orders (one output row per match), which then requires DISTINCT or aggregation to get back to one row per customer. IN/EXISTS never duplicate the outer row regardless of how many subquery rows match, since they're boolean filters, not row-multiplying joins — this is often the deciding factor when you just need a yes/no filter, not order details.

  • Need columns from the related table → JOIN.
  • Need a plain existence/membership filter and the subquery result can't contain NULL, or you're doing a positive check → IN is fine and often reads well for short static lists.
  • Need a negative existence check (NOT ...), or the subquery column might contain NULL → always use NOT EXISTS, never NOT IN.