What is a join explosion (Cartesian-like row multiplication), and how do you avoid it?
Quick Answer
A join explosion happens when a join condition matches more rows than expected — most often because you're joining against a table where the "key" isn't actually unique per row of the other side — causing rows to multiply and aggregates (like `SUM`) to be inflated. It's avoided by joining on truly unique keys, aggregating one side *before* joining, or using `DISTINCT`/window functions to deduplicate after the fact.
Detailed Answer
The classic real-world bug: you join orders to order_items to compute a per-order total, but then also join to order_notes (also one-to-many) in the same query — and every order's line items get multiplied by its note count.
-- orders (1 row per order) x order_items (many rows per order)
-- x order_notes (many rows per order)
SELECT o.id, SUM(oi.price)
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN order_notes n ON n.order_id = o.id -- BUG: multiplies rows
GROUP BY o.id;
If an order has 3 line items and 2 notes, this produces 3 × 2 = 6 rows for that order before the GROUP BY/SUM even runs — so SUM(oi.price) double- or triple-counts each item's price, inflating the total. This is far more dangerous than an obvious CROSS JOIN because the query looks correct and returns plausible-looking numbers.
Fixes
1. Aggregate each one-to-many side independently before joining:
SELECT o.id,
items.total_price,
notes.note_count
FROM orders o
LEFT JOIN (SELECT order_id, SUM(price) AS total_price FROM order_items GROUP BY order_id) items
ON items.order_id = o.id
LEFT JOIN (SELECT order_id, COUNT(*) AS note_count FROM order_notes GROUP BY order_id) notes
ON notes.order_id = o.id;
2. Use COUNT(DISTINCT ...) as a (partial) safety net — this fixes counting but not SUM, so it's not a general solution:
SELECT o.id, COUNT(DISTINCT oi.id) AS item_count, COUNT(DISTINCT n.id) AS note_count
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN order_notes n ON n.order_id = o.id
GROUP BY o.id;
3. Split into separate queries when the two one-to-many relationships genuinely don't need to be correlated in a single result row.
The general rule
Before joining, ask: "is this join guaranteed to match at most one row?" If both sides of every join in a query aren't 1:1 (or the "many" side is aggregated first), any SUM/COUNT/AVG computed after multiple one-to-many joins is suspect and should be double-checked against a simpler, join-free query on a small sample.