What is a join explosion (Cartesian-like row multiplication), and how do you avoid it?

Detailed Answer

The classic real-world bug: you join orders to order_items to compute a per-order total, but then also join to order_notes (also one-to-many) in the same query — and every order's line items get multiplied by its note count.

-- orders (1 row per order) x order_items (many rows per order)
--        x order_notes (many rows per order)
SELECT o.id, SUM(oi.price)
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN order_notes  n ON n.order_id  = o.id     -- BUG: multiplies rows
GROUP BY o.id;

If an order has 3 line items and 2 notes, this produces 3 × 2 = 6 rows for that order before the GROUP BY/SUM even runs — so SUM(oi.price) double- or triple-counts each item's price, inflating the total. This is far more dangerous than an obvious CROSS JOIN because the query looks correct and returns plausible-looking numbers.

Fixes

1. Aggregate each one-to-many side independently before joining:

SELECT o.id,
       items.total_price,
       notes.note_count
FROM orders o
LEFT JOIN (SELECT order_id, SUM(price) AS total_price FROM order_items GROUP BY order_id) items
    ON items.order_id = o.id
LEFT JOIN (SELECT order_id, COUNT(*) AS note_count FROM order_notes GROUP BY order_id) notes
    ON notes.order_id = o.id;

2. Use COUNT(DISTINCT ...) as a (partial) safety net — this fixes counting but not SUM, so it's not a general solution:

SELECT o.id, COUNT(DISTINCT oi.id) AS item_count, COUNT(DISTINCT n.id) AS note_count
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN order_notes  n ON n.order_id  = o.id
GROUP BY o.id;

3. Split into separate queries when the two one-to-many relationships genuinely don't need to be correlated in a single result row.

The general rule

Before joining, ask: "is this join guaranteed to match at most one row?" If both sides of every join in a query aren't 1:1 (or the "many" side is aggregated first), any SUM/COUNT/AVG computed after multiple one-to-many joins is suspect and should be double-checked against a simpler, join-free query on a small sample.

What is a join explosion (Cartesian-like row multiplication), and how do you avoid it?

Quick Answer

Detailed Answer

Fixes

The general rule

Related Resources

Related Questions