How do you model relationships in a document database — embedding vs. referencing? | SQL & Databases Interview Question

Detailed Answer

Document databases either lack native joins or offer only limited, often less efficient support for them (MongoDB's $lookup, for example), so the choice between embedding and referencing is one of the most consequential decisions in a document schema.

Embedding — nest the related data directly

{
  "_id": "order_789",
  "customer_name": "Alice",
  "items": [
    { "product": "Widget", "qty": 2, "price": 9.99 },
    { "product": "Gadget", "qty": 1, "price": 19.99 }
  ]
}

Good fit: order line items — they're always fetched together with the order, rarely if ever queried independently of it, and there's a natural "belongs to exactly one parent" (order) relationship. One read fetches the complete, usable object with zero joins.
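As a rough illustration of the "one read" point, the sketch below models the orders collection as a plain Python dict keyed by _id. That in-memory stand-in and the order_total helper are assumptions made for the example, not part of any database driver.

```python
# Embedded order document, same shape as the example above.
order = {
    "_id": "order_789",
    "customer_name": "Alice",
    "items": [
        {"product": "Widget", "qty": 2, "price": 9.99},
        {"product": "Gadget", "qty": 1, "price": 19.99},
    ],
}

# Hypothetical in-memory stand-in for an "orders" collection.
orders = {order["_id"]: order}


def order_total(collection, order_id):
    """Fetch one document and total its embedded line items."""
    doc = collection[order_id]  # the single read; no second lookup needed
    return round(sum(item["qty"] * item["price"] for item in doc["items"]), 2)


total = order_total(orders, "order_789")
print(total)
```

Everything needed to price the order arrives with the order itself, which is the property that makes embedding attractive here.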

Referencing — store just an ID, like a foreign key

// customers collection
{ "_id": "cust_123", "name": "Alice", "email": "alice@example.com" }

// orders collection
{ "_id": "order_789", "customer_id": "cust_123", "items": [...] }

Good fit: the customer record. It is large, changes independently of any given order, and is referenced by many orders. Embedding it would duplicate the customer's full profile into every order document, and any profile update would then have to fan out to every order that embedded it.
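The trade-off can be sketched with two in-memory dicts standing in for the collections. The second order (order_790), the empty items lists, the new email address, and the order_with_customer helper are all hypothetical; the $lookup stage is included only as data to show roughly what the server-side equivalent looks like in MongoDB.

```python
# Hypothetical in-memory stand-ins for the two collections.
customers = {
    "cust_123": {"_id": "cust_123", "name": "Alice", "email": "alice@example.com"},
}
orders = {
    # Line items left empty here; they are not relevant to the reference.
    "order_789": {"_id": "order_789", "customer_id": "cust_123", "items": []},
    "order_790": {"_id": "order_790", "customer_id": "cust_123", "items": []},
}


def order_with_customer(order_id):
    """Application-side join: read the order, then follow customer_id."""
    order = orders[order_id]                    # read 1
    customer = customers[order["customer_id"]]  # read 2
    return {**order, "customer": customer}


# A profile change touches exactly one document...
customers["cust_123"]["email"] = "alice@new.example.com"

# ...and every order that references the customer sees it on the next read.
emails = {order_with_customer(oid)["customer"]["email"] for oid in orders}

# For comparison, a MongoDB aggregation that resolves the same reference
# server-side. Note that $lookup writes its matches into an array field.
lookup_pipeline = [
    {"$match": {"_id": "order_789"}},
    {"$lookup": {
        "from": "customers",
        "localField": "customer_id",
        "foreignField": "_id",
        "as": "customer",
    }},
]
```

The cost is the extra read per order; the benefit is that the update never has to fan out.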

Decision factors

Favor embedding when...	Favor referencing when...
Data is always read together with the parent	Data is often queried/updated independently
One-to-few relationship (a handful of items)	One-to-many or many-to-many (shared across many parents)
The child has no independent identity outside the parent	The child is a real standalone entity referenced from multiple places
Document size stays reasonable	Embedding would make documents unbounded/huge (e.g., embedding every comment ever made on a popular post)
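The table can be read as a checklist. The sketch below encodes it as a small function; the Relationship fields and the suggest helper are invented for illustration, and the ordering of the checks is one reasonable reading of the table rather than a fixed rule.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    read_with_parent: bool       # always fetched together with the parent?
    shared_across_parents: bool  # referenced from more than one parent?
    standalone_entity: bool      # has an identity outside the parent?
    bounded: bool                # will the number of children stay small?


def suggest(rel):
    """Return 'embed' or 'reference' using the decision factors above."""
    # Any of these makes embedding risky: duplication or unbounded growth.
    if rel.shared_across_parents or rel.standalone_entity or not rel.bounded:
        return "reference"
    return "embed" if rel.read_with_parent else "reference"


line_items = Relationship(True, False, False, True)
customer = Relationship(True, True, True, True)
post_comments = Relationship(True, False, False, False)

print(suggest(line_items), suggest(customer), suggest(post_comments))
```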

The unbounded-growth trap

A common modeling mistake is embedding a collection that can grow indefinitely, such as storing every comment directly inside a blog post document. Most document databases impose a maximum document size (16 MB in MongoDB), and even below that limit an ever-growing embedded array makes the document progressively more expensive to read, write, and reallocate. Unbounded growth is the classic sign that a "child" needs its own collection with a reference back to the parent.
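A minimal sketch of the alternative, assuming comments live in their own collection and carry a post_id reference back to the parent. The collections are plain Python structures, and the field names (post_id, seq), the sample data, and the comments_page helper are hypothetical.

```python
# The post document stays small no matter how many comments exist.
posts = {"post_1": {"_id": "post_1", "title": "Hello"}}

# Each comment is its own document that points back at its post.
comments = [
    {"_id": f"c{i}", "post_id": "post_1", "seq": i, "text": f"comment {i}"}
    for i in range(1, 251)
]


def comments_page(post_id, page, page_size=20):
    """Return one page of a post's comments, oldest first."""
    matching = sorted(
        (c for c in comments if c["post_id"] == post_id),
        key=lambda c: c["seq"],
    )
    start = (page - 1) * page_size
    return matching[start:start + page_size]


first_page = comments_page("post_1", page=1)
last_page = comments_page("post_1", page=13)
```

Readers load only the page they ask for, and adding a comment is an insert into the comments collection rather than a rewrite of an ever-larger post document. In a real database the filter on post_id would normally be backed by an index.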

Model each relationship on its actual access patterns rather than applying a blanket rule. It is completely normal, and usually correct, for a single schema to embed some relationships and reference others. This mirrors the judgment call a relational schema designer makes when deciding what to denormalize and what to keep fully normalized.
