How do you model relationships in a document database — embedding vs. referencing?

7 min | advanced | nosql, document-database, data-modeling, embedding, referencing

Quick Answer

**Embedding** nests related data directly inside the parent document, ideal when the related data is always accessed together with the parent and doesn't need to be queried/updated independently. **Referencing** stores just an ID pointing to a document in another collection (similar to a foreign key), better when the related data is large, frequently changes independently, is shared across many parents, or is queried on its own. Most real schemas mix both, choosing per relationship based on access patterns.

Detailed Answer

Document databases have no native joins (or only limited, often less efficient support for them, like MongoDB's $lookup), so the modeling decision of embed-vs-reference is one of the most consequential design choices in a document schema.

Embedding — nest the related data directly

{
  "_id": "order_789",
  "customer_name": "Alice",
  "items": [
    { "product": "Widget", "qty": 2, "price": 9.99 },
    { "product": "Gadget", "qty": 1, "price": 19.99 }
  ]
}

Good fit: order line items — they're always fetched together with the order, rarely if ever queried independently of it, and there's a natural "belongs to exactly one parent" (order) relationship. One read fetches the complete, usable object with zero joins.
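To make the "one read" point concrete, here is a minimal sketch in Python. It assumes a plain dict as a stand-in for an orders collection, and the helper name `order_total` is hypothetical; a real application would go through a database driver instead.

```python
# In-memory stand-in for an "orders" collection keyed by _id.
# This dict is an assumption for illustration, not a real driver API.
orders = {
    "order_789": {
        "_id": "order_789",
        "customer_name": "Alice",
        "items": [
            {"product": "Widget", "qty": 2, "price": 9.99},
            {"product": "Gadget", "qty": 1, "price": 19.99},
        ],
    }
}


def order_total(order_id: str) -> float:
    """Fetch one order document and total its embedded line items."""
    # The single read: the line items arrive together with the parent.
    order = orders[order_id]
    return round(sum(item["qty"] * item["price"] for item in order["items"]), 2)


total = order_total("order_789")
```

Everything needed to compute the total comes back with the order itself, so no second lookup is required.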

Referencing — store just an ID, like a foreign key

// customers collection
{ "_id": "cust_123", "name": "Alice", "email": "alice@example.com" }

// orders collection
{ "_id": "order_789", "customer_id": "cust_123", "items": [...] }

Good fit: the customer record. It is large, changes independently of any given order, and is referenced by many orders. Embedding it would duplicate the customer's full profile into every order document, and any profile update would then have to fan out to every order that embedded it.
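The trade-off can be sketched with plain Python dicts standing in for the two collections. The second order (`order_790`), the new email address, and the helper `order_with_customer` are hypothetical additions for illustration, and the `items` field is left out because the source elides it.

```python
# In-memory stand-ins for the two collections (an assumption for illustration).
customers = {
    "cust_123": {"_id": "cust_123", "name": "Alice", "email": "alice@example.com"},
}
orders = {
    "order_789": {"_id": "order_789", "customer_id": "cust_123"},
    "order_790": {"_id": "order_790", "customer_id": "cust_123"},  # hypothetical
}


def order_with_customer(order_id: str) -> dict:
    """Resolve the reference in application code: two reads instead of one."""
    order = orders[order_id]                    # read 1: the order
    customer = customers[order["customer_id"]]  # read 2: follow the reference
    return {**order, "customer": customer}


# A profile change is a single write to a single document...
customers["cust_123"]["email"] = "alice@new.example.com"

# ...and every order that references the customer sees it on the next read.
emails = [order_with_customer(oid)["customer"]["email"] for oid in orders]
```

The cost is the extra read per order; the benefit is that the update touches one document no matter how many orders point at it.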

Decision factors

| Favor embedding when... | Favor referencing when... |
| --- | --- |
| Data is always read together with the parent | Data is often queried/updated independently |
| One-to-few relationship (a handful of items) | One-to-many or many-to-many relationship (shared across many parents) |
| The child has no independent identity outside the parent | The child is a real standalone entity referenced from multiple places |
| Document size stays reasonable | Embedding would make documents unbounded/huge (e.g., embedding every comment ever made on a popular post) |

The unbounded-growth trap

A common modeling mistake is embedding a collection that can grow indefinitely (e.g., embedding every comment directly inside a blog post document). Most document databases impose a maximum document size (MongoDB: 16 MB per document), and even below that limit, an ever-growing embedded array makes the document progressively more expensive to read, write, and reallocate. Unbounded growth is the classic sign that a "child" needs its own collection with a reference back to the parent rather than being embedded.
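A rough sketch of the growth problem, using JSON-encoded length as a proxy for stored size. This is an assumption for illustration: real databases use their own encodings (BSON in MongoDB), so actual byte counts differ. The post, the comment texts, and the helper `encoded_size` are all hypothetical.

```python
import json


def encoded_size(doc: dict) -> int:
    """Length of the document as UTF-8 JSON; only a proxy for on-disk size."""
    return len(json.dumps(doc).encode("utf-8"))


# Variant A: comments embedded in the post document.
embedded_post = {"_id": "post_1", "title": "Hello", "comments": []}

# Variant B: comments live in their own collection and point back to the post.
referenced_post = {"_id": "post_1", "title": "Hello"}
comments = []

embedded_sizes = []
referenced_sizes = []

for i in range(1, 1001):
    text = f"Comment number {i}"
    embedded_post["comments"].append({"text": text})
    comments.append({"_id": f"comment_{i}", "post_id": "post_1", "text": text})
    if i % 250 == 0:  # sample the parent document's size every 250 comments
        embedded_sizes.append(encoded_size(embedded_post))
        referenced_sizes.append(encoded_size(referenced_post))

# With referencing, a post's comments are found by querying on post_id.
post_comments = [c for c in comments if c["post_id"] == "post_1"]
```

In this sketch the embedded parent grows with every comment, while the referenced parent stays the same size and the comments remain reachable through the `post_id` back-reference.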

Model each relationship on its actual access patterns rather than applying a blanket rule. It is completely normal (and usually correct) for a single schema to embed some relationships and reference others. A relational schema designer makes the same judgment call when deciding what to denormalize (see the denormalization question) versus keep fully normalized.