Schema Design Patterns: Embedding vs Referencing
MongoDB allows flexible schema design, but choosing the right approach is key to performance and maintainability. The two primary patterns are:
- Embedding – Store related data within a single document
- Referencing – Store related data in separate documents and reference them
What is Embedding?
Embedding means nesting one document inside another. It is useful when related data is mostly accessed together and does not grow without bound.
Example: Consider an e-commerce system where each order includes a list of items.
db.orders.insertOne({
  _id: 1,
  customer: "John Doe",
  date: "2025-05-01",
  items: [
    { product: "Keyboard", quantity: 1, price: 1200 },
    { product: "Mouse", quantity: 2, price: 500 }
  ]
});
Output:
{ acknowledged: true, insertedId: 1 }
Explanation: The items array is embedded inside the order document. This structure allows fast reads because a single query returns all of the order's data.
Question: Is embedding suitable when the items array grows very large (e.g., thousands of items per order)?
Answer: No. A MongoDB document cannot exceed 16MB, so an embedded array that can grow without bound will eventually hit that ceiling. In that case, referencing is better.
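To get a feel for how far an embedded array is from the 16MB limit, the sketch below estimates a document's size from its JSON encoding. This is only a rough stand-in: MongoDB measures the BSON encoding, which differs from JSON, and the names approxBytes, buildOrder, and MAX_DOC_BYTES are made up for this illustration. It assumes a Node.js runtime for Buffer.

```javascript
// Rough size check for an order with an embedded items array.
// Assumption: JSON byte length stands in for BSON size, so treat the
// numbers as an estimate, not what the server actually measures.
const MAX_DOC_BYTES = 16 * 1024 * 1024; // 16MB document limit

// Hypothetical helper: byte length of the document's JSON encoding.
function approxBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Hypothetical helper: an order shaped like the example above,
// with itemCount embedded items.
function buildOrder(itemCount) {
  const items = [];
  for (let i = 0; i < itemCount; i++) {
    items.push({ product: "Keyboard", quantity: 1, price: 1200 });
  }
  return { _id: 1, customer: "John Doe", date: "2025-05-01", items };
}

const small = approxBytes(buildOrder(2));
const large = approxBytes(buildOrder(100000));

console.log("2 items:", small, "bytes");
console.log("100000 items:", large, "bytes");
console.log("fraction of limit used:", (large / MAX_DOC_BYTES).toFixed(2));
```

The point is the trend rather than the exact figures: the size grows linearly with the number of embedded items, so an array with no upper bound has no upper bound on document size either.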
What is Referencing?
Referencing separates data into multiple collections and links them using IDs. This works better for data that is shared across documents or that grows large.
Example: Let's say we have users and posts. Each post references the user who created it.
db.users.insertOne({
  _id: ObjectId("644d1f95f0e2fc7a7f91a1c1"),
  name: "Alice"
});
db.posts.insertOne({
  title: "My first blog",
  content: "This is my post",
  authorId: ObjectId("644d1f95f0e2fc7a7f91a1c1")
});
Output:
{ acknowledged: true, insertedId: ObjectId("...") }
Explanation: Each document in the posts collection stores only a reference to a document in the users collection, in its authorId field. When both are needed, we can join them manually in application code or with the $lookup stage of an aggregation pipeline.
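A manual join in application code might look like the sketch below. The arrays stand in for the results of db.users.find() and db.posts.find(), plain strings stand in for ObjectId values, and attachAuthors is a hypothetical helper, not a driver API.

```javascript
// Stand-ins for query results; in a real app these would come from
// db.users.find(...) and db.posts.find(...). IDs are plain strings
// here (assumption) instead of ObjectId values.
const users = [
  { _id: "644d1f95f0e2fc7a7f91a1c1", name: "Alice" }
];
const posts = [
  {
    title: "My first blog",
    content: "This is my post",
    authorId: "644d1f95f0e2fc7a7f91a1c1"
  }
];

// Hypothetical helper: resolve each post's authorId to a user document.
function attachAuthors(posts, users) {
  // Index users by _id once so each post needs only one Map lookup.
  const usersById = new Map(users.map((u) => [String(u._id), u]));
  return posts.map((post) => ({
    ...post,
    // null when the referenced user does not exist
    author: usersById.get(String(post.authorId)) ?? null
  }));
}

const joined = attachAuthors(posts, users);
console.log(joined[0].title, "by", joined[0].author.name);
// prints: My first blog by Alice
```

Keys are converted with String(...) because two separate ObjectId instances holding the same value are distinct objects and would not match as Map keys.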
When to Embed?
- Data is read together frequently
- Document size stays well under 16MB
- Data is tightly coupled (e.g., items in an order)
When to Reference?
- Data is reused across multiple documents (e.g., user profile)
- Data grows large (e.g., product reviews)
- Data needs its own ownership or permission rules (separation of concerns)
Real-World Example: Blog with Comments
Embedding Comments Inside Blog
db.blogs.insertOne({
  title: "MongoDB Schema Design",
  body: "This is about embedding vs referencing",
  comments: [
    { user: "Tom", text: "Great post!" },
    { user: "Jane", text: "Very helpful." }
  ]
});
Output:
{ acknowledged: true, insertedId: ObjectId("...") }
Pros: The post and its comments are retrieved in a single read.
Cons: A post with thousands of comments keeps growing and can approach the 16MB document limit.
Referencing Comments in Another Collection
db.comments.insertMany([
  { blogId: ObjectId("..."), user: "Tom", text: "Great post!" },
  { blogId: ObjectId("..."), user: "Jane", text: "Very helpful." }
]);
Output:
{ acknowledged: true, insertedIds: { '0': ObjectId("..."), '1': ObjectId("...") } }
Explanation: Comments live in a separate collection. You can fetch them by their blogId reference when needed. This approach scales better and keeps each blog document small.
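When a post and its comments are needed together, the aggregation $lookup stage can join the two collections on the server. The sketch below only builds the pipeline as a plain object. The collection and field names follow the examples above, the output field name comments is a choice made here, and the actual call is left as a comment because it needs a running database.

```javascript
// Pipeline that joins each blog with its comments via blogId.
// Assumes the blogs and comments collections from the examples above.
const blogWithComments = [
  { $match: { title: "MongoDB Schema Design" } },
  {
    $lookup: {
      from: "comments",        // collection to join
      localField: "_id",       // field on the blogs side
      foreignField: "blogId",  // field on the comments side
      as: "comments"           // output array field (name chosen here)
    }
  }
];

// In mongosh, with a database connection:
// db.blogs.aggregate(blogWithComments);

console.log(JSON.stringify(blogWithComments, null, 2));
```

An index on blogId, for example db.comments.createIndex({ blogId: 1 }), is usually worth adding so that fetching a post's comments does not scan the whole collection.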
Summary
MongoDB gives you flexibility in how you structure your data. Choose between embedding and referencing based on access patterns, data growth, and reusability. Embedding keeps reads fast and simple when related data is small and read together; referencing scales better when data is large, shared across documents, or grows without bound.
Next Up
We'll learn about implementing schema validation to ensure data consistency in MongoDB collections.