Schema Design Patterns: Embedding vs Referencing

MongoDB allows flexible schema design, but choosing the right approach is key to performance and maintainability. The two primary patterns are:

Embedding – Store related data within a single document
Referencing – Store related data in separate documents and reference them

What is Embedding?

Embedding means nesting one document inside another. It is useful when related data is mostly accessed together and doesn’t grow unbounded.

Example: Consider an e-commerce system where each order includes a list of items.


    db.orders.insertOne({
      _id: 1,
      customer: "John Doe",
      // Stored as a Date rather than a string so range queries and sorting behave correctly
      date: new Date("2025-05-01"),
      items: [
        { product: "Keyboard", quantity: 1, price: 1200 },
        { product: "Mouse", quantity: 2, price: 500 }
      ]
    });

    {
      acknowledged: true,
      insertedId: 1
    }

Explanation: The items are embedded inside the order document. A single query on orders returns the order together with all of its items, so no second lookup is needed.
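To illustrate the single-read benefit, here is a minimal sketch in plain JavaScript. It assumes the order has already been fetched (for example with db.orders.findOne({ _id: 1 }) in mongosh) and copies only the fields needed for the calculation. The helper name orderTotal is hypothetical.

```javascript
// Fields of the order document relevant to the total (date omitted here).
// In mongosh the whole document comes back from one read:
//   db.orders.findOne({ _id: 1 })
const order = {
  _id: 1,
  customer: "John Doe",
  items: [
    { product: "Keyboard", quantity: 1, price: 1200 },
    { product: "Mouse", quantity: 2, price: 500 }
  ]
};

// Hypothetical helper: sums quantity * price over the embedded items.
function orderTotal(doc) {
  return doc.items.reduce((sum, item) => sum + item.quantity * item.price, 0);
}

console.log(orderTotal(order)); // 2200
```

Because the items travel with the order, the total can be computed without touching any other collection.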

Question: Is embedding suitable when items grow too large (e.g., thousands per order)?

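Answer: No. A single BSON document in MongoDB is limited to 16MB, and a very large array makes the document slower to read and update well before that limit is reached. If an embedded array can grow without bound, referencing is better.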
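As a rough way to reason about that limit, the sketch below estimates how many items of a given shape would fit in one document. It is an approximation only: it measures UTF-8 JSON length, which is not the same as the real BSON size. The helper names approxJsonBytes and approxItemsUntilLimit are hypothetical.

```javascript
// MongoDB's maximum BSON document size: 16 MiB.
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: approximates a document's size by its UTF-8 JSON length.
// This is NOT the real BSON size; it only gives an order of magnitude.
function approxJsonBytes(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// Hypothetical helper: roughly how many items of this shape fit under the limit.
function approxItemsUntilLimit(sampleItem) {
  return Math.floor(MAX_BSON_BYTES / approxJsonBytes(sampleItem));
}

const sampleItem = { product: "Keyboard", quantity: 1, price: 1200 };
console.log(approxJsonBytes(sampleItem));
console.log(approxItemsUntilLimit(sampleItem));
```

For items this small the hard limit is far away, so in practice the concern is usually unbounded growth and the cost of rewriting one large document rather than hitting 16MB exactly. In mongosh, the bsonsize() helper should report the actual BSON size of a document.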

What is Referencing?

Referencing separates data into multiple collections and links them using IDs. This is better for data that is shared by many documents or that can grow large.

Example: Let’s say we have users and posts. Each post references the user who created it.


    db.users.insertOne({
      _id: ObjectId("644d1f95f0e2fc7a7f91a1c1"),
      name: "Alice"
    });

    db.posts.insertOne({
      title: "My first blog",
      content: "This is my post",
      authorId: ObjectId("644d1f95f0e2fc7a7f91a1c1")
    });

    {
      acknowledged: true,
      insertedId: ObjectId("...")
    }

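Explanation: Each document in the posts collection stores only the author's _id in authorId, rather than a copy of the user document. When the author's details are needed, we can join the two collections in application code (a second query on users) or with the $lookup aggregation stage.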
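A join between the posts and users collections above could be sketched with $lookup as follows. The variable name postsWithAuthor and the output field name author are choices made for this illustration. The pipeline is defined as plain data, and the query itself runs only where a mongosh db object is available.

```javascript
// Aggregation pipeline that joins each post with its author document.
const postsWithAuthor = [
  {
    $lookup: {
      from: "users",           // collection to join with
      localField: "authorId",  // field in posts
      foreignField: "_id",     // field in users
      as: "author"             // output array field (name chosen for this sketch)
    }
  },
  // $lookup produces an array; $unwind turns it into a single embedded document.
  { $unwind: "$author" }
];

// Run the query only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  printjson(db.posts.aggregate(postsWithAuthor).toArray());
}
```

Each result is a post document with an extra author field holding the matching user, so the application does not need a second round trip.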

When to Embed?

Data is read together frequently
Document size stays well under 16MB
Data is tightly coupled (e.g., items in an order)

When to Reference?

Data is reused across multiple documents (e.g., user profile)
Data grows large (e.g., product reviews)
Data needs to be managed separately (e.g., different permissions or ownership)
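The two checklists can be read as a rule of thumb. The sketch below encodes them in a small function. The name suggestPattern and its three boolean inputs are hypothetical, and real designs usually weigh more factors than these.

```javascript
// Hypothetical helper that encodes the two checklists as a rule of thumb.
function suggestPattern({ readTogether, bounded, sharedAcrossDocuments }) {
  // Reuse across documents or unbounded growth points toward referencing.
  if (sharedAcrossDocuments || !bounded) return "reference";
  // Bounded data that is read with its parent fits embedding.
  if (readTogether) return "embed";
  // Neither checklist clearly applies; decide from the access patterns.
  return "either";
}

// Items in an order: read together, bounded, not shared.
console.log(suggestPattern({ readTogether: true, bounded: true, sharedAcrossDocuments: false })); // embed

// Product reviews: can grow without bound.
console.log(suggestPattern({ readTogether: true, bounded: false, sharedAcrossDocuments: false })); // reference
```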

Real-World Example: Blog with Comments

Embedding Comments Inside Blog


    db.blogs.insertOne({
      title: "MongoDB Schema Design",
      body: "This is about embedding vs referencing",
      comments: [
        { user: "Tom", text: "Great post!" },
        { user: "Jane", text: "Very helpful." }
      ]
    });

    {
      acknowledged: true,
      insertedId: ObjectId("...")
    }

Pros: The post and its comments come back in a single read.
Cons: The comments array is unbounded, so a popular post with thousands of comments makes the document large and pushes it toward the 16MB limit.

Referencing Comments in Another Collection


    db.comments.insertMany([
      { blogId: ObjectId("..."), user: "Tom", text: "Great post!" },
      { blogId: ObjectId("..."), user: "Jane", text: "Very helpful." }
    ]);

    {
      acknowledged: true,
      insertedIds: [ObjectId("..."), ObjectId("...")]
    }

Explanation: Comments live in a separate collection, and each one points to its post through blogId. You can fetch them by querying on blogId when needed. This scales better than an embedded array and keeps the blog documents lightweight.
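Fetching the comments for one post might look like the sketch below. The helper commentsPage, the page size of 20, and the lookup of the post by its title are assumptions made for illustration. The database calls run only where a mongosh db object is available.

```javascript
// Hypothetical helper: builds the filter and paging options for one blog's comments.
function commentsPage(blogId, page, pageSize) {
  return {
    filter: { blogId: blogId },
    skip: page * pageSize, // page is zero-based
    limit: pageSize
  };
}

// Run the queries only inside mongosh, where `db` is defined.
if (typeof db !== "undefined") {
  // An index on blogId avoids scanning the whole comments collection.
  db.comments.createIndex({ blogId: 1 });

  // Assumption: the post is found by its title; real code would likely already hold its _id.
  const blog = db.blogs.findOne({ title: "MongoDB Schema Design" });
  if (blog) {
    const spec = commentsPage(blog._id, 0, 20);
    printjson(
      db.comments.find(spec.filter).skip(spec.skip).limit(spec.limit).toArray()
    );
  }
}
```

Reading comments in pages is possible here because they are separate documents; with an embedded array the whole blog document is the unit that gets stored and updated.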

Summary

MongoDB gives you flexibility in how you structure your data. Choose between embedding and referencing based on access patterns, data growth, and reusability. Embedding gives single-read access to bounded data that is used together; referencing costs an extra query or join but handles shared or fast-growing data.

Next Up

We'll learn about implementing schema validation to ensure data consistency in MongoDB collections.
