MongoDB - Getting StartedMongoDB - Getting Started1

MapReduce vs Aggregation Framework in MongoDB



MapReduce vs Aggregation Framework in MongoDB

MongoDB provides two powerful tools for processing and transforming data: MapReduce and the Aggregation Framework. Both are used to perform operations like grouping, filtering, and summarizing data. However, they differ in syntax, performance, and use cases.

What is MapReduce?

MapReduce is a data processing model borrowed from distributed systems like Hadoop. It consists of two main steps:

What is Aggregation Framework?

The Aggregation Framework uses a pipeline-based approach, where data passes through multiple stages such as $match, $group, $sort, and $project. It's much faster and easier to use compared to MapReduce in most scenarios.

Example Dataset

Let’s use a sample collection orders:


    db.orders.insertMany([
      { customer: "Alice", amount: 250 },
      { customer: "Bob", amount: 400 },
      { customer: "Alice", amount: 150 },
      { customer: "Bob", amount: 100 },
      { customer: "Charlie", amount: 300 }
    ])
    

Goal

We want to calculate the total amount spent by each customer.

Using Aggregation Framework


    db.orders.aggregate([
      {
        $group: {
          _id: "$customer",
          totalSpent: { $sum: "$amount" }
        }
      }
    ])
    

Output:

    [
      { "_id": "Alice", "totalSpent": 400 },
      { "_id": "Bob", "totalSpent": 500 },
      { "_id": "Charlie", "totalSpent": 300 }
    ]
    

Explanation: The $group stage groups documents by customer name and calculates the sum of their amount fields.

Using MapReduce


    var mapFunction = function() {
      emit(this.customer, this.amount);
    };

    var reduceFunction = function(key, values) {
      return Array.sum(values);
    };

    db.orders.mapReduce(
      mapFunction,
      reduceFunction,
      { out: "order_totals" }
    )
    

Output:


    db.order_totals.find().pretty()
    
    {
      "_id": "Alice",
      "value": 400
    }
    {
      "_id": "Bob",
      "value": 500
    }
    {
      "_id": "Charlie",
      "value": 300
    }
    

Explanation: The map function emits each customer and their order amount. The reduce function adds up all values for each customer. The result is stored in a new collection order_totals.

Intuition Check

Q: Which one should I use if I just want to sum values or filter records?

A: Use the Aggregation Framework. It’s faster, more concise, and optimized for simple tasks like grouping, filtering, and projecting data.

Performance Comparison

Use Cases for MapReduce

MapReduce can still be useful when:

Summary

What’s Next?

Now that you understand both approaches, the next lesson will dive deeper into building real-world aggregation pipelines using multiple stages like $match, $group, $project, and more.



Welcome to ProgramGuru

Sign up to start your journey with us

Support ProgramGuru.org

Mention your name, and programguru.org in the message. Your name shall be displayed in the sponsers list.

PayPal

UPI

PhonePe QR

MALLIKARJUNA M