Introduction to Aggregation

Aggregation in MongoDB processes many documents and transforms them into summarized results, such as totals or averages. Its $group stage plays a role similar to GROUP BY in SQL, while the other pipeline stages add filtering, reshaping, and sorting.

The Aggregation Framework uses a concept called the aggregation pipeline, which consists of multiple stages. Each stage performs an operation on the input documents and passes the result to the next stage.

What is an Aggregation Pipeline?

An aggregation pipeline is an array of stages. Each stage transforms the data as it passes through.
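As a rough analogy in plain JavaScript (not how MongoDB implements it), a pipeline can be pictured as a list of functions, each taking an array of documents and returning a new array. The helper names matchStage, sortStage, limitStage, and runPipeline are hypothetical and exist only for this sketch.

```javascript
// Plain-JavaScript analogy only: each "stage" is a function that takes an
// array of documents and returns a new array for the next stage.
const docs = [
  { item: "Laptop", quantity: 2 },
  { item: "Mouse", quantity: 10 },
  { item: "Laptop", quantity: 1 }
];

// Hypothetical helpers named after the MongoDB stages they imitate.
const matchStage = (predicate) => (input) => input.filter(predicate);
const sortStage = (compare) => (input) => [...input].sort(compare);
const limitStage = (n) => (input) => input.slice(0, n);

// Run the stages in order, feeding each stage's output into the next.
const runPipeline = (input, stages) =>
  stages.reduce((current, stage) => stage(current), input);

const result = runPipeline(docs, [
  matchStage((d) => d.item === "Laptop"),       // keep only Laptop documents
  sortStage((a, b) => a.quantity - b.quantity), // smallest quantity first
  limitStage(1)                                 // keep the first document
]);

console.log(result); // [ { item: 'Laptop', quantity: 1 } ]
```

The point of the sketch is the ordering: changing the order of the stages changes what each later stage receives, just as in a real pipeline.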

Common stages include:

$match – Filters documents (like find())
$group – Groups documents by a key and computes accumulated values such as $sum and $avg
$project – Reshapes documents (e.g., show or hide fields)
$sort – Sorts the documents
$limit – Limits the number of documents
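To make the list concrete, here is a minimal sketch that chains all five stages. It assumes a sales collection with item, region, and quantity fields, like the one in the Example Dataset section of this lesson. The variable name topNorthItems and the typeof db guard are illustrative additions so the snippet can be pasted into mongosh or loaded in Node without a database.

```javascript
// Pipeline defined as a plain array so it can be inspected before running.
const topNorthItems = [
  { $match: { region: "North" } },                                  // filter first
  { $group: { _id: "$item", totalUnits: { $sum: "$quantity" } } },  // one doc per item
  { $project: { _id: 0, item: "$_id", totalUnits: 1 } },            // rename _id to item
  { $sort: { totalUnits: -1 } },                                    // largest total first
  { $limit: 1 }                                                     // keep the top item
];

// Runs only where a `db` handle exists (for example inside mongosh).
if (typeof db !== "undefined") {
  db.sales.aggregate(topNorthItems).forEach((doc) => printjson(doc));
}
```

Against the example dataset, this should return a single document for Mouse with 10 units, since the North region contains only Laptop (2) and Mouse (10).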

Why use Aggregation?

Aggregation is used to:

Compute totals, averages, and counts
Group documents by a field
Transform documents into new shapes

Example Dataset

Let’s say we have a sales collection with the following documents:


    db.sales.insertMany([
      { item: "Laptop", price: 70000, quantity: 2, region: "North" },
      { item: "Monitor", price: 12000, quantity: 5, region: "South" },
      { item: "Laptop", price: 70000, quantity: 1, region: "South" },
      { item: "Mouse", price: 500, quantity: 10, region: "North" },
      { item: "Keyboard", price: 1500, quantity: 3, region: "West" }
    ])

Example 1: Total Sales Amount per Item

We want to group documents by item and calculate the total sales amount (price × quantity).


    db.sales.aggregate([
      {
        $project: {
          item: 1,
          totalSale: { $multiply: ["$price", "$quantity"] }
        }
      },
      {
        $group: {
          _id: "$item",
          totalRevenue: { $sum: "$totalSale" }
        }
      }
    ])

Output (the order of groups is not guaranteed without a $sort stage):

    { _id: "Laptop", totalRevenue: 210000 }
    { _id: "Monitor", totalRevenue: 60000 }
    { _id: "Mouse", totalRevenue: 5000 }
    { _id: "Keyboard", totalRevenue: 4500 }

Explanation:

The $project stage creates a new field, totalSale, by multiplying price and quantity for each document.
The $group stage then sums these totalSale values for each item.

Intuition Check

Q: Why didn’t we calculate price * quantity directly inside $group?

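A: We could have. Accumulators in $group accept expressions, so { $sum: { $multiply: ["$price", "$quantity"] } } computes the same totals in a single stage. This example uses a separate $project stage only to make each step visible: first compute totalSale for each document, then sum it for each item.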
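As a point of comparison, the totals from Example 1 can also be sketched as a single $group stage, because accumulators such as $sum accept an expression as their argument. The variable name revenueSingleStage and the typeof db guard are illustrative additions, not part of the original example.

```javascript
// Single-stage variant of Example 1: the multiplication happens inside $sum.
const revenueSingleStage = [
  {
    $group: {
      _id: "$item",
      // $multiply is evaluated per document, then $sum adds the results per group.
      totalRevenue: { $sum: { $multiply: ["$price", "$quantity"] } }
    }
  }
];

// Runs only where a `db` handle exists (for example inside mongosh).
if (typeof db !== "undefined") {
  db.sales.aggregate(revenueSingleStage).forEach((doc) => printjson(doc));
}
```

On the example dataset this should produce the same four totals as the two-stage pipeline, for instance 70000 × 2 + 70000 × 1 = 210000 for Laptop.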

Example 2: Total Quantity Sold by Region


    db.sales.aggregate([
      {
        $group: {
          _id: "$region",
          totalUnits: { $sum: "$quantity" }
        }
      }
    ])

Output (the order of groups is not guaranteed without a $sort stage):

    { _id: "North", totalUnits: 12 }
    { _id: "South", totalUnits: 6 }
    { _id: "West", totalUnits: 3 }

Explanation: We directly group by the region field and sum the quantity field to get total units sold per region.
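The same grouping can carry several accumulators at once. The sketch below adds an average price and a document count per region, and appends a $sort stage so the output order is predictable. The field names avgPrice and saleCount and the variable name regionStats are made up for this illustration.

```javascript
// Several accumulators in one $group stage, followed by a sort on the group key.
const regionStats = [
  {
    $group: {
      _id: "$region",
      totalUnits: { $sum: "$quantity" }, // sum of quantity per region
      avgPrice: { $avg: "$price" },      // mean of price per region
      saleCount: { $sum: 1 }             // adds 1 per document, i.e. a count
    }
  },
  { $sort: { _id: 1 } } // alphabetical by region name
];

// Runs only where a `db` handle exists (for example inside mongosh).
if (typeof db !== "undefined") {
  db.sales.aggregate(regionStats).forEach((doc) => printjson(doc));
}
```

For the example dataset, North should come out with 12 units, an average price of 35250 (the mean of 70000 and 500), and a count of 2.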

Example 3: Show Only Item and Region

Let’s say we want to only view item and region, hiding everything else:


    db.sales.aggregate([
      {
        $project: {
          _id: 0,
          item: 1,
          region: 1
        }
      }
    ])

Output:

    { item: "Laptop", region: "North" }
    { item: "Monitor", region: "South" }
    { item: "Laptop", region: "South" }
    { item: "Mouse", region: "North" }
    { item: "Keyboard", region: "West" }

Explanation: The $project stage is used to control which fields are shown. Setting _id: 0 hides the default _id field.
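$project can also work by exclusion: list the fields to hide with 0 and everything else is kept. Inclusion and exclusion cannot be mixed in one $project, with _id being the exception. The sketch below is an exclusion-style counterpart to Example 3; the variable name hidePriceFields is illustrative.

```javascript
// Exclusion-style $project: hide the listed fields, keep all others.
const hidePriceFields = [
  {
    $project: {
      _id: 0,     // hide the default _id field
      price: 0,   // hide price
      quantity: 0 // hide quantity
    }
  }
];

// Runs only where a `db` handle exists (for example inside mongosh).
if (typeof db !== "undefined") {
  db.sales.aggregate(hidePriceFields).forEach((doc) => printjson(doc));
}
```

Because item and region are the only remaining fields in the example dataset, this should show the same documents as Example 3. The exclusion form is mostly useful when a document has many fields and only a few need hiding.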

Key Takeaways

Aggregation in MongoDB is done through pipelines made of stages.
$project is used to compute fields or reshape documents.
$group is used to aggregate data like sum, avg, count, etc.

Next Step

In the next lesson, we'll dive deeper into each aggregation stage and build more advanced pipelines using $match, $sort, and $limit.
