Introduction to Aggregation
Aggregation in MongoDB is a way of processing a large number of documents and transforming them into meaningful summarized results. It is similar to GROUP BY operations in SQL, but more flexible.
The Aggregation Framework uses a concept called the aggregation pipeline, which consists of multiple stages. Each stage performs an operation on the input documents and passes the result to the next stage.
What is an Aggregation Pipeline?
An aggregation pipeline is an array of stages. Each stage transforms the data as it passes through.
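One way to picture this is as a chain of functions, where each stage takes an array of documents and returns a new array. The sketch below is plain JavaScript, not MongoDB code: runPipeline, matchStage, and limitStage are hypothetical helpers made up for illustration, and the server does not necessarily execute pipelines this way.

```javascript
// Conceptual model only: each "stage" is a function from documents to documents.
// runPipeline, matchStage and limitStage are hypothetical names, not MongoDB APIs.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const limitStage = (n) => (docs) => docs.slice(0, n);

// Feed the output of each stage into the next one, in order.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const docs = [
  { item: "A", qty: 5 },
  { item: "B", qty: 1 },
  { item: "C", qty: 8 }
];

const result = runPipeline(docs, [
  matchStage((d) => d.qty > 2), // keeps A and C
  limitStage(1)                 // keeps only the first remaining document
]);

console.log(result); // one document: item "A"
```

The order of the stages matters: swapping the two stages here would limit to the first document before filtering.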
Common stages include:
- $match – Filters documents (like find())
- $group – Groups documents and performs aggregations like sum, avg
- $project – Reshapes documents (e.g., show or hide fields)
- $sort – Sorts the documents
- $limit – Limits the number of documents
Why use Aggregation?
Aggregation is used to:
- Compute totals, averages, and counts
- Group documents by a field
- Transform documents into new shapes
Example Dataset
Let’s say we have a sales collection with the following documents:
db.sales.insertMany([
  { item: "Laptop", price: 70000, quantity: 2, region: "North" },
  { item: "Monitor", price: 12000, quantity: 5, region: "South" },
  { item: "Laptop", price: 70000, quantity: 1, region: "South" },
  { item: "Mouse", price: 500, quantity: 10, region: "North" },
  { item: "Keyboard", price: 1500, quantity: 3, region: "West" }
])
Example 1: Total Sales Amount per Item
We want to group documents by item and calculate the total sales amount (price × quantity).
db.sales.aggregate([
  {
    $project: {
      item: 1,
      totalSale: { $multiply: ["$price", "$quantity"] }
    }
  },
  {
    $group: {
      _id: "$item",
      totalRevenue: { $sum: "$totalSale" }
    }
  }
])
Output (the order of the groups is not guaranteed):
{ _id: "Laptop", totalRevenue: 210000 }
{ _id: "Monitor", totalRevenue: 60000 }
{ _id: "Mouse", totalRevenue: 5000 }
{ _id: "Keyboard", totalRevenue: 4500 }
Explanation:
- The $project stage creates a new field, totalSale, by multiplying price and quantity.
- The $group stage then sums these totalSale values for each item.
Intuition Check
Q: Why didn’t we calculate price * quantity directly inside $group?
A: We could have. Accumulators such as $sum accept expressions, so { $sum: { $multiply: ["$price", "$quantity"] } } works inside $group. Computing totalSale in a separate $project stage first simply makes each step explicit and easier to follow.
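For comparison, here is a sketch of a single-stage version: accumulators such as $sum accept an expression, so the $multiply can be placed inside $group directly. The variable name singleStagePipeline is only illustrative, and the snippet just builds the pipeline array; in mongosh it would be passed to db.sales.aggregate().

```javascript
// Pipeline definition only; nothing here talks to a database.
// In mongosh it could be run as: db.sales.aggregate(singleStagePipeline)
const singleStagePipeline = [
  {
    $group: {
      _id: "$item",
      // $sum adds up the result of the $multiply expression for each document
      totalRevenue: { $sum: { $multiply: ["$price", "$quantity"] } }
    }
  }
];

console.log(JSON.stringify(singleStagePipeline, null, 2));
```

Against the sample sales collection, this should produce the same per-item totals as the two-stage pipeline in Example 1.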
Example 2: Total Quantity Sold by Region
db.sales.aggregate([
  {
    $group: {
      _id: "$region",
      totalUnits: { $sum: "$quantity" }
    }
  }
])
Output:
{ _id: "North", totalUnits: 12 }
{ _id: "South", totalUnits: 6 }
{ _id: "West", totalUnits: 3 }
Explanation: We group directly by the region field and sum the quantity field to get the total units sold per region.
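To check the arithmetic by hand, the same grouping can be mimicked in plain JavaScript. This is only a sketch of what the stage computes, not of how MongoDB executes it; the names sales, totalsByRegion, and grouped are illustrative.

```javascript
// Plain JavaScript sketch that reproduces the totals of the $group stage above.
// The array mirrors the sample documents from the Example Dataset section.
const sales = [
  { item: "Laptop", price: 70000, quantity: 2, region: "North" },
  { item: "Monitor", price: 12000, quantity: 5, region: "South" },
  { item: "Laptop", price: 70000, quantity: 1, region: "South" },
  { item: "Mouse", price: 500, quantity: 10, region: "North" },
  { item: "Keyboard", price: 1500, quantity: 3, region: "West" }
];

// Keep one running total per region, like an accumulator per group key.
const totalsByRegion = new Map();
for (const doc of sales) {
  const runningTotal = totalsByRegion.get(doc.region) || 0;
  totalsByRegion.set(doc.region, runningTotal + doc.quantity);
}

// Shape the result like the aggregation output: one document per group.
const grouped = [...totalsByRegion].map(([region, totalUnits]) => ({
  _id: region,
  totalUnits
}));

console.log(grouped); // North: 12, South: 6, West: 3
```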
Example 3: Show Only Item and Region
Let’s say we want to view only item and region, hiding everything else:
db.sales.aggregate([
  {
    $project: {
      _id: 0,
      item: 1,
      region: 1
    }
  }
])
Output:
{ item: "Laptop", region: "North" }
{ item: "Monitor", region: "South" }
{ item: "Laptop", region: "South" }
{ item: "Mouse", region: "North" }
{ item: "Keyboard", region: "West" }
Explanation: The $project stage controls which fields are shown. Setting _id: 0 hides the default _id field.
Key Takeaways
- Aggregation in MongoDB is done through pipelines made of stages.
- $project is used to compute fields or reshape documents.
- $group is used to aggregate data (sum, avg, count, etc.).
Next Step
In the next lesson, we'll dive deeper into each aggregation stage and build more advanced pipelines using $match, $sort, and $limit.