Aggregation Pipeline Stages Explained
In MongoDB, the aggregation pipeline is a powerful framework for processing and transforming documents. Think of it as a data processing pipeline where each stage transforms the data and passes the results to the next stage.
The stages are executed in order, and the final result is returned after all transformations.
What is an Aggregation Pipeline?
An aggregation pipeline consists of multiple stages. Each stage performs an operation on the input documents and passes the results to the next stage.
Basic Syntax:
db.collection.aggregate([
  { <stage1> },   // e.g. { $match: { ... } }
  { <stage2> },   // e.g. { $group: { ... } }
  ...
]);
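Here is a minimal plain-JavaScript sketch of the "each stage feeds the next" idea. It runs outside MongoDB. It treats a stage as a function that takes an array of documents and returns a new array. runPipeline and the inline stages are hypothetical illustrations of the concept. They are not how MongoDB executes pipelines internally.

```javascript
// Hypothetical helper, NOT a MongoDB API: models a pipeline as an ordered
// list of functions, each taking an array of documents and returning a new one.
function runPipeline(docs, stages) {
  // Pass the output of each stage as the input of the next, in order.
  return stages.reduce((current, stage) => stage(current), docs);
}

const inventory = [
  { item: "laptop", quantity: 5 },
  { item: "phone", quantity: 10 },
  { item: "tablet", quantity: 8 }
];

const pipelineResult = runPipeline(inventory, [
  // Roughly analogous to { $match: { quantity: { $gt: 6 } } }
  (docs) => docs.filter((d) => d.quantity > 6),
  // Roughly analogous to { $sort: { quantity: -1 } }; copies before sorting
  (docs) => [...docs].sort((a, b) => b.quantity - a.quantity)
]);

console.log(pipelineResult.map((d) => d.item)); // → [ 'phone', 'tablet' ]
```

Each stage sees only the previous stage's output. For that reason, reordering stages can change the result.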
Key Aggregation Stages
$match – Filters documents (like WHERE in SQL)
$group – Groups documents and computes aggregate values (like GROUP BY)
$project – Reshapes documents: adds, removes, or renames fields
$sort – Sorts the documents
$limit and $skip – Cap and offset the number of documents returned, typically for pagination
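The list above names $limit and $skip for pagination. The plain-JavaScript sketch below shows what the stage sequence [{ $skip: (page - 1) * pageSize }, { $limit: pageSize }] does to an ordered list of documents. paginate is a hypothetical helper, not a MongoDB API. In a real pipeline you would normally put a $sort before these stages so that page contents are deterministic.

```javascript
// Hypothetical helper, NOT a MongoDB API. It mimics the stage sequence
// [{ $skip: (page - 1) * pageSize }, { $limit: pageSize }] on an in-memory array.
function paginate(docs, page, pageSize) {
  const skipped = docs.slice((page - 1) * pageSize); // like $skip: drop the first N documents
  return skipped.slice(0, pageSize);                 // like $limit: keep at most pageSize documents
}

// Assumes the documents are already in a stable order (e.g. after a $sort stage).
const ordered = ["a", "b", "c", "d", "e"].map((id) => ({ id }));

console.log(paginate(ordered, 1, 2).map((d) => d.id)); // → [ 'a', 'b' ]
console.log(paginate(ordered, 2, 2).map((d) => d.id)); // → [ 'c', 'd' ]
console.log(paginate(ordered, 3, 2).map((d) => d.id)); // → [ 'e' ]
```

The order of the two stages matters. Applying the limit before the skip gives a different result; for page 2 of this data, it would return an empty page.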
Example Collection
Let’s use a sales collection:
db.sales.insertMany([
  { item: "laptop", price: 800, quantity: 5, region: "North" },
  { item: "phone", price: 500, quantity: 10, region: "North" },
  { item: "tablet", price: 300, quantity: 8, region: "South" },
  { item: "laptop", price: 800, quantity: 3, region: "South" },
  { item: "phone", price: 500, quantity: 7, region: "East" }
]);
Output:
{ acknowledged: true, insertedIds: [...] }
$match – Filtering Documents
Use $match to filter documents based on a condition.
db.sales.aggregate([
  { $match: { region: "North" } }
]);
Output:
{ item: "laptop", price: 800, quantity: 5, region: "North" }
{ item: "phone", price: 500, quantity: 10, region: "North" }
Explanation: This stage passes through only the documents whose region is "North". The auto-generated _id field is not shown in the output above.
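The equality filter above can be approximated in plain JavaScript as shown below. matchEquals is a hypothetical helper that handles only exact-match conditions on top-level fields. Real $match also supports query operators such as $gt and $in, as well as dotted paths into embedded documents. As with $match, when a query object holds several conditions, all of them must hold (logical AND).

```javascript
// Hypothetical helper approximating an equality-only $match, e.g. { region: "North" }.
// Every field/value pair in the query must match (implicit AND).
function matchEquals(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
}

const sales = [
  { item: "laptop", price: 800, quantity: 5, region: "North" },
  { item: "phone", price: 500, quantity: 10, region: "North" },
  { item: "tablet", price: 300, quantity: 8, region: "South" },
  { item: "laptop", price: 800, quantity: 3, region: "South" },
  { item: "phone", price: 500, quantity: 7, region: "East" }
];

console.log(matchEquals(sales, { region: "North" }).map((d) => d.item));
// → [ 'laptop', 'phone' ]

// Two conditions: only the South laptop sale qualifies.
console.log(matchEquals(sales, { region: "South", item: "laptop" }).length); // → 1
```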
$group – Aggregating Data
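Use $group to aggregate documents. The _id field is required: it specifies the group key (here, the value of the item field). Set _id: null to aggregate all documents into a single group.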
db.sales.aggregate([
  { $group: { _id: "$item", totalQty: { $sum: "$quantity" } } }
]);
Output:
{ _id: "phone", totalQty: 17 }
{ _id: "tablet", totalQty: 8 }
{ _id: "laptop", totalQty: 8 }
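Explanation: This groups all sales by item and sums the quantity field within each group. The order of the groups in the output is not guaranteed; add a $sort stage if you need a specific order.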
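The grouping logic can be sketched in plain JavaScript with a Map that keeps one running total per key. groupSum is a hypothetical helper that mirrors { $group: { _id: "$item", totalQty: { $sum: "$quantity" } } } for this kind of data. It is not how MongoDB implements $group.

```javascript
// Hypothetical helper mirroring
// { $group: { _id: "$<keyField>", totalQty: { $sum: "$<sumField>" } } }.
function groupSum(docs, keyField, sumField) {
  const totals = new Map(); // group key -> running sum
  for (const doc of docs) {
    const key = doc[keyField];
    totals.set(key, (totals.get(key) ?? 0) + doc[sumField]);
  }
  // Emit one document per group, with the group key stored in _id.
  return [...totals].map(([key, total]) => ({ _id: key, totalQty: total }));
}

const sales = [
  { item: "laptop", price: 800, quantity: 5, region: "North" },
  { item: "phone", price: 500, quantity: 10, region: "North" },
  { item: "tablet", price: 300, quantity: 8, region: "South" },
  { item: "laptop", price: 800, quantity: 3, region: "South" },
  { item: "phone", price: 500, quantity: 7, region: "East" }
];

const byItem = groupSum(sales, "item", "quantity");
console.log(byItem.map((g) => `${g._id}: ${g.totalQty}`));
// → [ 'laptop: 8', 'phone: 17', 'tablet: 8' ]
```

The first-seen ordering above comes from the Map and is a property of this sketch only.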
Q&A to Build Intuition
Q: Can I group by multiple fields in MongoDB?
A: Yes. You can group by a compound key using a sub-document:
db.sales.aggregate([
  {
    $group: {
      _id: { item: "$item", region: "$region" },
      totalQty: { $sum: "$quantity" }
    }
  }
]);
Output:
{ _id: { item: "phone", region: "East" }, totalQty: 7 }
{ _id: { item: "tablet", region: "South" }, totalQty: 8 }
{ _id: { item: "laptop", region: "South" }, totalQty: 3 }
...
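Grouping on a compound key works the same way, except that the key is an object. In plain JavaScript, two separately built objects with the same contents count as different Map keys. The hypothetical groupByFields helper below therefore serializes the key with JSON.stringify. In this sample data every (item, region) pair occurs exactly once, so each group contains a single document.

```javascript
// Hypothetical helper mirroring
// { $group: { _id: { <f1>: "$<f1>", ... }, totalQty: { $sum: "$<sumField>" } } }.
function groupByFields(docs, keyFields, sumField) {
  const groups = new Map(); // serialized key -> { _id, totalQty }
  for (const doc of docs) {
    // Build the compound _id with its fields in a fixed order.
    const id = Object.fromEntries(keyFields.map((f) => [f, doc[f]]));
    const mapKey = JSON.stringify(id); // objects compare by reference, strings by value
    const group = groups.get(mapKey) ?? { _id: id, totalQty: 0 };
    group.totalQty += doc[sumField];
    groups.set(mapKey, group);
  }
  return [...groups.values()];
}

const sales = [
  { item: "laptop", price: 800, quantity: 5, region: "North" },
  { item: "phone", price: 500, quantity: 10, region: "North" },
  { item: "tablet", price: 300, quantity: 8, region: "South" },
  { item: "laptop", price: 800, quantity: 3, region: "South" },
  { item: "phone", price: 500, quantity: 7, region: "East" }
];

const byItemRegion = groupByFields(sales, ["item", "region"], "quantity");
console.log(byItemRegion.length); // → 5
```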
$project – Shaping the Output
Use $project to include, exclude, rename, or compute fields in the output.
db.sales.aggregate([
  {
    $project: {
      item: 1,
      revenue: { $multiply: ["$price", "$quantity"] },
      _id: 0
    }
  }
]);
Output:
{ item: "laptop", revenue: 4000 }
{ item: "phone", revenue: 5000 }
{ item: "tablet", revenue: 2400 }
...
Explanation: This calculates a new field revenue (price multiplied by quantity) and excludes _id from the output.
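For each document, this $project stage computes price * quantity and keeps only the listed fields. Below is a plain-JavaScript equivalent that uses destructuring. The name projectRevenue is hypothetical. The numeric _id values in the sample data are placeholders standing in for MongoDB's ObjectIds.

```javascript
// Hypothetical equivalent of
// { $project: { item: 1, revenue: { $multiply: ["$price", "$quantity"] }, _id: 0 } }.
// Destructuring picks the needed fields; everything else (including _id) is dropped.
const projectRevenue = (docs) =>
  docs.map(({ item, price, quantity }) => ({ item, revenue: price * quantity }));

// Placeholder _id values stand in for MongoDB's auto-generated ObjectIds.
const sales = [
  { _id: 1, item: "laptop", price: 800, quantity: 5, region: "North" },
  { _id: 2, item: "phone", price: 500, quantity: 10, region: "North" },
  { _id: 3, item: "tablet", price: 300, quantity: 8, region: "South" },
  { _id: 4, item: "laptop", price: 800, quantity: 3, region: "South" },
  { _id: 5, item: "phone", price: 500, quantity: 7, region: "East" }
];

console.log(projectRevenue(sales).map((d) => d.revenue));
// → [ 4000, 5000, 2400, 2400, 3500 ]
```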
$sort – Sorting Results
Use $sort to order results by a field.
db.sales.aggregate([
  { $sort: { quantity: -1 } }
]);
Output:
{ item: "phone", quantity: 10, ... }
{ item: "tablet", quantity: 8, ... }
...
Explanation: This sorts the documents by quantity in descending order (-1). Use 1 for ascending order.
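In plain JavaScript, you can mirror the direction flag by multiplying the comparator's result by it. sortBy is a hypothetical helper that handles a single numeric field. MongoDB's $sort also supports multiple sort keys and compares values of different BSON types, which this sketch does not.

```javascript
// Hypothetical helper mirroring { $sort: { <field>: direction } } for one numeric field.
// direction: 1 = ascending, -1 = descending. Copies the array so the input is unchanged.
function sortBy(docs, field, direction) {
  return [...docs].sort((a, b) => direction * (a[field] - b[field]));
}

const sales = [
  { item: "laptop", price: 800, quantity: 5, region: "North" },
  { item: "phone", price: 500, quantity: 10, region: "North" },
  { item: "tablet", price: 300, quantity: 8, region: "South" },
  { item: "laptop", price: 800, quantity: 3, region: "South" },
  { item: "phone", price: 500, quantity: 7, region: "East" }
];

console.log(sortBy(sales, "quantity", -1).map((d) => d.quantity)); // → [ 10, 8, 7, 5, 3 ]
console.log(sortBy(sales, "quantity", 1).map((d) => d.quantity));  // → [ 3, 5, 7, 8, 10 ]
```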
Combining Stages
You can combine multiple stages in a pipeline to process documents step-by-step:
db.sales.aggregate([
  { $match: { region: "North" } },
  { $project: { item: 1, total: { $multiply: ["$price", "$quantity"] }, _id: 0 } },
  { $sort: { total: -1 } }
]);
Output:
{ item: "phone", total: 5000 }
{ item: "laptop", total: 4000 }
Explanation: This pipeline filters by region, calculates total revenue, and sorts the results in descending order.
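The same three steps map naturally onto chained array methods, which can help when reasoning about what each stage receives. This is a plain-JavaScript analogy only, not MongoDB code. Putting the filter first is deliberate: later stages then process fewer documents, and in MongoDB a $match at the very start of a pipeline can use an index.

```javascript
// Plain-JS analogy of the $match -> $project -> $sort pipeline above.
const sales = [
  { item: "laptop", price: 800, quantity: 5, region: "North" },
  { item: "phone", price: 500, quantity: 10, region: "North" },
  { item: "tablet", price: 300, quantity: 8, region: "South" },
  { item: "laptop", price: 800, quantity: 3, region: "South" },
  { item: "phone", price: 500, quantity: 7, region: "East" }
];

const northRevenue = sales
  .filter((d) => d.region === "North")                                      // like $match
  .map(({ item, price, quantity }) => ({ item, total: price * quantity })) // like $project
  .sort((a, b) => b.total - a.total);                                       // like $sort (descending)

console.log(northRevenue.map((d) => `${d.item}: ${d.total}`)); // → [ 'phone: 5000', 'laptop: 4000' ]
```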
Summary
$match filters documents
$group aggregates documents
$project reshapes documents
$sort orders documents
Aggregation pipelines let you filter, group, reshape, and sort data directly within MongoDB, so analytics can run on the database server instead of in application code.
Up next: we’ll explore real-life aggregation use cases and practice building pipelines with complex conditions and transformations.