Introduction to Aggregation in NumPy
Aggregation functions are the backbone of data analysis in NumPy. Whether you're summing up monthly sales, finding the maximum temperature in a dataset, or calculating the mean of sensor readings, aggregation functions provide the answers quickly and efficiently.
What Are Aggregation Functions?
Aggregation functions reduce an entire array to a single value or perform summary statistics along a specified axis. NumPy offers a wide variety of such functions, including:
np.sum() – Total of all elements
np.min() and np.max() – Minimum and maximum values
np.mean() – Average value
np.std() and np.var() – Standard deviation and variance
np.argmin() and np.argmax() – Index of the minimum/maximum value
Creating a Sample Array
Let’s begin with a basic 2D NumPy array to explore these functions:
import numpy as np
arr = np.array([
    [4, 7, 1],
    [8, 5, 9]
])
print(arr)
[[4 7 1]
 [8 5 9]]
Using np.sum() – Add Everything
This function adds up all the values in the array.
print(np.sum(arr))
34
The sum of all values: 4 + 7 + 1 + 8 + 5 + 9 = 34
Sum by Axis
print(np.sum(arr, axis=0)) # Column-wise sum
print(np.sum(arr, axis=1)) # Row-wise sum
[12 12 10]
[12 22]
Explanation: For axis=0, we sum down each column: [4+8, 7+5, 1+9]. For axis=1, we sum across each row: [4+7+1, 8+5+9].
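One way to check which axis is being collapsed is to look at the shape of the result. The sketch below reuses the sample array; the variable names are illustrative, and keepdims=True is an optional NumPy argument not otherwise used in this article.

```python
import numpy as np

arr = np.array([
    [4, 7, 1],
    [8, 5, 9],
])  # shape (2, 3): 2 rows, 3 columns

col_sums = np.sum(arr, axis=0)  # collapses the row axis: one value per column
row_sums = np.sum(arr, axis=1)  # collapses the column axis: one value per row

print(col_sums.shape)  # (3,)
print(row_sums.shape)  # (2,)

# keepdims=True keeps the collapsed axis with length 1,
# which can be handy when the result is combined with the original array
row_sums_2d = np.sum(arr, axis=1, keepdims=True)
print(row_sums_2d.shape)  # (2, 1)
```

The axis you pass is the one that disappears from the shape, which is often easier to remember than "rows" versus "columns".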
Minimum and Maximum Values
To find the smallest and largest values:
print(np.min(arr))
print(np.max(arr))
1
9
Min/Max Along Axis
print(np.min(arr, axis=1)) # Min of each row
print(np.max(arr, axis=0)) # Max of each column
[1 5]
[8 7 9]
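Per-axis minimum and maximum can also be combined, for example to get the range of each column. This is a small sketch on the same sample array; it uses the equivalent array methods arr.min() and arr.max(), and the name col_range is only illustrative.

```python
import numpy as np

arr = np.array([
    [4, 7, 1],
    [8, 5, 9],
])

col_min = arr.min(axis=0)  # same result as np.min(arr, axis=0)
col_max = arr.max(axis=0)  # same result as np.max(arr, axis=0)

# Range (max minus min) of each column
col_range = col_max - col_min

print(col_min)    # [4 5 1]
print(col_max)    # [8 7 9]
print(col_range)  # [4 2 8]
```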
Mean – The Average
To find the mean of all elements:
print(np.mean(arr))
5.666666666666667
The average of 6 numbers in the array: 34 / 6 = 5.666...
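Like np.sum(), np.mean() accepts an axis argument. As a brief sketch on the same sample array (the variable names are illustrative):

```python
import numpy as np

arr = np.array([
    [4, 7, 1],
    [8, 5, 9],
])

col_means = np.mean(arr, axis=0)  # mean of each column: (4+8)/2, (7+5)/2, (1+9)/2
row_means = np.mean(arr, axis=1)  # mean of each row: 12/3 and 22/3

print(col_means)  # [6. 6. 5.]
print(row_means)  # 4.0 and roughly 7.33
```

Note that the result is floating point even though the input array holds integers.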
Standard Deviation and Variance
These help you understand the spread of your data. Standard deviation shows how far values typically lie from the mean.
print(np.std(arr))
print(np.var(arr))
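2.687419...
7.222222...
The variance is the mean of the squared deviations from the mean, here 65 / 9 ≈ 7.22, and the standard deviation is its square root, ≈ 2.69.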
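To see what these two functions compute, the sketch below reproduces them by hand on the sample array. By default NumPy divides by the number of elements N (population variance); passing ddof=1 divides by N - 1 instead, which gives the sample variance. The variable names are illustrative.

```python
import numpy as np

arr = np.array([
    [4, 7, 1],
    [8, 5, 9],
])

mean = arr.mean()

# Variance: mean of the squared deviations from the mean
manual_var = np.mean((arr - mean) ** 2)
# Standard deviation: square root of the variance
manual_std = np.sqrt(manual_var)

print(np.isclose(manual_var, np.var(arr)))  # True
print(np.isclose(manual_std, np.std(arr)))  # True

# Sample variance: divide by N - 1 instead of N
sample_var = np.var(arr, ddof=1)
print(sample_var > np.var(arr))  # True
```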
Finding Index of Extremes: argmin and argmax
print(np.argmin(arr)) # Index in flattened array
print(np.argmax(arr))
2
5
Index 2 corresponds to the value 1, and index 5 to the value 9, in the flattened array.
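A flattened index is not always convenient for a 2D array. One option, sketched below, is np.unravel_index(), which converts a flat index back into (row, column) coordinates; argmin and argmax also accept an axis argument. The variable names are illustrative.

```python
import numpy as np

arr = np.array([
    [4, 7, 1],
    [8, 5, 9],
])

flat_min = np.argmin(arr)  # 2
flat_max = np.argmax(arr)  # 5

# Convert flat indices into (row, column) positions
min_pos = np.unravel_index(flat_min, arr.shape)  # row 0, column 2
max_pos = np.unravel_index(flat_max, arr.shape)  # row 1, column 2

print(arr[min_pos])  # 1
print(arr[max_pos])  # 9

# With an axis, the index is reported per row (or per column)
print(np.argmax(arr, axis=1))  # [1 2]
```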
Things to Watch Out For
- If an array contains np.nan, aggregation results may also be nan. Use np.nansum(), np.nanmean(), etc. to handle such cases.
- When working with multidimensional arrays, always be explicit about the axis parameter to avoid confusion.
- Operations return new values; original arrays remain unchanged unless explicitly reassigned.
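The first and last points can be illustrated with a short sketch. The data array here is a made-up set of readings with one missing value, not data from this article.

```python
import numpy as np

# Hypothetical readings with one missing value
data = np.array([4.0, np.nan, 1.0, 9.0])

print(np.sum(data))      # nan: a single NaN propagates through the sum
print(np.nansum(data))   # 14.0: NaN is ignored
print(np.nanmean(data))  # 14 / 3, roughly 4.67: mean of the 3 valid values

# Aggregations return new values and leave the input untouched
arr = np.array([
    [4, 7, 1],
    [8, 5, 9],
])
before = arr.copy()
col_sums = np.sum(arr, axis=0)
print(np.array_equal(arr, before))  # True
```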
Recap
Aggregation functions in NumPy give you quick, clear insights into your data. Whether you need a simple total or a detailed statistical spread, these tools are essential for any kind of numerical computing. Practice each function, play with the axes, and understand the shape and behavior of the results to become truly confident with array operations.