Introduction
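NumPy and Pandas are like close teammates in the data analysis world. NumPy offers high-speed numerical operations on arrays, while Pandas, which is built on top of NumPy, adds labeled tabular data structures. Because they share that foundation, moving data between them is straightforward, and combining them lets you use the strengths of each.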
This tutorial walks you through using NumPy arrays with Pandas, showing how to convert, manipulate, and analyze data while leveraging the best of both libraries.
Step 1: Import Required Libraries
import numpy as np
import pandas as pd
Always start by importing NumPy and Pandas. If you're working in a new environment, install both first with pip install numpy pandas.
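Some APIs differ between releases (for example, DataFrame.append was removed in pandas 2.0), so it can help to confirm which versions you have installed. A minimal check:

```python
import numpy as np
import pandas as pd

# Print the installed versions; behavior of some APIs depends on them.
print("NumPy:", np.__version__)
print("Pandas:", pd.__version__)
```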
Step 2: Create a NumPy Array
data = np.array([[85, 90], [88, 92], [78, 80]])
This is a 2D array representing student scores in two subjects. NumPy provides fast operations on this structure, but for labeled analysis, we need Pandas.
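As a quick illustration of those fast operations, the sketch below aggregates the same array along each axis. Rows are students and columns are subjects, so axis=0 gives one value per subject and axis=1 gives one value per student.

```python
import numpy as np

data = np.array([[85, 90], [88, 92], [78, 80]])

# Rows are students, columns are subjects.
print(data.shape)  # (3, 2)

# axis=0 aggregates down the rows: one mean per subject (column).
subject_means = data.mean(axis=0)
print(subject_means)

# axis=1 aggregates across each row: one mean per student.
student_means = data.mean(axis=1)
print(student_means)
```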
Step 3: Convert NumPy Array to a DataFrame
df = pd.DataFrame(data, columns=["Math", "Science"])
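We're wrapping the NumPy array in a DataFrame with column names. The labels give the raw numbers context: you can select data by name (df["Math"]) instead of by position (data[:, 0]), and you gain Pandas' tools for filtering, grouping, and summarizing.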
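Row labels can be attached the same way through the index parameter. In this sketch the student names are hypothetical, chosen only to show label-based selection with .loc.

```python
import numpy as np
import pandas as pd

data = np.array([[85, 90], [88, 92], [78, 80]])

# Hypothetical student names used as row labels instead of 0, 1, 2.
students = ["Alice", "Bob", "Cara"]
df_named = pd.DataFrame(data, columns=["Math", "Science"], index=students)

# Select a single value by row label and column label.
print(df_named.loc["Bob", "Science"])  # 92
print(df_named["Math"].tolist())       # [85, 88, 78]
```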
Step 4: Viewing the DataFrame
print(df)
   Math  Science
0    85       90
1    88       92
2    78       80
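The leftmost column is the default integer index (0, 1, 2) that Pandas creates when you don't supply one. You now have tabular data that's easy to read and ready for further analysis.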
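Beyond print, a few attributes are useful for checking what the conversion produced. A short sketch, rebuilding the same DataFrame so it runs on its own:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[85, 90], [88, 92], [78, 80]]), columns=["Math", "Science"])

print(df.shape)       # (3, 2): rows, columns
print(df.dtypes)      # integer dtype inherited from the NumPy array
print(df.index)       # the default RangeIndex Pandas created
print(df.describe())  # count, mean, std, min, quartiles, max per column
```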
Step 5: Adding a Row Using a NumPy Array
You might want to append a new student's scores.
new_row = np.array([[90, 95]])
df_new = pd.concat([df, pd.DataFrame(new_row, columns=df.columns)], ignore_index=True)
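We wrapped the new NumPy row in pd.DataFrame, reusing df.columns so the labels match, and concatenated it with the original DataFrame. ignore_index=True renumbers the rows 0 to 3 instead of keeping the new frame's own index 0. (pd.concat is the way to do this in current Pandas; DataFrame.append was removed in pandas 2.0.) Printing df_new shows: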
   Math  Science
0    85       90
1    88       92
2    78       80
3    90       95
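When you have several new rows, a reasonable pattern is to stack them in one 2D array and concatenate once, rather than calling pd.concat inside a loop, which copies the growing DataFrame on every iteration. The extra scores below are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[85, 90], [88, 92], [78, 80]]), columns=["Math", "Science"])

# Hypothetical batch of new students' scores, one row per student.
new_rows = np.array([[90, 95], [72, 81]])

# A single concat for the whole batch; ignore_index renumbers rows 0..4.
df_new = pd.concat([df, pd.DataFrame(new_rows, columns=df.columns)], ignore_index=True)
print(df_new)
```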
Step 6: Converting a Pandas Column to a NumPy Array
If you want to run NumPy operations on a specific column, you can extract it like this:
math_scores = df["Math"].to_numpy()
Now math_scores is a NumPy array, so you can calculate the mean or standard deviation, or apply any other NumPy mathematical function, directly.
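One detail worth knowing when computing statistics this way: np.std defaults to the population standard deviation (ddof=0), while the Pandas Series.std method defaults to the sample standard deviation (ddof=1), so the two can disagree on the same data. A minimal comparison:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[85, 90], [88, 92], [78, 80]]), columns=["Math", "Science"])
math_scores = df["Math"].to_numpy()

print(np.mean(math_scores))

# np.std uses ddof=0 (population) by default ...
print(np.std(math_scores))
# ... while Series.std uses ddof=1 (sample); pass ddof=1 to match it.
print(np.std(math_scores, ddof=1))
print(df["Math"].std())
```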
Step 7: Verify Data Types
Confusing a Series with a NumPy array can cause subtle bugs, because they behave differently: a Series carries an index, while an array does not. When in doubt, check the type:
print(type(df["Math"])) # <class 'pandas.core.series.Series'>
print(type(math_scores)) # <class 'numpy.ndarray'>
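type() tells you the container, while the .dtype attribute tells you the type of the elements inside it. The sketch below checks both and shows that to_numpy accepts a dtype argument, which is handy when you want floats, and that a whole DataFrame converts to a 2D array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[85, 90], [88, 92], [78, 80]]), columns=["Math", "Science"])
math_scores = df["Math"].to_numpy()

# Element type of the Series and of the extracted array.
print(df["Math"].dtype, math_scores.dtype)

# Request a specific dtype during conversion.
math_float = df["Math"].to_numpy(dtype=float)
print(math_float.dtype)  # float64

# The whole DataFrame converts too: one row per student, one column per subject.
all_scores = df.to_numpy()
print(all_scores.shape)  # (3, 2)
```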
Step 8: Vectorized Operations with NumPy and Pandas
For numeric data, Pandas columns are backed by NumPy arrays by default, so arithmetic on a column applies to every element at once:
df["Math"] = df["Math"] + 5
This adds 5 to every Math score without an explicit Python loop. The work runs in NumPy's compiled code, which is both cleaner to write and much faster than iterating row by row.
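NumPy functions also work directly on Pandas columns. This sketch computes a row-wise average and then uses np.where to label each student; the pass mark of 85 and the "Pass"/"Review" labels are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[85, 90], [88, 92], [78, 80]]), columns=["Math", "Science"])

# Mean across the two subject columns: one value per student.
df["Average"] = df[["Math", "Science"]].mean(axis=1)

# np.where evaluates the condition element-wise (85 is a hypothetical pass mark).
df["Result"] = np.where(df["Average"] >= 85, "Pass", "Review")

# A NumPy ufunc applied to a Series returns a Series with the same index.
root = np.sqrt(df["Math"])
print(type(root))
print(df)
```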
Step 9: Safety Checks When Using NumPy Arrays
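- Shape Mismatch: When you assign a NumPy array to a column, it must contain exactly one value per row; otherwise Pandas raises a ValueError.
- Data Types: A NumPy array holds a single dtype, so mixed values are coerced to a common type: integers mixed with floats become floats, and numbers mixed with strings become strings (or object).
- Index Alignment: Operations between Pandas objects align on index labels, but a raw NumPy array has no index and is matched by position. If your rows are not in the order you expect, the two approaches can silently pair up different rows.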
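The three pitfalls above can be reproduced in a few lines. In this sketch the English scores and the bonus values are hypothetical, chosen only to trigger each behavior.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[85, 90], [88, 92], [78, 80]]), columns=["Math", "Science"])

# 1. Shape mismatch: the array needs exactly one value per row.
try:
    df["English"] = np.array([70, 75])  # 2 values for 3 rows
except ValueError as err:
    print("Shape mismatch:", err)

# 2. Mixed types: NumPy chooses one dtype for the whole array.
print(np.array([1, 2.5]).dtype)       # float64: the int is upcast to float
print(np.array([1, "a"]).dtype.kind)  # kind U: everything becomes a string

# 3. Index alignment: a Series is matched by label, an array by position.
bonus = pd.Series([1, 2, 3], index=[2, 1, 0])
aligned = df["Math"] + bonus                # label 0 gets 3, label 2 gets 1
positional = df["Math"] + bonus.to_numpy()  # row 0 gets 1, row 2 gets 3
print(aligned.sort_index().tolist())        # [88, 90, 79]
print(positional.tolist())                  # [86, 90, 81]
```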
Conclusion
Using NumPy arrays within Pandas allows you to combine raw computational power with labeled, structured data management. It's a symbiotic relationship that underpins much of modern data analysis.
Keep exploring — next, we'll dig into using SciPy functions to extend NumPy's math power into statistics and optimization.