Detecting NaNs in NumPy

Introduction: What Are NaNs in NumPy?

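In the world of numerical computing, you’ll occasionally stumble upon a strange beast: NaN, short for Not a Number. It is a special floating-point value, defined by the IEEE 754 standard, that represents an undefined or unrepresentable result. NaNs often appear when operations fail or data is missing, and ignoring them can lead to unexpected results.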

This tutorial will show you how to detect NaN values in NumPy arrays, step by step, with practical examples and checks to ensure your data stays clean and trustworthy.

Step 1: Why NaNs Matter

NaNs represent undefined or missing values. They might sneak into your dataset from:

  • Corrupted CSV files
  • Invalid floating-point operations (e.g., 0.0 / 0.0 or inf - inf)
  • Manual data entry errors
  • APIs that return incomplete data
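To see two of these sources in action, here is a short sketch that produces NaNs from invalid floating-point operations and from a CSV row with an empty field. The np.errstate block only silences NumPy's RuntimeWarning for the demo, and the CSV text is made-up sample data read through io.StringIO instead of a real file.

```python
import io

import numpy as np

# Invalid floating-point operations produce NaN (NumPy warns unless suppressed).
with np.errstate(invalid="ignore"):
    zero_over_zero = np.array([0.0]) / np.array([0.0])
    inf_minus_inf = np.array([np.inf]) - np.array([np.inf])

print(zero_over_zero)  # [nan]
print(inf_minus_inf)   # [nan]

# By default, np.genfromtxt fills empty fields in float data with NaN.
csv_text = io.StringIO("1.0,2.0\n3.0,\n")  # second row is missing its last value
data = np.genfromtxt(csv_text, delimiter=",")
print(data)  # the second row ends in nan
```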

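If left unchecked, NaNs can poison your calculations. NaN propagates through arithmetic, so a single NaN turns the result of np.sum() or np.mean() into NaN. Worse, every comparison involving NaN evaluates to False, so filters and conditions can silently skip data without raising an error.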
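A quick sketch of the propagation problem, using the same example array as Step 2:

```python
import numpy as np

arr = np.array([1.5, 2.0, np.nan, 4.5, np.nan])

# A single NaN makes ordinary reductions return NaN.
print(np.sum(arr))   # nan
print(np.mean(arr))  # nan

# Max/min propagate NaN as well.
print(np.max(arr))   # nan

# Comparisons with NaN are always False, so a filter silently drops those elements.
print(arr > 2.0)     # [False False False  True False]
```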

Step 2: Detecting NaNs with np.isnan()

NumPy provides a direct way to find NaNs using np.isnan(). It returns a boolean array of the same shape, marking True wherever a NaN is present.

import numpy as np

arr = np.array([1.5, 2.0, np.nan, 4.5, np.nan])
nan_mask = np.isnan(arr)
print(nan_mask)
# Output: [False False  True False  True]

Step 3: Counting NaNs

To count the NaNs, pass the mask to np.sum(). Each True counts as 1 and each False as 0:

nan_count = np.sum(nan_mask)
print(f"Total NaNs: {nan_count}")
# Output: Total NaNs: 2

This pattern (build a boolean mask, then sum it) works for any condition, not just NaN checks, and runs as a single vectorized operation, so it scales well to large arrays.
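The same mask answers a few related questions cheaply. This sketch shows np.count_nonzero() as an alternative counter, .any() for a yes/no check, and .mean() for the fraction of entries that are NaN; which one fits depends on the question you are asking of your data.

```python
import numpy as np

arr = np.array([1.5, 2.0, np.nan, 4.5, np.nan])
nan_mask = np.isnan(arr)

# Same count as np.sum(nan_mask), via count_nonzero on the boolean mask.
print(np.count_nonzero(nan_mask))  # 2

# Yes/no check: does the array contain any NaN at all?
print(nan_mask.any())              # True

# Fraction of entries that are NaN (True is 1, False is 0).
print(nan_mask.mean())             # 0.4
```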

Step 4: Finding Indices of NaNs

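To find exactly where the NaNs are, pass the mask to np.where(). Called with a single argument, it returns a tuple containing one array of indices per dimension: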

nan_indices = np.where(nan_mask)
print("Indices with NaN:", nan_indices)
# Output: Indices with NaN: (array([2, 4]),)
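Because np.where() wraps its result in a tuple, 1D code usually unpacks it. Here is a small sketch of that, plus np.flatnonzero() as a tuple-free alternative and the inverted mask (~) to select the non-NaN values:

```python
import numpy as np

arr = np.array([1.5, 2.0, np.nan, 4.5, np.nan])
nan_mask = np.isnan(arr)

# np.where returns a tuple (one array per dimension); unpack it for 1D data.
(nan_idx,) = np.where(nan_mask)
print(nan_idx)                   # [2 4]

# np.flatnonzero gives the same 1D indices without the tuple wrapper.
print(np.flatnonzero(nan_mask))  # [2 4]

# Inverting the mask selects the valid (non-NaN) values.
print(arr[~nan_mask])            # [1.5 2.  4.5]
```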

Step 5: Detecting NaNs in Multidimensional Arrays

Because np.isnan() works element by element, it handles arrays of any shape, such as 2D or 3D arrays. Here's an example with a 2D matrix:

matrix = np.array([[1.0, 2.0], [np.nan, 3.5]])
print(np.isnan(matrix))
# Output:
# [[False False]
#  [ True False]]
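With multidimensional data you often want counts per row or column, or the coordinates of each NaN. A sketch using the axis argument of .sum() and .any(), plus np.argwhere(), which returns one (row, column) pair per NaN:

```python
import numpy as np

matrix = np.array([[1.0, 2.0], [np.nan, 3.5]])
mask = np.isnan(matrix)

# NaN count per column (axis=0) and per row (axis=1).
print(mask.sum(axis=0))   # [1 0]
print(mask.sum(axis=1))   # [0 1]

# Which rows contain at least one NaN?
print(mask.any(axis=1))   # [False  True]

# (row, column) coordinates of every NaN, one pair per output row.
print(np.argwhere(mask))  # [[1 0]]
```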

Step 6: Common Mistakes to Avoid

  • Never use == np.nan to check for NaNs. Under IEEE 754 rules, NaN compares unequal to every value, including itself, so arr == np.nan is False for every element.
  • Prefer np.isnan() over Python loops: it runs as a single vectorized operation, so it is faster and easier to read.
  • If you're chaining operations, check for NaNs early. Because NaN propagates through most arithmetic, one bad value can contaminate every downstream result.
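Here are the equality pitfall and the "check early" advice in a short sketch. require_no_nans is a hypothetical helper written for this example, not a NumPy function; raising ValueError is one possible design, and you might prefer logging or a custom exception in your own pipeline.

```python
import numpy as np

arr = np.array([1.5, np.nan, 3.0])

# Wrong: NaN compares unequal to everything, including itself.
print(arr == np.nan)     # [False False False]
print(np.nan == np.nan)  # False

# Right: np.isnan flags the NaN element.
print(np.isnan(arr))     # [False  True False]


def require_no_nans(values: np.ndarray, name: str = "array") -> np.ndarray:
    """Hypothetical guard: raise early if `values` contains any NaN."""
    mask = np.isnan(values)
    if mask.any():
        positions = np.argwhere(mask).tolist()
        raise ValueError(f"{name} contains {mask.sum()} NaN(s) at {positions}")
    return values  # unchanged, so the call can sit inline in a pipeline


try:
    require_no_nans(arr, name="arr")
except ValueError as err:
    print(err)  # arr contains 1 NaN(s) at [[1]]
```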

Quick Recap

Let’s bring it all together:

  • Use np.isnan() to create a mask of NaNs
  • Use np.sum() to count them
  • Use np.where() to locate them

Next Steps

Now that you can confidently detect NaNs, the next logical step is to handle or replace them. Continue to the next tutorial, where we'll explore np.nan_to_num(), np.isnan() with masking, and clean-up strategies that prepare your data for analysis.