Machine Learning - Feature Scaling and Normalization

What is Feature Scaling in Machine Learning?

Feature scaling is a technique used to bring all features in your dataset onto a similar scale, especially when features have different units (e.g., age in years, salary in dollars).

Real-life Analogy

Imagine you're comparing two car features to predict fuel efficiency:

Engine Size (in CC): ranges from 1000 to 5000
Number of Doors: ranges from 2 to 5

Without scaling, models like KNN or Gradient Descent may consider Engine Size more important just because it has larger values — even if it's not actually more impactful.

Why do we need Feature Scaling?

Many machine learning algorithms calculate distances (e.g., K-Nearest Neighbors, SVM, KMeans) or assume features are on a similar scale (e.g., Logistic Regression, Gradient Descent). Without scaling:

Model training may be unstable
Slow convergence (in gradient-based algorithms)
Bias toward features with higher magnitude

Question:

Why not just leave the features as they are? After all, larger numbers don't always mean more importance, right?

Answer:

True, but algorithms cannot distinguish between value magnitude due to importance vs. due to units. Scaling neutralizes this issue by normalizing magnitudes, not relationships.

⚙️ Two Common Techniques: Normalization vs Standardization

1. Normalization (Min-Max Scaling)

Transforms values to a fixed range, usually [0, 1].
Formula: X_scaled = (X - X_min) / (X_max - X_min)

Use case: When you know data distribution is not Gaussian and you want a bounded scale (e.g., Neural Networks often prefer 0–1 range).

Example:

Height (cm): [160, 170, 180]
Min = 160, Max = 180

=> Normalized values:
(160-160)/(180-160) = 0
(170-160)/(180-160) = 0.5
(180-160)/(180-160) = 1

2. Standardization (Z-score Scaling)

Transforms data to have zero mean and unit variance.
Formula: Z = (X - mean) / std_dev

Use case: Works well when the data follows a Gaussian distribution (bell curve). Recommended for algorithms like Logistic Regression, SVM, and PCA.

Example:

Scores: [50, 60, 70]
Mean = 60, Std Dev = 10

=> Z-scores:
(50-60)/10 = -1
(60-60)/10 =  0
(70-60)/10 = +1

Python Example: Scaling Real Data

Let's apply both MinMaxScaler and StandardScaler on sample data using scikit-learn.

from sklearn.preprocessing import MinMaxScaler, StandardScaler
import pandas as pd

# Sample dataset
data = {
    'Age': [25, 32, 47, 51, 62],
    'Salary': [40000, 50000, 62000, 70000, 80000]
}

df = pd.DataFrame(data)

# 1. Normalization (MinMax Scaling)
minmax_scaler = MinMaxScaler()
df_minmax = pd.DataFrame(minmax_scaler.fit_transform(df), columns=df.columns)

# 2. Standardization (Z-score Scaling)
standard_scaler = StandardScaler()
df_standard = pd.DataFrame(standard_scaler.fit_transform(df), columns=df.columns)

print("Original Data:")
print(df)
print("\nMin-Max Scaled:")
print(df_minmax)
print("\nStandard Scaled:")
print(df_standard)

Original Data:
    Age  Salary
0   25   40000
1   32   50000
2   47   62000
3   51   70000
4   62   80000

Min-Max Scaled:
        Age    Salary
0  0.000000  0.000000
1  0.194444  0.222222
2  0.611111  0.488889
3  0.722222  0.666667
4  1.000000  1.000000

Standard Scaled:
        Age    Salary
0 -1.411972 -1.414214
1 -0.806046 -0.707107
2  0.237581  0.000000
3  0.649384  0.565686
4  1.331053  1.555734

Code Explanation

MinMaxScaler transforms all values between 0 and 1. You can clearly see Age and Salary being scaled proportionally.
StandardScaler centers data around 0 with a standard deviation of 1. This is ideal when features follow a normal distribution.
We used fit_transform() to compute statistics and apply scaling at once.

Which scaling method should I use?

Use MinMaxScaler when data is not normally distributed or when using neural networks.
Use StandardScaler when features are roughly Gaussian (e.g., Logistic Regression, SVM).

Summary

Feature scaling is essential for ML models to perform well and converge faster.
Use Min-Max or Z-score depending on data distribution and algorithm used.
Always scale training and test data with the same scaler fitted on training data.

Quick Recap Quiz

Why does feature scaling help KNN or SVM?
What is the difference between normalization and standardization?
What happens if you apply different scalers on training and test data?

Answers:

These models use distance metrics; scaling ensures fair contribution from all features.
Normalization scales data to [0,1]; Standardization shifts it to zero mean, unit variance.
Your model will see inconsistent patterns and likely perform poorly (data leakage).

⬅ Previous TopicEncoding Categorical Data in Machine Learning (Beginner-Friendly Guide)

Next Topic ⮕Train-Test Split in Machine Learning (With Examples and Python Code)

Comments

Loading comments...

Machine Learning - Feature Scaling and Normalization

What is Feature Scaling in Machine Learning?

Real-life Analogy

Why do we need Feature Scaling?

Question:

Answer:

⚙️ Two Common Techniques: Normalization vs Standardization

1. Normalization (Min-Max Scaling)

Example:

2. Standardization (Z-score Scaling)

Example:

Python Example: Scaling Real Data

Code Explanation

Which scaling method should I use?

Summary

Quick Recap Quiz

Answers:

Comments

Module 3: Data Preprocessing❯

Welcome to ProgramGuru

Player Settings