
Machine Learning for Beginners

Machine Learning - Feature Scaling and Normalization



What is Feature Scaling in Machine Learning?

Feature scaling is a technique used to bring all features in your dataset onto a similar scale, especially when features have different units (e.g., age in years, salary in dollars).

Real-life Analogy

Imagine you're comparing two car features to predict fuel efficiency:

  • Engine Size (in CC): ranges from 1000 to 5000
  • Number of Doors: ranges from 2 to 5

Without scaling, distance-based models like KNN, or anything trained with gradient descent, may treat Engine Size as more important simply because its values are larger, even when it is not actually more impactful.
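
To make this concrete, here is a quick sketch (with made-up engine-size and door values) of how raw magnitudes dominate the Euclidean distances that KNN relies on:

import numpy as np

# Two hypothetical cars: (engine size in CC, number of doors)
car_a = np.array([1500, 2])
car_b = np.array([1520, 5])

# Distance on raw values: the 20 CC gap dwarfs the 3-door gap
raw_dist = np.linalg.norm(car_a - car_b)
print(raw_dist)  # ~20.22, driven almost entirely by engine size

# Min-max scale each feature using the ranges quoted above
mins = np.array([1000, 2])
ranges = np.array([5000 - 1000, 5 - 2])
a_scaled = (car_a - mins) / ranges
b_scaled = (car_b - mins) / ranges

scaled_dist = np.linalg.norm(a_scaled - b_scaled)
print(scaled_dist)  # ~1.00, now the door difference actually matters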

Why do we need Feature Scaling?

Many machine learning algorithms calculate distances (e.g., K-Nearest Neighbors, SVM, KMeans) or assume features are on a similar scale (e.g., Logistic Regression, Gradient Descent). Without scaling:

  • Model training may be unstable
  • Gradient-based algorithms converge slowly
  • The model becomes biased toward features with larger magnitudes

Question:

Why not just leave the features as they are? After all, larger numbers don't always mean more importance, right?

Answer:

True, but an algorithm cannot tell whether a value is large because the feature is genuinely important or simply because of its units. Scaling neutralizes this issue by normalizing magnitudes while preserving the relationships between values.


Two Common Techniques: Normalization vs Standardization

1. Normalization (Min-Max Scaling)

Transforms values to a fixed range, usually [0, 1].
Formula: X_scaled = (X - X_min) / (X_max - X_min)

Use case: When the data is not Gaussian and you want a bounded scale (e.g., Neural Networks often prefer a 0–1 input range).

Example:

Height (cm): [160, 170, 180]
Min = 160, Max = 180

=> Normalized values:
(160-160)/(180-160) = 0
(170-160)/(180-160) = 0.5
(180-160)/(180-160) = 1
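
The same arithmetic in a few lines of NumPy, mirroring the hand calculation above:

import numpy as np

heights = np.array([160, 170, 180], dtype=float)

# Min-max formula: (X - X_min) / (X_max - X_min)
scaled = (heights - heights.min()) / (heights.max() - heights.min())
print(scaled)  # [0.  0.5 1. ]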

2. Standardization (Z-score Scaling)

Transforms data to have zero mean and unit variance.
Formula: Z = (X - mean) / std_dev

Use case: Works well when the data follows a Gaussian distribution (bell curve). Recommended for algorithms like Logistic Regression, SVM, and PCA.

Example:

Scores: [50, 60, 70]
Mean = 60, Std Dev = 10 (sample standard deviation)

=> Z-scores:
(50-60)/10 = -1
(60-60)/10 =  0
(70-60)/10 = +1
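
One subtlety: the Std Dev = 10 above is the sample standard deviation (dividing by n - 1). scikit-learn's StandardScaler divides by n instead, which gives roughly 8.16 for these scores. A small NumPy sketch shows both:

import numpy as np

scores = np.array([50, 60, 70], dtype=float)

# Sample std (ddof=1) matches the hand calculation: std = 10
z_sample = (scores - scores.mean()) / scores.std(ddof=1)
print(z_sample)  # [-1.  0.  1.]

# Population std (ddof=0) is what StandardScaler uses: std ~= 8.165
z_population = (scores - scores.mean()) / scores.std(ddof=0)
print(z_population)  # approximately [-1.2247  0.  1.2247]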

Python Example: Scaling Real Data

Let's apply both MinMaxScaler and StandardScaler on sample data using scikit-learn.

from sklearn.preprocessing import MinMaxScaler, StandardScaler
import pandas as pd

# Sample dataset
data = {
    'Age': [25, 32, 47, 51, 62],
    'Salary': [40000, 50000, 62000, 70000, 80000]
}

df = pd.DataFrame(data)

# 1. Normalization (MinMax Scaling)
minmax_scaler = MinMaxScaler()
df_minmax = pd.DataFrame(minmax_scaler.fit_transform(df), columns=df.columns)

# 2. Standardization (Z-score Scaling)
standard_scaler = StandardScaler()
df_standard = pd.DataFrame(standard_scaler.fit_transform(df), columns=df.columns)

print("Original Data:")
print(df)
print("\nMin-Max Scaled:")
print(df_minmax)
print("\nStandard Scaled:")
print(df_standard)
Original Data:
   Age  Salary
0   25   40000
1   32   50000
2   47   62000
3   51   70000
4   62   80000

Min-Max Scaled:
        Age  Salary
0  0.000000    0.00
1  0.189189    0.25
2  0.594595    0.55
3  0.702703    0.75
4  1.000000    1.00

Standard Scaled:
        Age    Salary
0 -1.382872 -1.440195
1 -0.856780 -0.734217
2  0.270562  0.112956
3  0.571186  0.677739
4  1.397904  1.383717

Code Explanation

  • MinMaxScaler maps each column to the 0–1 range using that column's own min and max, so Age and Salary are each scaled proportionally.
  • StandardScaler centers each column around 0 with a standard deviation of 1. This is ideal when features roughly follow a normal distribution.
  • We used fit_transform() to compute the statistics and apply the scaling in one step (the sketch below shows the equivalent two-step form).
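
For reference, fit_transform() is simply fit() followed by transform(), and a fitted scaler can undo its scaling with inverse_transform(). A minimal sketch, reusing df and the MinMaxScaler import from the example above:

# Equivalent two-step form
scaler = MinMaxScaler()
scaler.fit(df)                    # learn per-column min and max
df_scaled = scaler.transform(df)  # apply them

# inverse_transform maps scaled values back to the original units
df_restored = scaler.inverse_transform(df_scaled)
print(df_restored)  # recovers the original Age and Salary values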

Which scaling method should I use?

  • Use MinMaxScaler when data is not normally distributed or when using neural networks.
  • Use StandardScaler when features are roughly Gaussian (e.g., Logistic Regression, SVM).

Summary

  • Feature scaling is essential for ML models to perform well and converge faster.
  • Use Min-Max or Z-score depending on data distribution and algorithm used.
  • Always scale training and test data with the same scaler, fitted only on the training data (sketched below).
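
A minimal sketch of that last point, using a hypothetical train/test split of the Age/Salary data:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import numpy as np

# Hypothetical feature matrix (Age, Salary) and labels
X = np.array([[25, 40000], [32, 50000], [47, 62000], [51, 70000], [62, 80000]])
y = np.array([0, 0, 1, 1, 1])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit ONLY on the training data
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

# Re-fitting the scaler on X_test would leak test-set statistics
# into preprocessing and make evaluation unreliable.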

Quick Recap Quiz

  1. Why does feature scaling help KNN or SVM?
  2. What is the difference between normalization and standardization?
  3. What happens if you apply different scalers on training and test data?

Answers:

  1. These models use distance metrics; scaling ensures fair contribution from all features.
  2. Normalization scales data to [0,1]; Standardization shifts it to zero mean, unit variance.
  3. The test features end up on a different scale from what the model saw during training, so predictions become inconsistent and performance suffers. (Fitting a scaler on test data also leaks test-set statistics into preprocessing.)

