What is Feature Scaling in Machine Learning?
Feature scaling is a technique used to bring all features in your dataset onto a similar scale, especially when features have different units (e.g., age in years, salary in dollars).
Real-life Analogy
Imagine you're comparing two car features to predict fuel efficiency:
- Engine Size (in CC): ranges from 1000 to 5000
- Number of Doors: ranges from 2 to 5
Without scaling, engine size dominates any distance or weighted-sum calculation simply because its values are a thousand times larger, not because it is more important for predicting fuel efficiency.
Why do we need Feature Scaling?
Many machine learning algorithms calculate distances (e.g., K-Nearest Neighbors, SVM, KMeans) or assume features are on a similar scale (e.g., Logistic Regression, Gradient Descent). Without scaling:
- Model training may become unstable
- Gradient-based algorithms may converge slowly
- The model may be biased toward features with larger magnitudes
🧠 Question:
Why not just leave the features as they are? After all, larger numbers don't always mean more importance, right?
Answer:
True, but an algorithm cannot tell whether a value is large because the feature is important or merely because of its units. Scaling removes this ambiguity by normalizing magnitudes while preserving the relationships between values.
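To make this concrete, here is a minimal sketch using the engine-size and door ranges from the analogy above (the two cars themselves are made up for illustration), showing how raw magnitudes distort a Euclidean distance:

```python
import numpy as np

# Two hypothetical cars: (engine size in CC, number of doors)
car_a = np.array([2000, 2])
car_b = np.array([3000, 5])

# Raw Euclidean distance: engine size dominates almost entirely
print(np.linalg.norm(car_a - car_b))  # ~1000.0045; the 3-door gap is invisible

# Min-max scale each feature using the ranges above (engine: 1000-5000, doors: 2-5)
scaled_a = np.array([(2000 - 1000) / 4000, (2 - 2) / 3])
scaled_b = np.array([(3000 - 1000) / 4000, (5 - 2) / 3])

# Now both features contribute meaningfully to the distance
print(np.linalg.norm(scaled_a - scaled_b))  # ~1.031
```

After scaling, the door difference (the full range) actually outweighs the engine difference (a quarter of its range), which matches intuition.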
⚙️ Two Common Techniques: Normalization vs Standardization
1. Normalization (Min-Max Scaling)
Transforms values to a fixed range, usually [0, 1].
Formula:
X_scaled = (X - X_min) / (X_max - X_min)
Use case: When the data distribution is not Gaussian and you want a bounded scale (e.g., neural networks often work well with inputs in the 0–1 range).
Example:
Height (cm): [160, 170, 180], Min = 160, Max = 180
=> Normalized values:
(160 - 160) / (180 - 160) = 0
(170 - 160) / (180 - 160) = 0.5
(180 - 160) / (180 - 160) = 1
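If you want to verify this example yourself, here is a minimal sketch in plain Python:

```python
# Min-max scaling the height example by hand
heights = [160, 170, 180]
x_min, x_max = min(heights), max(heights)

scaled = [(x - x_min) / (x_max - x_min) for x in heights]
print(scaled)  # [0.0, 0.5, 1.0]
```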
2. Standardization (Z-score Scaling)
Transforms data to have zero mean and unit variance.
Formula:
Z = (X - mean) / std_dev
Use case: Works well when the data follows a Gaussian distribution (bell curve). Recommended for algorithms like Logistic Regression, SVM, and PCA.
Example:
Scores: [50, 60, 70], Mean = 60, Std Dev = 10
=> Z-scores:
(50 - 60) / 10 = -1
(60 - 60) / 10 = 0
(70 - 60) / 10 = +1
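The same example as a sketch in NumPy. One caveat worth noting: the worked example uses the sample standard deviation (ddof=1), which is exactly 10 here, while scikit-learn's StandardScaler divides by the population standard deviation (ddof=0, about 8.165 for these scores):

```python
import numpy as np

scores = np.array([50, 60, 70])

# Sample std (ddof=1) is 10, matching the worked example above
z = (scores - scores.mean()) / scores.std(ddof=1)
print(z)  # [-1.  0.  1.]

# StandardScaler would instead use the population std (ddof=0)
print(scores.std(ddof=0))  # ~8.1650
```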
Python Example: Scaling Real Data
Let's apply both MinMaxScaler and StandardScaler on sample data using scikit-learn.
```python
from sklearn.preprocessing import MinMaxScaler, StandardScaler
import pandas as pd

# Sample dataset
data = {
    'Age': [25, 32, 47, 51, 62],
    'Salary': [40000, 50000, 62000, 70000, 80000]
}
df = pd.DataFrame(data)

# 1. Normalization (MinMax Scaling)
minmax_scaler = MinMaxScaler()
df_minmax = pd.DataFrame(minmax_scaler.fit_transform(df), columns=df.columns)

# 2. Standardization (Z-score Scaling)
standard_scaler = StandardScaler()
df_standard = pd.DataFrame(standard_scaler.fit_transform(df), columns=df.columns)

print("Original Data:")
print(df)
print("\nMin-Max Scaled:")
print(df_minmax)
print("\nStandard Scaled:")
print(df_standard)
```
Output:
```
Original Data:
   Age  Salary
0   25   40000
1   32   50000
2   47   62000
3   51   70000
4   62   80000

Min-Max Scaled:
        Age  Salary
0  0.000000    0.00
1  0.189189    0.25
2  0.594595    0.55
3  0.702703    0.75
4  1.000000    1.00

Standard Scaled:
        Age    Salary
0 -1.382872 -1.440195
1 -0.856780 -0.734217
2  0.270562  0.112956
3  0.571186  0.677739
4  1.397904  1.383717
```
Code Explanation
- MinMaxScaler transforms all values between 0 and 1. You can clearly see Age and Salary being scaled proportionally.
- StandardScaler centers data around 0 with a standard deviation of 1. This is ideal when features follow a normal distribution.
- We used fit_transform() to compute the statistics and apply the scaling in one step.
Which scaling method should I use?
- Use MinMaxScaler when data is not normally distributed or when using neural networks.
- Use StandardScaler when features are roughly Gaussian (e.g., Logistic Regression, SVM).
Summary
- Feature scaling is essential for ML models to perform well and converge faster.
- Use Min-Max or Z-score scaling depending on the data distribution and the algorithm used.
- Always fit the scaler on the training data only, then use it to transform both training and test data (see the sketch below).
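Here is a minimal sketch of that last point, with randomly generated data standing in for a real dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix and labels, just for illustration
rng = np.random.default_rng(42)
X = rng.normal(loc=50, scale=10, size=(100, 2))
y = rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit ONLY on the training split
X_test_scaled = scaler.transform(X_test)        # reuse training statistics: no test-set leakage
```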
Quick Recap Quiz
- Why does feature scaling help KNN or SVM?
- What is the difference between normalization and standardization?
- What happens if you apply different scalers on training and test data?
📝 Answers:
- These models use distance metrics; scaling ensures fair contribution from all features.
- Normalization scales data to [0,1]; Standardization shifts it to zero mean, unit variance.
- The training and test features end up on inconsistent scales, so the model sees distorted patterns and will likely perform poorly; fitting a scaler on the test data also leaks information (data leakage).