What is the Bias-Variance Tradeoff?
In machine learning, we aim to build models that generalize well — that is, perform well on new, unseen data. However, every model is a balancing act between two competing sources of error:
- Bias – Error due to overly simplistic assumptions in the model.
- Variance – Error due to excessive model complexity, which leads the model to fit noise in the training data.
The bias-variance tradeoff is the balance we must strike between underfitting (high bias) and overfitting (high variance).
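For squared-error loss this is more than a metaphor. The expected prediction error at a point decomposes as
Expected test error = Bias² + Variance + Irreducible noise
so as a model grows more flexible, the bias term shrinks while the variance term grows, and the noise term sets a floor that neither can remove.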
Example 1: Underfitting (High Bias)
Imagine trying to fit a straight line to data that actually forms a curve. The model is too simple to capture the real pattern.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
# Generating curved data
np.random.seed(1)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])
# Fit a linear regression model
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)
# Plotting
plt.scatter(X, y, label="True Data", s=10)
plt.plot(X, y_pred, color='red', label="Linear Fit")
plt.title("Underfitting: High Bias")
plt.legend()
plt.show()
This plot will show a straight red line trying to approximate sinusoidal data — clearly a bad fit.
Explanation:
A straight line can't bend to follow the curve, so the predictions are systematically off. The model is underfitting: it has high bias and never learns the underlying pattern in the data.
Why does this happen?
➤ The model is too simple for the data. It assumes a straight-line relationship, which doesn’t exist here.
What will happen if we try this model on test data?
➤ It will perform poorly both on training and test data, because it never learned the true pattern in the first place.
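A minimal sketch of that check, reusing the X and y generated above: hold out a share of the points, fit the same kind of linear model on the rest, and compare R² on both splits. Both scores should come out low.
from sklearn.model_selection import train_test_split
# Hold out 30% of the points to stand in for unseen data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
underfit = LinearRegression()
underfit.fit(X_train, y_train)
# Both R^2 scores are low: a straight line cannot learn the sine shape,
# so it struggles on the data it trained on and on the held-out data alike
print("Train R^2:", underfit.score(X_train, y_train))
print("Test R^2:", underfit.score(X_test, y_test))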
Example 2: Overfitting (High Variance)
Now let’s fit a model that’s too complex — like a polynomial of degree 15 — to the same data.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
# High-degree polynomial regression (degree 15)
high_variance_model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
high_variance_model.fit(X, y)
y_pred_poly = high_variance_model.predict(X)
# Plotting
plt.scatter(X, y, label="True Data", s=10)
plt.plot(X, y_pred_poly, color='green', label="Polynomial Degree 15")
plt.title("Overfitting: High Variance")
plt.legend()
plt.show()
This plot shows a wiggly green line that tries too hard to match every point — even the noise.
Explanation:
This model performs extremely well on training data but fails on test data. It memorizes the noise, which leads to poor generalization — this is high variance.
Why is overfitting dangerous?
➤ Because your model learns noise as if it were signal, which reduces accuracy on real-world data.
How can we detect overfitting?
➤ A large gap between training and test accuracy: the training score is high, but the test score is low.
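To run that check here, refit the degree-15 pipeline on the training split from the sketch above and compare its R² on the training points and the held-out points. The exact gap depends on the split and the noise level, but a training score far above the test score is the warning sign.
# Refit the high-degree pipeline on the training portion only
overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
overfit.fit(X_train, y_train)
# A training score much higher than the test score signals overfitting
print("Train R^2:", overfit.score(X_train, y_train))
print("Test R^2:", overfit.score(X_test, y_test))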
Example 3: Just Right (Balanced Bias and Variance)
Let’s now try a moderately complex model — a polynomial of degree 3 — and see how it performs.
# Polynomial regression with degree 3
just_right_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
just_right_model.fit(X, y)
y_pred_just_right = just_right_model.predict(X)
# Plotting
plt.scatter(X, y, label="True Data", s=10)
plt.plot(X, y_pred_just_right, color='purple', label="Polynomial Degree 3")
plt.title("Balanced Fit")
plt.legend()
plt.show()
This plot shows a smooth purple curve that captures the main pattern of the data without overreacting to noise.
Explanation:
A degree-3 polynomial follows the broad sinusoidal trend without chasing individual noisy points. This is a model with relatively low bias and low variance: the sweet spot.
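The same train/test comparison, applied to the degree-3 pipeline, rounds out the picture: this time the two scores should sit close to each other, because the model is not memorizing noise.
# Fit the moderate pipeline on the same training split as before
balanced = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
balanced.fit(X_train, y_train)
# Similar training and test scores indicate the model generalizes
print("Train R^2:", balanced.score(X_train, y_train))
print("Test R^2:", balanced.score(X_test, y_test))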
Bias-Variance Tradeoff Curve
As model complexity increases, the two sources of error move in opposite directions:
- Bias decreases with increasing model complexity
- Variance increases with increasing model complexity
Total error (bias plus variance plus irreducible noise) is therefore typically U-shaped: too simple means high bias, too complex means high variance, and the best model sits somewhere in between.
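You can trace this curve empirically on the toy data above by sweeping the polynomial degree and scoring each pipeline with cross-validation. This is only a sketch, and exactly which degree comes out best depends on the noise and the folds, but the cross-validated error typically falls and then rises again as complexity grows.
from sklearn.model_selection import KFold, cross_val_score
# Shuffled folds, so every fold sees the whole range of x
cv = KFold(n_splits=5, shuffle=True, random_state=1)
for degree in range(1, 16):
    model_d = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    # Scoring returns negated MSE; flip the sign to read it as an error
    mse = -cross_val_score(model_d, X, y, cv=cv, scoring="neg_mean_squared_error").mean()
    print(f"degree {degree:2d}: cross-validated MSE = {mse:.4f}")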
Summary Table
| Type | Bias | Variance | Example |
|---|---|---|---|
| Underfitting | High | Low | Linear regression on curved data |
| Overfitting | Low | High | Polynomial (degree 15) regression |
| Balanced | Low | Low | Polynomial (degree 3) regression |
Final Thought
Your goal is to minimize total error, which includes contributions from both bias and variance. This often requires:
- Trying different models
- Using validation scores
- Regularization (like Ridge/Lasso)
- Cross-validation (see the sketch below)
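As a rough sketch of the last two points combined, on the same X and y as above, a high-degree polynomial pipeline can be tamed with Ridge regularization, with the penalty strength alpha chosen by cross-validated grid search. The alpha grid here is just an illustrative choice.
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.preprocessing import StandardScaler
# High-degree features, scaled so the L2 penalty treats them comparably, then Ridge
ridge_pipeline = make_pipeline(PolynomialFeatures(degree=15), StandardScaler(), Ridge())
param_grid = {"ridge__alpha": [0.001, 0.01, 0.1, 1.0, 10.0]}  # illustrative grid of penalty strengths
search = GridSearchCV(ridge_pipeline, param_grid,
                      cv=KFold(n_splits=5, shuffle=True, random_state=1),
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print("Best alpha:", search.best_params_["ridge__alpha"])
print("Best cross-validated MSE:", -search.best_score_)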
In short: Not too simple, not too complex — just right.