
Bias-Variance Tradeoff in Machine Learning (with Python Examples)



What is the Bias-Variance Tradeoff?

In machine learning, we aim to build models that generalize well, that is, perform well on new, unseen data. However, every model is a balancing act between two competing sources of error:

➤ Bias: error from overly simple assumptions. A high-bias model systematically misses the real pattern in the data.

➤ Variance: error from excessive sensitivity to the training sample. A high-variance model changes drastically with each new dataset and mistakes noise for signal.

The Bias-Variance Tradeoff is the balance we must strike between underfitting (high bias) and overfitting (high variance).
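To make these two error sources concrete, here is a minimal sketch (an addition, not part of the original examples) that refits a simple linear model on many freshly noised samples of a sine curve, the same kind of data used below, and measures its average error (bias) and prediction spread (variance) at a single probe point:


import numpy as np
from sklearn.linear_model import LinearRegression

# Estimate bias and variance empirically: refit the same simple model on
# many datasets that share a sine-shaped truth but carry fresh noise.
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 100).reshape(-1, 1)
true_y = np.sin(X).ravel()
x_probe = np.array([[5.0]])  # single point where we inspect predictions

preds = []
for _ in range(200):
    y_noisy = true_y + rng.normal(0, 0.1, size=100)  # new noise each round
    preds.append(LinearRegression().fit(X, y_noisy).predict(x_probe)[0])

preds = np.array(preds)
print("bias     :", preds.mean() - np.sin(5.0))  # systematic error of the average model
print("variance :", preds.var())                 # sensitivity to the training sample

A straight line is a poor match for a sine curve, so the bias term dominates here; a very flexible model would show the opposite pattern.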


Example 1: Underfitting (High Bias)

Imagine trying to fit a straight line to data that actually forms a curve. The model is too simple to capture the real pattern.


import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Generating curved data
np.random.seed(1)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])

# Fit a linear regression model
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

# Plotting
plt.scatter(X, y, label="True Data", s=10)
plt.plot(X, y_pred, color='red', label="Linear Fit")
plt.title("Underfitting: High Bias")
plt.legend()
plt.show()
This plot will show a straight red line trying to approximate sinusoidal data — clearly a bad fit.

Explanation:

Linear regression can't capture the curve, so the predictions are far off. The model is underfitting — it has high bias and can't capture the underlying data pattern.

Why does this happen?

➤ The model is too simple for the data. It assumes a straight-line relationship, which doesn’t exist here.

What will happen if we try this model on test data?

➤ It will perform poorly both on training and test data, because it never learned the true pattern in the first place.
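To check this numerically, here is a short sketch (reusing X, y, and LinearRegression from the code above) that holds out a test split and scores both halves; with a straight line on sinusoidal data, both R² values should come out near zero:


from sklearn.model_selection import train_test_split

# Hold out a test split and score the linear model on both halves.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
lin = LinearRegression().fit(X_train, y_train)
print("train R^2:", round(lin.score(X_train, y_train), 3))
print("test  R^2:", round(lin.score(X_test, y_test), 3))
# Both scores are low: the model misses the pattern even on data it trained on.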


Example 2: Overfitting (High Variance)

Now let’s fit a model that’s too complex — like a polynomial of degree 15 — to the same data.


from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# High-degree polynomial regression (degree 15)
high_variance_model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
high_variance_model.fit(X, y)
y_pred_poly = high_variance_model.predict(X)

# Plotting
plt.scatter(X, y, label="True Data", s=10)
plt.plot(X, y_pred_poly, color='green', label="Polynomial Degree 15")
plt.title("Overfitting: High Variance")
plt.legend()
plt.show()
This plot shows a wiggly green line that tries too hard to match every point — even the noise.

Explanation:

This model performs extremely well on training data but fails on test data. It memorizes the noise, which leads to poor generalization — this is high variance.

Why is overfitting dangerous?

➤ Because your model learns noise as if it were signal, which reduces accuracy on real-world data.

How can we detect overfitting?

➤ A large gap between training and test accuracy: the training score is high, but the test score is low.
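Here is a minimal sketch of that diagnostic (reusing X, y, make_pipeline, and PolynomialFeatures from above): refit the degree-15 model on a training split only, then score both splits:


from sklearn.model_selection import train_test_split

# Fit on the training split only, then compare scores on both splits.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
overfit.fit(X_train, y_train)
print("train R^2:", round(overfit.score(X_train, y_train), 3))
print("test  R^2:", round(overfit.score(X_test, y_test), 3))
# A training score well above the test score is the classic overfitting signal.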


Example 3: Just Right (Balanced Bias and Variance)

Let’s now try a moderately complex model — a polynomial of degree 3 — and see how it performs.


# Polynomial regression with degree 3
just_right_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
just_right_model.fit(X, y)
y_pred_just_right = just_right_model.predict(X)

# Plotting
plt.scatter(X, y, label="True Data", s=10)
plt.plot(X, y_pred_just_right, color='purple', label="Polynomial Degree 3")
plt.title("Balanced Fit")
plt.legend()
plt.show()
This plot shows a smooth purple curve that captures the main pattern of the data without overreacting to noise.

Explanation:

A degree-3 polynomial captures the overall shape of the data without chasing the noise. This is an example of a model with low bias and low variance, the sweet spot.
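As a quick sanity check (same held-out-split idea as in the earlier examples), the balanced model's training and test scores should land close to each other:


from sklearn.model_selection import train_test_split

# Score the degree-3 model on a held-out split.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
balanced = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
balanced.fit(X_train, y_train)
print("train R^2:", round(balanced.score(X_train, y_train), 3))
print("test  R^2:", round(balanced.score(X_test, y_test), 3))
# Train and test scores staying close together is the mark of low variance.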


Bias-Variance Tradeoff Curve

Visually, the tradeoff is a U-shaped curve of total error against model complexity: bias falls as the model becomes more flexible, variance rises, and the expected error decomposes as

Total Error = Bias² + Variance + Irreducible Error

Too simple = high bias; too complex = high variance. The minimum total error lies between the two extremes.
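We can trace this curve empirically. The sketch below (reusing X and y from the examples above) cross-validates one polynomial model per degree; the folds are shuffled because X is sorted:


import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Cross-validate one model per polynomial degree to trace the tradeoff curve.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
degrees = range(1, 16)
scores = [cross_val_score(make_pipeline(PolynomialFeatures(degree=d),
                                        LinearRegression()),
                          X, y, cv=cv).mean()
          for d in degrees]

plt.plot(degrees, scores, marker="o")
plt.xlabel("Polynomial degree (model complexity)")
plt.ylabel("Mean cross-validated R^2")
plt.title("Bias-Variance Tradeoff")
plt.show()
# Scores typically climb as bias falls, then flatten or drop as variance grows.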


Summary Table

Type           Bias   Variance   Example
Underfitting   High   Low        Linear regression on curved data
Overfitting    Low    High       Polynomial (degree 15) regression
Balanced       Low    Low        Polynomial (degree 3) regression

Final Thought

Your goal is to minimize total error, which includes both bias and variance. This often requires:

➤ Choosing a model complexity that matches the data

➤ Measuring generalization with a held-out test set or cross-validation

➤ Applying regularization when a flexible model starts to overfit

In short: not too simple, not too complex, just right.
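One way to automate that search, offered here as a sketch rather than the only option, is a scikit-learn GridSearchCV over the polynomial degree (again reusing X and y from the examples above):


from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Grid-search the degree with the best cross-validated score.
pipe = Pipeline([("poly", PolynomialFeatures()), ("reg", LinearRegression())])
param_grid = {"poly__degree": list(range(1, 16))}
search = GridSearchCV(pipe, param_grid,
                      cv=KFold(n_splits=5, shuffle=True, random_state=0))
search.fit(X, y)
print("best degree:", search.best_params_["poly__degree"])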


