
Bias-Variance Tradeoff in Machine Learning (with Python Examples)



What is Bias-Variance Tradeoff?

In machine learning, we aim to build models that generalize well, that is, perform well on new, unseen data. However, every model is a balancing act between two competing sources of error: bias, the error from overly simplistic assumptions that miss the true pattern, and variance, the error from being overly sensitive to the quirks of the training data.

The bias-variance tradeoff is the balance we must strike between underfitting (high bias) and overfitting (high variance).
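
For squared-error loss, these two sources combine in the standard decomposition of expected test error (a well-known result, stated here for reference; σ² is the irreducible noise in the data):

Expected test error = Bias² + Variance + σ²

Reducing bias (by making the model more flexible) typically increases variance, and vice versa; only the noise term σ² is beyond the model's control.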


Example 1: Underfitting (High Bias)

Imagine trying to fit a straight line to data that actually forms a curve. The model is too simple to capture the real pattern.


import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Generating curved data
np.random.seed(1)
X = np.linspace(0, 10, 100).reshape(-1, 1)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])

# Fit a linear regression model
model = LinearRegression()
model.fit(X, y)
y_pred = model.predict(X)

# Plotting
plt.scatter(X, y, label="True Data", s=10)
plt.plot(X, y_pred, color='red', label="Linear Fit")
plt.title("Underfitting: High Bias")
plt.legend()
plt.show()

This plot will show a straight red line trying to approximate sinusoidal data: clearly a bad fit.

Explanation:

Linear regression can't capture the curve, so the predictions are far off. The model is underfitting — it has high bias and can't capture the underlying data pattern.

Why does this happen?

➤ The model is too simple for the data. It assumes a straight-line relationship, which doesn’t exist here.

What will happen if we try this model on test data?

➤ It will perform poorly both on training and test data, because it never learned the true pattern in the first place.
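
To make "poorly on both" concrete, here is a minimal sketch that holds out part of the data and compares errors on the two splits. It reuses X, y, and the imports from the code above; the split size and variable names are my own choices.

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hold out 30% of the data; random_state fixes the split for reproducibility
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

linear = LinearRegression()
linear.fit(X_train, y_train)

# An underfit model is roughly equally bad on both splits
print("Train MSE:", mean_squared_error(y_train, linear.predict(X_train)))
print("Test MSE:", mean_squared_error(y_test, linear.predict(X_test)))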


Example 2: Overfitting (High Variance)

Now let’s fit a model that’s too complex — like a polynomial of degree 15 — to the same data.


from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# High-degree polynomial regression (degree 15)
high_variance_model = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
high_variance_model.fit(X, y)
y_pred_poly = high_variance_model.predict(X)

# Plotting
plt.scatter(X, y, label="True Data", s=10)
plt.plot(X, y_pred_poly, color='green', label="Polynomial Degree 15")
plt.title("Overfitting: High Variance")
plt.legend()
plt.show()

This plot shows a wiggly green line that tries too hard to match every point, even the noise.

Explanation:

This model performs extremely well on training data but fails on test data. It memorizes the noise, which leads to poor generalization — this is high variance.

Why is overfitting dangerous?

➤ Because your model learns noise as if it were signal, which reduces accuracy on real-world data.

How can we detect overfitting?

➤ A large gap between training and test accuracy: the training score is high, but the test score is low.
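
To see that gap numerically, here is a minimal sketch reusing X, y and the imports from the earlier examples; the split parameters are arbitrary choices.

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
overfit.fit(X_train, y_train)

# A large train/test gap is the classic symptom of overfitting
print("Train MSE:", mean_squared_error(y_train, overfit.predict(X_train)))
print("Test MSE:", mean_squared_error(y_test, overfit.predict(X_test)))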


Example 3: Just Right (Balanced Bias and Variance)

Let’s now try a moderately complex model — a polynomial of degree 3 — and see how it performs.


# Polynomial regression with degree 3
just_right_model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
just_right_model.fit(X, y)
y_pred_just_right = just_right_model.predict(X)

# Plotting
plt.scatter(X, y, label="True Data", s=10)
plt.plot(X, y_pred_just_right, color='purple', label="Polynomial Degree 3")
plt.title("Balanced Fit")
plt.legend()
plt.show()

This plot shows a smooth purple curve that captures the main pattern of the data without overreacting to noise.

Explanation:

Polynomial degree 3 captures the true sinusoidal trend without going overboard. This is an example of a model with low bias and low variance — the sweet spot.
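
Rather than trusting the plots alone, one way to confirm that degree 3 is the sweet spot is cross-validation. Here is a sketch comparing the three degrees; the fold count and shuffling are arbitrary choices (shuffling matters here because X was generated in sorted order).

from sklearn.model_selection import KFold, cross_val_score

cv = KFold(n_splits=5, shuffle=True, random_state=1)

for degree in (1, 3, 15):
    pipe = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    scores = cross_val_score(pipe, X, y, cv=cv, scoring="neg_mean_squared_error")
    print(f"degree {degree:2d}: CV MSE = {-scores.mean():.4f}")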


Bias-Variance Tradeoff Curve

Visually, the tradeoff is a U-shaped curve: as model complexity increases, bias falls while variance rises, so total error first decreases and then climbs again.

Too simple = high bias; too complex = high variance; the sweet spot lies in between.
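
You can reproduce this curve from the same data. The sketch below (the degree range and split are my own choices) plots training and test error as the polynomial degree grows: training error keeps falling, while test error traces the U shape.

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

degrees = range(1, 16)
train_errors, test_errors = [], []
for d in degrees:
    pipe = make_pipeline(PolynomialFeatures(degree=d), LinearRegression())
    pipe.fit(X_train, y_train)
    train_errors.append(mean_squared_error(y_train, pipe.predict(X_train)))
    test_errors.append(mean_squared_error(y_test, pipe.predict(X_test)))

plt.plot(degrees, train_errors, marker="o", label="Train error")
plt.plot(degrees, test_errors, marker="o", label="Test error")
plt.yscale("log")  # test error can explode at high degrees
plt.xlabel("Polynomial degree (model complexity)")
plt.ylabel("MSE (log scale)")
plt.title("Bias-Variance Tradeoff")
plt.legend()
plt.show()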


Summary Table

Type           Bias   Variance   Example
Underfitting   High   Low        Linear regression on curved data
Overfitting    Low    High       Polynomial regression (degree 15)
Balanced       Low    Low        Polynomial regression (degree 3)

Final Thought

Your goal is to minimize total error, which includes both bias and variance. This often requires choosing a model complexity that matches the data, validating it on held-out data (a train/test split or cross-validation), and, when the model is too flexible, applying regularization or gathering more training data; a brief regularization sketch closes this article.

In short: Not too simple, not too complex — just right.
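
As one illustration of the regularization point, here is a minimal sketch: the same degree-15 features, but fitted with Ridge (an L2 penalty on the weights) instead of plain least squares. The StandardScaler step and alpha=1.0 are illustrative choices, not tuned values.

from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Same overly flexible features, but scaled and fitted with an L2 penalty
regularized = make_pipeline(PolynomialFeatures(degree=15), StandardScaler(), Ridge(alpha=1.0))
regularized.fit(X, y)

plt.scatter(X, y, label="True Data", s=10)
plt.plot(X, regularized.predict(X), color="orange", label="Degree 15 + Ridge")
plt.title("Regularization Tames Variance")
plt.legend()
plt.show()

With the penalty in place, the same 16 polynomial features should produce a noticeably smoother curve than the wiggly one from Example 2: the model's capacity is unchanged, but its variance is held in check.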


