
Understanding Overfitting vs Underfitting in Machine Learning




For beginners, overfitting and underfitting are among the most important concepts to understand when building machine learning models. Both are common problems that hurt a model's performance on unseen data.


What is Underfitting?

Underfitting occurs when your model is too simple to learn the underlying pattern of the data. It has high bias and performs poorly on both training and test data.

Example: Suppose we are trying to predict house prices, which depend on many features such as size, location, and number of rooms, but we use only one feature, say the number of rooms. The model will likely miss important signals and give inaccurate predictions.

Symptoms: high error (low accuracy) on the training data and on the test data.

Question: Why would a model perform badly on training data if it's too simple?

Answer: Because it's not learning enough from the available features. It's like trying to fit a straight line through a complex curve.
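
The straight-line analogy can be checked numerically. The sketch below is a minimal illustration on synthetic data (a noisy parabola, chosen for this example and not taken from the lesson): a plain linear model scores poorly on both splits, while adding a squared feature gives the same model enough capacity. All variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Synthetic curve (an assumption for illustration): y = x^2 plus small noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Too simple: a straight line cannot bend to follow the parabola
line = LinearRegression().fit(X_train, y_train)
line_train_r2 = r2_score(y_train, line.predict(X_train))
line_test_r2 = r2_score(y_test, line.predict(X_test))

# Enough capacity: add the squared feature, then fit the same linear model
poly = PolynomialFeatures(degree=2)
curve = LinearRegression().fit(poly.fit_transform(X_train), y_train)
curve_train_r2 = r2_score(y_train, curve.predict(poly.transform(X_train)))
curve_test_r2 = r2_score(y_test, curve.predict(poly.transform(X_test)))

print(f"Straight line: train R2={line_train_r2:.2f}, test R2={line_test_r2:.2f}")
print(f"Quadratic:     train R2={curve_train_r2:.2f}, test R2={curve_test_r2:.2f}")
```

Both R2 scores for the straight line should come out near zero, which is the signature of underfitting: the model is poor even on the data it was trained on.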


What is Overfitting?

Overfitting happens when your model learns not only the patterns but also the noise in the training data. It performs extremely well on training data but fails on unseen test data.

Example: Suppose you allow your model to create a complex decision boundary for classifying animals, and it memorizes exact shapes of training images. But when you show a new image, it gets confused easily because it never learned to generalize.

Symptoms: very low error (high accuracy) on the training data, but noticeably higher error on the test data.

Question: Is a higher accuracy on training data always a good thing?

Answer: No. High training accuracy with low test accuracy often means the model is overfitting. It memorized instead of generalizing.
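
To make the gap between training and test accuracy concrete, here is a small sketch that fits an unrestricted decision tree to noisy synthetic data. The dataset, the label noise level, and the choice of a decision tree are assumptions made for this illustration; they are not part of the lesson's own example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with deliberately noisy labels:
# flip_y=0.2 assigns a random class to about 20% of the samples
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# No depth limit: the tree keeps splitting until every training point is classified correctly
deep_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
deep_train_acc = deep_tree.score(X_train, y_train)
deep_test_acc = deep_tree.score(X_test, y_test)

# Depth limit: the tree can only keep the broad patterns
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
shallow_train_acc = shallow_tree.score(X_train, y_train)
shallow_test_acc = shallow_tree.score(X_test, y_test)

print(f"Deep tree:    train={deep_train_acc:.2f}, test={deep_test_acc:.2f}")
print(f"Shallow tree: train={shallow_train_acc:.2f}, test={shallow_test_acc:.2f}")
```

The unrestricted tree reaches perfect training accuracy by memorizing the noisy labels, so its test accuracy is the number to trust. Whether the shallow tree scores higher on the test set depends on the data, which is why both numbers are printed rather than assumed.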


🔬 Visualization Example: Polynomial Regression

Let’s see a simple case of underfitting, good fit, and overfitting using Polynomial Regression in Python.


import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Generate synthetic data
np.random.seed(0)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

# Helper to fit and plot models
def plot_model(degree):
    poly = PolynomialFeatures(degree=degree)
    X_poly_train = poly.fit_transform(X_train)
    X_poly_test = poly.transform(X_test)

    model = LinearRegression()
    model.fit(X_poly_train, y_train)

    y_train_pred = model.predict(X_poly_train)
    y_test_pred = model.predict(X_poly_test)

    mse_train = mean_squared_error(y_train, y_train_pred)
    mse_test = mean_squared_error(y_test, y_test_pred)

    plt.scatter(X_train, y_train, color='blue', label='Train Data')
    plt.scatter(X_test, y_test, color='green', label='Test Data')
    X_plot = np.linspace(0, 5, 100).reshape(-1, 1)
    y_plot = model.predict(poly.transform(X_plot))
    plt.plot(X_plot, y_plot, color='red', label=f'Degree {degree}')
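    # Four decimals: with noise of std 0.1 the MSE values are small, and two decimals can hide the differences
    plt.title(f'Degree={degree} | Train MSE={mse_train:.4f} | Test MSE={mse_test:.4f}')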
    plt.legend()
    plt.show()

# Try different degrees
for degree in [1, 4, 15]:
    plot_model(degree)

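Output (one plot per degree; exact MSE values depend on the random data and the split):

• Degree 1 → Underfitting: a straight line can’t capture the sine wave → high error on both training and test data
• Degree 4 → Good fit: the curve follows the sine wave smoothly → low error on both
• Degree 15 → Overfitting: the curve tends to wiggle as it chases noise → low training error, but typically higher test error than the degree 4 fit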
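
The same comparison can be made without plots by printing the errors for a range of degrees. The sketch below regenerates the synthetic data from the example above so that it runs on its own; the helper name `fit_and_score` and the list of degrees are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same synthetic data as in the plotting example
np.random.seed(0)
X = np.sort(5 * np.random.rand(80, 1), axis=0)
y = np.sin(X).ravel() + np.random.normal(0, 0.1, X.shape[0])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    model.fit(X_train, y_train)
    mse_train = mean_squared_error(y_train, model.predict(X_train))
    mse_test = mean_squared_error(y_test, model.predict(X_test))
    return mse_train, mse_test

# Sweep from a very simple model to a very flexible one
scores = {degree: fit_and_score(degree) for degree in [1, 2, 4, 8, 15]}
for degree, (mse_train, mse_test) in scores.items():
    print(f"degree={degree:2d}  train MSE={mse_train:.4f}  test MSE={mse_test:.4f}")
```

Reading down the output, training error should fall as the degree grows, while test error usually stops improving at some point and can rise again at high degrees.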

Code Description

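Let’s understand what’s happening:

• The data is 80 points sampled from sin(x) for x between 0 and 5, with Gaussian noise (standard deviation 0.1) added.
• train_test_split holds out 30% of the points as test data.
• PolynomialFeatures expands x into its powers up to the chosen degree, and LinearRegression fits a linear model on those powers.
• plot_model computes the mean squared error on the training and test sets, then draws the fitted curve over both sets of points.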

Question: What’s a good way to detect overfitting during training?

Answer: Monitor training and validation loss. If training loss keeps decreasing but validation loss increases, it’s a sign of overfitting.
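
One way to turn this rule into code is a small helper that scans recorded losses for the point where validation loss stops improving while training loss keeps falling. This is a minimal sketch: the function name `first_overfit_epoch`, the `patience` parameter, and the loss values are all made up for illustration.

```python
def first_overfit_epoch(train_loss, val_loss, patience=2):
    """Return the epoch with the best validation loss if overfitting is detected, else None.

    Overfitting is flagged when validation loss has not improved for `patience`
    epochs while training loss is still lower than it was at the best epoch.
    """
    best_epoch = 0
    for epoch in range(1, len(val_loss)):
        if val_loss[epoch] < val_loss[best_epoch]:
            best_epoch = epoch  # validation loss is still improving
        elif (epoch - best_epoch >= patience
              and train_loss[epoch] < train_loss[best_epoch]):
            return best_epoch  # training improves, validation does not
    return None

# Hypothetical per-epoch losses, invented for this example
train_loss = [1.00, 0.70, 0.50, 0.40, 0.30, 0.25]
val_loss = [1.10, 0.80, 0.60, 0.65, 0.70, 0.80]

print(first_overfit_epoch(train_loss, val_loss))  # → 2
```

Returning the best epoch rather than the current one reflects the usual early-stopping practice of rolling back to the checkpoint with the lowest validation loss.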


Summary

You’ll learn how to tackle overfitting and underfitting with techniques like cross-validation, regularization, and pruning in the upcoming lessons.


