
Cross-Validation Techniques in Machine Learning (With Examples)



Cross-Validation Techniques

When we train a machine learning model, we typically split the data into training and test sets. But a single split gives only one estimate of performance, and that estimate depends on which samples happen to land in the test set. So how do we know if the model will generalize well to unseen data? This is where Cross-Validation comes in.

Why Use Cross-Validation?

Cross-validation helps us:

- Get a more reliable estimate of how the model performs on unseen data
- Use limited data efficiently, since every sample is used for both training and validation
- Detect overfitting that a single train/test split can hide

🔹 What is Cross-Validation?

It's a technique for evaluating a model by training it on different subsets of the data and validating it on the remaining parts. By rotating which portion is held out for testing, every sample is eventually used for validation, which gives a more robust picture of the model's performance.
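
The examples below write the cross-validation loop out by hand so you can see the mechanics. For everyday use, scikit-learn's cross_val_score wraps the same procedure in a single call; here is a minimal sketch using the same iris data and model as the examples that follow:

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=200)

# cv=5 runs 5-fold cross-validation and returns one score per fold
scores = cross_val_score(model, X, y, cv=5)
print("Fold scores:", scores)
print("Mean score:", scores.mean())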


1. K-Fold Cross-Validation

In K-Fold Cross-Validation:

- The dataset is split into K equal-sized folds
- The model is trained on K − 1 folds and tested on the remaining fold
- This repeats K times, so each fold serves as the test set exactly once
- The final score is the average of the K test scores

🟢 Example: K-Fold with 5 Splits


from sklearn.model_selection import KFold
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
import numpy as np

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# 5 shuffled folds; random_state makes the split reproducible
kf = KFold(n_splits=5, shuffle=True, random_state=42)
model = LogisticRegression(max_iter=200)

scores = []

# Train on 4 folds, evaluate on the held-out fold, once per fold
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    acc = accuracy_score(y_test, predictions)
    scores.append(acc)

print("Fold Accuracies:", scores)
print("Average Accuracy:", np.mean(scores))

Output:

Fold Accuracies: [0.9667, 1.0, 0.9, 0.9333, 1.0]
Average Accuracy: 0.96

Code Explanation:

- KFold(n_splits=5, shuffle=True, random_state=42) divides the 150 iris samples into 5 folds of 30
- kf.split(X) yields the train and test index arrays for each fold
- In every iteration the model is refit on the training folds and scored on the held-out fold
- The five fold accuracies are averaged for the final estimate

Why shuffle the data before splitting?

✔️ Because data might have patterns (e.g., sorted by class, as the iris dataset is). Shuffling ensures each fold is a random sample and avoids biased splits.
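
You can see this ordering directly in the iris dataset used in the example above; the 150 labels are stored in class order, 50 of each species in a row:

import numpy as np
from sklearn.datasets import load_iris

y = load_iris().target

# 50 samples per class, stored consecutively: all 0s, then all 1s, then all 2s
print(np.bincount(y))               # [50 50 50]
print(y[:3], y[50:53], y[100:103])  # [0 0 0] [1 1 1] [2 2 2]

Without shuffling, each test fold would cover at most two of the three species, so the fold scores would not reflect performance on the full dataset.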


2. Stratified K-Fold Cross-Validation

Stratified K-Fold ensures that each fold has the same class distribution as the original dataset. This is especially useful when the dataset is imbalanced.

🟢 Example: Stratified K-Fold


from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
model = LogisticRegression(max_iter=200)

stratified_scores = []

# split(X, y) needs the labels so each fold keeps the class proportions
for train_index, test_index in skf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    acc = accuracy_score(y_test, predictions)
    stratified_scores.append(acc)

print("Stratified Fold Accuracies:", stratified_scores)
print("Average Accuracy:", np.mean(stratified_scores))

Output:

Stratified Fold Accuracies: [1.0, 0.9667, 0.9333, 0.9, 1.0]
Average Accuracy: 0.96

When should you use Stratified K-Fold instead of regular K-Fold?

✔️ When your dataset has imbalanced classes. It preserves class ratio in each fold.
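
A quick way to check the difference is to compare the class counts in the first test fold produced by each splitter. This small sketch reuses the iris data from above, with shuffling turned off to make the contrast obvious:

from sklearn.model_selection import KFold, StratifiedKFold
from sklearn.datasets import load_iris
import numpy as np

X, y = load_iris(return_X_y=True)

# Plain KFold without shuffling: the first test fold is 30 samples of class 0 only
_, test_idx = next(KFold(n_splits=5).split(X))
print("KFold class counts:", np.bincount(y[test_idx], minlength=3))

# StratifiedKFold keeps the 50/50/50 ratio: 10 samples of each class per fold
_, test_idx = next(StratifiedKFold(n_splits=5).split(X, y))
print("StratifiedKFold class counts:", np.bincount(y[test_idx], minlength=3))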


3. Leave-One-Out Cross-Validation (LOOCV)

In LOOCV, we leave out exactly one sample for testing and train on all the others. This is repeated for every data point, so a dataset with n samples requires n model fits.

🟢 Example: LOOCV


from sklearn.model_selection import LeaveOneOut

loo = LeaveOneOut()
model = LogisticRegression(max_iter=200)

loo_scores = []

# One split per sample: train on the other 149, test on the held-out one
for train_index, test_index in loo.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    model.fit(X_train, y_train)
    prediction = model.predict(X_test)
    acc = accuracy_score(y_test, prediction)
    loo_scores.append(acc)

print("LOOCV Accuracy:", np.mean(loo_scores))

Output:

LOOCV Accuracy: 0.9533

Code Explanation:

- LeaveOneOut() produces one split per sample: 150 splits for the iris data
- Each iteration trains on 149 samples and tests on the single held-out sample
- Each fold accuracy is therefore 0 or 1, and the mean over all 150 folds is the LOOCV score

Should I always use LOOCV for best accuracy?

✖️ Not always. It's computationally expensive, especially for large datasets. Use it when data is limited and computation time is not a concern.
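
As a rough sense of that cost, the sketch below counts the number of model fits LOOCV needs and shows the equivalent one-liner with cross_val_score, a standard scikit-learn helper used here for illustration:

from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
loo = LeaveOneOut()

# LOOCV trains one model per sample: 150 separate fits for the iris data
print("Number of fits:", loo.get_n_splits(X))

# cv=loo runs the same leave-one-out loop as the manual version above
model = LogisticRegression(max_iter=200)
scores = cross_val_score(model, X, y, cv=loo)
print("LOOCV Accuracy:", scores.mean())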


Conclusion

Cross-validation is essential for evaluating how well your model performs on unseen data. While K-Fold is the most commonly used, Stratified K-Fold is better for classification with class imbalance, and LOOCV is great when data is scarce.

👉 Choose your technique based on your data size, class balance, and available computational power.


