Machine Learning for Beginners

Hyperparameter Tuning in Machine Learning



What is Hyperparameter Tuning?

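In machine learning, a hyperparameter is a configuration value that is set before training begins. Hyperparameters are not learned from the data; they control the learning process itself. Examples used in this tutorial include n_neighbors for K-Nearest Neighbors, max_depth and min_samples_split for decision trees, and n_estimators for random forests.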

Hyperparameter tuning is the process of choosing the best combination of these settings to improve the performance of a model.
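To make the distinction between hyperparameters and learned values concrete, here is a minimal sketch. It assumes scikit-learn's DecisionTreeClassifier and the iris dataset, both picked only for illustration. The constructor arguments are hyperparameters we choose; the tree structure exists only after training.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters: chosen by us and passed to the constructor before training.
tree = DecisionTreeClassifier(max_depth=3, min_samples_split=2, random_state=0)
hyperparams = tree.get_params()
print("max_depth:", hyperparams["max_depth"])
print("min_samples_split:", hyperparams["min_samples_split"])

# Learned structure: only available after fit() has seen the data.
tree.fit(X, y)
print("Depth of the fitted tree:", tree.get_depth())
print("Number of leaves:", tree.get_n_leaves())
```

Calling get_params() on any scikit-learn estimator lists the settings that a tuning tool is allowed to vary.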

🔸 Why can't we just use default hyperparameters?

Default values may work fine, but they’re generic. Tuning helps you squeeze more accuracy from your model for your specific dataset.
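As a rough illustration of why defaults are only a starting point, the sketch below compares the default n_neighbors of KNeighborsClassifier with a few other values. The candidate values 1, 15 and 51 are arbitrary choices for this example, and on an easy dataset like iris the differences may be small.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Read the library default instead of hard-coding it.
default_k = KNeighborsClassifier().get_params()["n_neighbors"]
print("Default n_neighbors:", default_k)

# Score the default and a few arbitrary alternatives with 5-fold cross-validation.
scores = {}
for k in [1, default_k, 15, 51]:
    model = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(model, X, y, cv=5).mean()
    print(f"n_neighbors={k}: mean CV accuracy = {scores[k]:.4f}")
```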

🔹 Example 1: Tuning KNeighborsClassifier using GridSearchCV

Let’s tune the number of neighbors (n_neighbors) in a K-Nearest Neighbors classifier.


from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# KNN with GridSearch
param_grid = {
    'n_neighbors': [1, 3, 5, 7, 9]
}

grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

print("Best parameters:", grid_search.best_params_)
print("Best score:", grid_search.best_score_)

Output:

Best parameters: {'n_neighbors': 3}
Best score: 0.9619

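Explanation

GridSearchCV trains and scores a KNeighborsClassifier for each of the five n_neighbors values, using 5-fold cross-validation on the training set. best_params_ holds the value with the highest mean cross-validation accuracy, and best_score_ is that mean accuracy.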
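The per-candidate scores behind best_params_ can be inspected through the cv_results_ attribute. The sketch below repeats the Example 1 setup so that it runs on its own; the printed layout is just one possible way to display the results.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

grid_search = GridSearchCV(
    KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9]}, cv=5
)
grid_search.fit(X_train, y_train)

# cv_results_ is a dict of arrays with one entry per candidate setting.
results = grid_search.cv_results_
for params, mean, std in zip(
    results["params"], results["mean_test_score"], results["std_test_score"]
):
    print(params, f"mean={mean:.4f}", f"std={std:.4f}")
```

Looking at the spread of scores, not only the winner, shows how sensitive the model is to the setting.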

✦ Question: Why do we use cross-validation instead of testing on test data directly?

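Answer: Because the test data should stay untouched until the final evaluation. If we picked hyperparameters by their test-set score, that score would no longer be an honest estimate of performance on unseen data. Cross-validation estimates how well each setting generalizes using only the training data.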
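The workflow described in this answer can be sketched by hand: select the hyperparameter with cross-validation on the training data, then use the test set exactly once. This is a simplified stand-in for what GridSearchCV automates, not its actual implementation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Step 1: model selection sees only the training portion.
cv_scores = {}
for k in [1, 3, 5, 7, 9]:
    model = KNeighborsClassifier(n_neighbors=k)
    cv_scores[k] = cross_val_score(model, X_train, y_train, cv=5).mean()
best_k = max(cv_scores, key=cv_scores.get)

# Step 2: refit on all training data, then evaluate on the test set once.
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)

print("Chosen n_neighbors:", best_k)
print("Test accuracy:", round(test_accuracy, 4))
```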


🔹 Example 2: Tuning Decision Tree using multiple hyperparameters

In a decision tree, some important hyperparameters are max_depth and min_samples_split.


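from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# X_train and y_train come from the train/test split in Example 1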

param_grid = {
    'max_depth': [3, 5, 7, None],
    'min_samples_split': [2, 5, 10]
}

grid_search = GridSearchCV(DecisionTreeClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

print("Best parameters:", grid_search.best_params_)
print("Best score:", grid_search.best_score_)

Output:

Best parameters: {'max_depth': 3, 'min_samples_split': 2}
Best score: 0.9428

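Explanation

The grid contains 4 x 3 = 12 combinations of max_depth and min_samples_split. GridSearchCV scores each combination with 5-fold cross-validation on the training set and reports the one with the highest mean accuracy.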

✦ Question: What happens if we don’t restrict the depth of a decision tree?

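Answer: With max_depth=None, the tree keeps splitting until its leaves are pure or too small to split further. Such a deep tree can memorize noise in the training data (overfit) and perform poorly on unseen data.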
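One way to look for this effect is to compare training accuracy with cross-validated accuracy for an unrestricted tree and a shallow one. The sketch below assumes the same iris split as Example 1 and fixes random_state=0 for repeatability. Iris is an easy dataset, so the gap may be modest here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

report = {}
for depth in [None, 3]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    # Cross-validated accuracy approximates performance on unseen data.
    cv_acc = cross_val_score(tree, X_train, y_train, cv=5).mean()
    # Training accuracy shows how closely the tree fits the data it saw.
    tree.fit(X_train, y_train)
    report[depth] = {
        "depth": tree.get_depth(),
        "train": tree.score(X_train, y_train),
        "cv": cv_acc,
    }
    print(f"max_depth={depth}: {report[depth]}")
```

A training accuracy that is clearly higher than the cross-validated accuracy is the usual sign of overfitting.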


🔹 RandomizedSearchCV vs GridSearchCV

GridSearchCV tries all combinations exhaustively. This is great for small search spaces but becomes slow with many parameters.

RandomizedSearchCV tries only a fixed number of random combinations, making it faster.
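The cost difference can be counted directly. The sketch below uses scikit-learn's ParameterGrid and ParameterSampler helpers with the same values as the random forest use case; the variable names cv_folds and n_iter are chosen here for illustration.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

param_space = {
    "n_estimators": [50, 100, 150, 200],
    "max_depth": [None, 10, 20, 30],
    "min_samples_split": [2, 5, 10],
}
cv_folds = 3
n_iter = 5

# Grid search visits every combination: 4 * 4 * 3 = 48 candidates.
grid_candidates = len(ParameterGrid(param_space))

# Randomized search draws only n_iter candidates from the same space.
sampled = list(ParameterSampler(param_space, n_iter=n_iter, random_state=42))

print("Grid search candidates:", grid_candidates)                 # 48
print("Grid search model fits:", grid_candidates * cv_folds)      # 144
print("Randomized search model fits:", len(sampled) * cv_folds)   # 15
```

Both search classes also refit the best candidate once on the full training set by default, so the real totals are one fit higher.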

🔸 Use Case: RandomizedSearchCV for Random Forest


from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    'n_estimators': [50, 100, 150, 200],
    'max_depth': [None, 10, 20, 30],
    'min_samples_split': [2, 5, 10]
}

# n_iter=5 samples 5 of the 4 x 4 x 3 = 48 possible combinations
random_search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions=param_dist,
    n_iter=5,
    cv=3,
    random_state=42,
)
random_search.fit(X_train, y_train)

print("Best parameters:", random_search.best_params_)
print("Best score:", random_search.best_score_)

Output:

Best parameters: {'n_estimators': 200, 'min_samples_split': 2, 'max_depth': 30}
Best score: 0.9619

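Explanation

The search space has 4 x 4 x 3 = 48 combinations. With n_iter=5, RandomizedSearchCV samples only 5 of them and scores each with 3-fold cross-validation, so it trains far fewer models than an exhaustive grid search would.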

✦ Question: Should we always use RandomizedSearch for big models?

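Answer: Not always, but it is usually a good first choice when you have many hyperparameters. It evaluates only n_iter sampled combinations, so it often finds good settings much faster than grid search, though it is not guaranteed to find the best combination. For a small search space, grid search remains practical.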
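RandomizedSearchCV also accepts distributions instead of fixed lists, which lets it sample values a grid would never list. The sketch below assumes scipy.stats.randint for integer ranges. The ranges themselves and random_state=42 on the forest are illustrative choices, not settings from the example above.

```python
from scipy.stats import randint
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# randint(low, high) draws integers from low up to high - 1.
param_dist = {
    "n_estimators": randint(50, 201),     # 50..200
    "max_depth": randint(2, 31),          # 2..30
    "min_samples_split": randint(2, 11),  # 2..10
}

random_search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=5,
    cv=3,
    random_state=42,
)
random_search.fit(X_train, y_train)

best = random_search.best_params_
print("Best parameters:", best)
print("Best score:", round(random_search.best_score_, 4))
```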


Final Tips for Hyperparameter Tuning

Summary

Hyperparameter tuning is an important step in improving your machine learning model's performance. It helps you find settings that generalize well to new data.

Mastering tools like GridSearchCV and RandomizedSearchCV will make your ML workflow more systematic and robust.


