Logistic Regression is a supervised learning algorithm used for classification problems. Unlike Linear Regression, which predicts continuous values, Logistic Regression predicts the probability that an input belongs to a class and turns that probability into a discrete outcome, most often binary (0 or 1).
Think of it as answering a Yes/No question with a probability: instead of a bare "Yes" or "No", the model tells you how likely "Yes" is.
Suppose you're building a model to classify emails as "Spam" (1) or "Not Spam" (0). You collect a set of features that describe each email.
If you apply Linear Regression, it may predict values like 1.2 or -0.4, which are neither valid probabilities nor clear classes. Logistic Regression solves this by passing its output through the Sigmoid Function, which converts any value into a probability between 0 and 1.
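To see the problem concretely, the sketch below fits an ordinary least-squares straight line to the same "hours studied vs. passed" data used in the example later in this lesson, then evaluates it at 0 and 12 hours. It is plain Python with no libraries, and the helper name `fit_line` is made up for this illustration.

```python
# Ordinary least-squares line fitted to 0/1 labels (illustration only)
hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
passed = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

b0, b1 = fit_line(hours, passed)
for h in (0, 12):
    # Straight-line "predictions" are not confined to the range 0..1
    print(f"{h:>2} hours -> {b0 + b1 * h:.2f}")
```

The fitted line is about -0.2 + 0.145*x, so it outputs roughly -0.20 at 0 hours and 1.55 at 12 hours. Neither value can be read as a probability, which is exactly the gap the sigmoid closes.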
The logistic regression model computes:
y = 1 / (1 + e^(-z))
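Where:
e ≈ 2.718 is the base of the natural logarithm
z = b0 + b1*x1 + b2*x2 + ... + bn*xn (b0 is the intercept, b1 ... bn are the learned coefficients, and x1 ... xn are the input features)
y = predicted probability of class 1 (0 < y < 1)
We then use a threshold (typically 0.5) to decide:
y ≥ 0.5 → class 1 (e.g., Spam)
y < 0.5 → class 0 (e.g., Not Spam)
Why not just use Linear Regression for classification?
Answer: Linear Regression doesn't output probabilities and is sensitive to outliers. Logistic Regression, through the sigmoid function, maps its output to the range 0 to 1, which makes it well suited to classification tasks.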
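The formula and the threshold rule translate directly into a few lines of code. Below is a minimal pure-Python sketch for a single feature. The helper names `sigmoid` and `classify` are my own, and the coefficients b0 = -4.0 and b1 = 0.8 are hypothetical values picked only so the numbers are easy to follow; they are not what the model in the next example learns.

```python
import math

def sigmoid(z):
    """Logistic function y = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)          # z < 0 here, so e^z < 1 and cannot overflow
    return e / (1.0 + e)     # algebraically equal to 1 / (1 + e^(-z))

def classify(x, b0, b1, threshold=0.5):
    """Return (probability, class) for one feature value x."""
    z = b0 + b1 * x                      # linear part: z = b0 + b1*x1
    y = sigmoid(z)                       # probability of class 1
    label = 1 if y >= threshold else 0   # apply the decision rule
    return y, label

# Hypothetical coefficients, for illustration only
b0, b1 = -4.0, 0.8

for x in (2, 5, 8):
    y, label = classify(x, b0, b1)
    print(f"x = {x}: z = {b0 + b1 * x:+.1f}, y = {y:.3f}, class = {label}")
```

Because sigmoid(0) = 0.5 and the sigmoid always increases with z, the rule y ≥ 0.5 is the same as z ≥ 0. The decision boundary is therefore where b0 + b1*x = 0, i.e. x = -b0/b1, which is 5 for these made-up coefficients.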
Let’s say you have data on students: their "Hours Studied" and whether they passed (1) or failed (0). We'll build a logistic regression model to predict the outcome.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
# Step 1: Create Dataset
data = {
"Hours_Studied": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
"Passed": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
}
df = pd.DataFrame(data)
# Step 2: Split features and target
X = df[["Hours_Studied"]] # input feature
y = df["Passed"] # target label
# Step 3: Train/Test Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Step 4: Train the Logistic Regression model
model = LogisticRegression()
model.fit(X_train, y_train)
# Step 5: Predict on test data
y_pred = model.predict(X_test)
# Step 6: Print performance metrics
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
# Step 7: Plot the sigmoid curve
x_range = np.linspace(0, 11, 100)
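# Wrap the grid in a DataFrame with the training column name (avoids sklearn's feature-name warning)
x_grid = pd.DataFrame({"Hours_Studied": x_range})
y_prob = model.predict_proba(x_grid)[:, 1]  # column 1 = probability of class 1 (Passed)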
plt.scatter(df["Hours_Studied"], df["Passed"], color='red', label='Actual')
plt.plot(x_range, y_prob, color='blue', label='Sigmoid Curve')
plt.xlabel("Hours Studied")
plt.ylabel("Probability of Passing")
plt.title("Logistic Regression - Exam Pass Prediction")
plt.legend()
plt.grid(True)
plt.show()
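Step 6 prints a confusion matrix and a classification report. To show what those numbers are, here is a small plain-Python sketch that computes the same quantities for 0/1 labels. The helper names and the label lists are made up for illustration. The matrix follows the same layout as scikit-learn's `confusion_matrix` for labels [0, 1] (rows are true classes, columns are predicted classes); `classification_report` additionally lists per-class rows, support, and averages, which this sketch leaves out. As an assumption of this sketch, a ratio that is undefined (for example, no positive predictions) is returned as 0.0.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for 0/1 labels (rows = true, columns = predicted)."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    return [[tn, fp], [fn, tp]]

def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive class (0.0 when a ratio is undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0   # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0      # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels, made up only to illustrate the arithmetic
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
print(confusion_counts(y_true, y_pred))
print(precision_recall_f1(y_true, y_pred))
```

Here there are 2 true negatives, 1 false positive, 1 false negative and 2 true positives, so precision, recall and F1 for class 1 all come out to 2/3.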
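LogisticRegression(): the scikit-learn classifier used here. It handles binary problems like this one and also supports multiclass classification.
predict(): predicts whether a student passed (1) or failed (0).
predict_proba(): gives the probability of each class; selecting column [:, 1] gives the probability of passing.
confusion_matrix and classification_report: help us understand how well the model performs. With only 10 rows and test_size=0.2, the test set holds just 2 students, so treat these numbers as a demonstration rather than a real evaluation.
Can Logistic Regression use more than one feature?
Answer: Yes, it handles multiple features easily. Just list more columns in X = df[[...]], and scikit-learn handles the rest internally.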
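Here is a minimal sketch of that idea, reusing the Hours_Studied and Passed values from the example above. The second column, Hours_Slept, is hypothetical and its values are invented purely to show the mechanics; they are not real data.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# "Hours_Slept" is a hypothetical feature with made-up values, for illustration only
df = pd.DataFrame({
    "Hours_Studied": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Hours_Slept":   [5, 6, 4, 7, 6, 8, 5, 7, 8, 6],
    "Passed":        [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
})

# Just list more columns: the model learns one coefficient per column
X = df[["Hours_Studied", "Hours_Slept"]]
y = df["Passed"]

model = LogisticRegression()
model.fit(X, y)

print("Intercept (b0):", model.intercept_)
print("Coefficients (b1, b2):", model.coef_)

# New inputs must use the same column names as the training data
new_students = pd.DataFrame({"Hours_Studied": [3, 8], "Hours_Slept": [7, 6]})
print("P(pass):", model.predict_proba(new_students)[:, 1])
```

For a binary problem, `model.coef_` holds one learned coefficient per column (b1 for Hours_Studied, b2 for Hours_Slept) and `model.intercept_` holds b0, matching z = b0 + b1*x1 + b2*x2 from the formula.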
In the next lesson, we’ll explore another popular classification algorithm — K-Nearest Neighbors (KNN).