Machine Learning - Linear Regression for Beginners



What is Linear Regression?

Linear Regression is one of the simplest and most widely used algorithms in machine learning. It is a supervised learning technique for predicting a continuous value (such as a price, salary, or temperature).

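In simple terms, Linear Regression tries to draw the straight line through the data points that best represents the relationship between the input features (X) and the target variable (y). "Best" typically means the line with the smallest total squared difference between predicted and actual values.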
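
A minimal sketch of what "best" means under ordinary least squares, the criterion scikit-learn's LinearRegression uses: the preferred line is the one with the smallest sum of squared errors. The three points, the two candidate lines, and the helper name sum_squared_error below are made up purely for illustration.

```python
# Hypothetical toy data, chosen only for illustration.
xs = [1, 2, 3]
ys = [2, 4, 6]

def sum_squared_error(m, b):
    """Sum of squared differences between the line m*x + b and the data."""
    return sum((m * x + b - y) ** 2 for x, y in zip(xs, ys))

sse_good = sum_squared_error(2, 0)  # candidate line y = 2x
sse_poor = sum_squared_error(1, 1)  # candidate line y = x + 1

print(sse_good, sse_poor)  # 0 5
```

The line y = 2x passes through every point, so its error is 0; the other candidate misses two of the three points and scores worse.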

Real-life Example:

Suppose you're a data scientist at a real estate company. You want to predict the price of a house based on its size (in square feet). By using previous house sale data (size and price), you can train a linear regression model to predict future prices.


Understanding the Formula

The equation for simple linear regression is:

y = mx + b
  • y: Target variable (e.g., house price)
  • x: Input feature (e.g., house size)
  • m: Slope of the line (how much y changes when x increases by one unit)
  • b: Intercept (value of y when x = 0)

Question:

What does the slope tell us in real life?

Answer: It tells us how much the house price increases (or decreases) for each additional square foot in size.
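
As a small sketch of that answer, the function below evaluates y = mx + b and shows that adding one unit to x changes the prediction by exactly m. The values m = 100 and b = 100000 and the helper name predict_line are assumptions for illustration, not figures from real sales data.

```python
def predict_line(x, m, b):
    """Evaluate the simple linear regression equation y = m*x + b."""
    return m * x + b

m = 100      # assumed slope: dollars per additional square foot
b = 100_000  # assumed intercept: predicted price at x = 0

price_2000 = predict_line(2000, m, b)
price_2001 = predict_line(2001, m, b)

print(price_2000)               # 300000
print(price_2001 - price_2000)  # 100, which is the slope m
```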


Step-by-step Example

Let’s take a small dataset of house sizes and their prices:


Size (sqft): [1000, 1500, 2000, 2500, 3000]
Price ($):   [200000, 250000, 300000, 350000, 400000]

We want to build a model that can predict the price of a new house, say 2200 sqft.
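
Before reaching for a library, the slope and intercept for this dataset can be worked out with the standard closed-form least-squares formulas. This is a sketch using NumPy only; the variable names are illustrative, and it is not necessarily how scikit-learn computes the fit internally.

```python
import numpy as np

sizes = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)
prices = np.array([200000, 250000, 300000, 350000, 400000], dtype=float)

# Deviations of each value from its mean.
dx = sizes - sizes.mean()
dy = prices - prices.mean()

# Slope: sum of cross-deviations divided by sum of squared size deviations.
slope = (dx * dy).sum() / (dx ** 2).sum()

# Intercept: the least-squares line passes through the point of means.
intercept = prices.mean() - slope * sizes.mean()

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"price for 2200 sqft = {slope * 2200 + intercept:.2f}")
```

Because this data lies exactly on a line, the formulas give a slope of 100 and an intercept of 100000.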

Python Code

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Step 1: Prepare the data
X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)  # Feature (2D)
y = np.array([200000, 250000, 300000, 350000, 400000])      # Target (1D)

# Step 2: Create and train the model
model = LinearRegression()
model.fit(X, y)

# Step 3: Make a prediction
predicted_price = model.predict([[2200]])
print(f"Predicted price for 2200 sqft: ${predicted_price[0]:.2f}")

# Step 4: Plot the data and prediction line
plt.scatter(X, y, color='blue', label='Actual Data')
plt.plot(X, model.predict(X), color='red', label='Prediction Line')
plt.xlabel("Size (sqft)")
plt.ylabel("Price ($)")
plt.title("Linear Regression: House Price Prediction")
plt.legend()
plt.grid(True)
plt.show()

Output:

Predicted price for 2200 sqft: $320000.00

Code Explanation:

  • Step 1: We create our input (X) and target (y) arrays. reshape(-1, 1) converts the 1D array into a 2D column of shape (5, 1), because scikit-learn expects features as a 2D array of shape (n_samples, n_features).
  • Step 2: We create an instance of LinearRegression() and fit it to our data.
  • Step 3: We use predict() to forecast the price for 2200 sqft.
  • Step 4: We plot the original data and the best-fit line learned by the model.
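
Once the model is fitted, the learned slope and intercept can be read back from the coef_ and intercept_ attributes of LinearRegression. A short sketch that repeats the fit so it runs on its own (the variable names slope, intercept, and manual are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
y = np.array([200000, 250000, 300000, 350000, 400000])

model = LinearRegression().fit(X, y)

slope = model.coef_[0]        # m: one coefficient per input feature
intercept = model.intercept_  # b: predicted price at size 0

print(f"m = {slope:.2f}, b = {intercept:.2f}")

# Rebuild the prediction from y = m*x + b and compare it with predict().
manual = slope * 2200 + intercept
print(f"manual: {manual:.2f}, predict(): {model.predict([[2200]])[0]:.2f}")
```

Reading these two numbers is what makes the model easy to explain: each extra square foot adds m dollars to the predicted price.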

Another Example: Predicting Student Scores

Suppose you want to predict the final exam score based on the number of study hours:


Hours Studied: [1, 2, 3, 4, 5]
Scores:        [20, 40, 60, 80, 100]

Python Code:

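# New example: Study hours vs Scores
import numpy as np
from sklearn.linear_model import LinearRegression

# Feature (2D) and target (1D)
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([20, 40, 60, 80, 100])

# Create and train the model
model = LinearRegression()
model.fit(X, y)

# Predict the score for 3.5 hours of study
predicted_score = model.predict([[3.5]])
print(f"Predicted score for 3.5 hours study: {predicted_score[0]:.2f}")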

Output:

Predicted score for 3.5 hours study: 70.00

Question:

What would happen if a student studies for 0 hours?

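Answer: With x = 0 the equation reduces to y = b, so the model predicts the intercept. This dataset lies exactly on the line score = 20 × hours, so the intercept, and therefore the predicted score, is 0 (up to floating-point rounding). Keep in mind that 0 hours is outside the range of the training data (1 to 5 hours), so this is an extrapolation.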
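
This can be checked directly. The sketch below refits the study-hours model and asks for a prediction at 0 hours; the result may differ from zero by a tiny floating-point amount, so it is formatted to two decimals for display. The variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

hours = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
scores = np.array([20, 40, 60, 80, 100])

model = LinearRegression().fit(hours, scores)

# At x = 0 the prediction equals the intercept b.
score_at_zero = model.predict([[0]])[0]

print(f"intercept: {model.intercept_:.2f}")
print(f"predicted score for 0 hours: {score_at_zero:.2f}")
```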


When to Use Linear Regression?

  • When the target is a continuous number
  • When there’s a roughly linear relationship between input and output
  • When you want interpretability (you can explain slope and intercept easily)
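
One quick, rough way to check for a linear relationship is the Pearson correlation coefficient, which is close to +1 or -1 when two variables are strongly linearly related. A sketch using np.corrcoef on the house data from this tutorial; a scatter plot remains the more reliable check, since correlation alone can hide curved patterns.

```python
import numpy as np

sizes = np.array([1000, 1500, 2000, 2500, 3000])
prices = np.array([200000, 250000, 300000, 350000, 400000])

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is the
# correlation between the first and second arrays.
r = np.corrcoef(sizes, prices)[0, 1]

print(f"Pearson r = {r:.3f}")  # 1.000 for this perfectly linear data
```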

Limitations

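  • Fits poorly when the relationship between input and output is non-linear (for example, curved)
  • Sensitive to outliers: because errors are squared, a single extreme point can shift the line noticeably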
  • Can underperform if important features are missing
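
The effect of an outlier is easy to see on the study-hours data. The sketch below fits a line twice with np.polyfit (used here in place of scikit-learn to keep it short): once on the original scores and once with the last score replaced by a made-up outlier of 10.

```python
import numpy as np

hours = np.array([1, 2, 3, 4, 5])
clean_scores = np.array([20, 40, 60, 80, 100])
outlier_scores = np.array([20, 40, 60, 80, 10])  # last value is a made-up outlier

# np.polyfit with degree 1 returns [slope, intercept] of the least-squares line.
clean_slope, clean_intercept = np.polyfit(hours, clean_scores, 1)
outlier_slope, outlier_intercept = np.polyfit(hours, outlier_scores, 1)

print(f"slope without outlier: {clean_slope:.2f}")
print(f"slope with outlier:    {outlier_slope:.2f}")
```

A single bad point pulls the slope from about 20 points per hour down to about 2, which is why it is worth inspecting the data before fitting.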

Summary

  • Linear Regression fits a straight line that maps input to output
  • Scikit-learn makes it easy to implement
  • Great for getting started with predictive models

In the next module, we’ll explore Logistic Regression, which is used for classification problems.


