Machine Learning for Beginners

Machine Learning - Linear Regression for Beginners



What is Linear Regression?

Linear Regression is one of the simplest and most widely used algorithms in machine learning. It is a supervised learning technique for predicting a continuous value, such as a price, a salary, or a temperature.

In simple terms, Linear Regression tries to draw a straight line through the data points that best represents the relationship between the input features (X) and the target variable (y).
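
To make "best represents" concrete: a common criterion, and the one ordinary least squares uses, is the sum of squared differences between the actual values and the values the line predicts. The sketch below compares two candidate lines on a small illustrative dataset. The helper name sum_squared_error and both candidate lines are made up for this illustration.

```python
import numpy as np

# Illustrative data: house sizes (sqft) and prices ($)
x = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)
y = np.array([200000, 250000, 300000, 350000, 400000], dtype=float)

def sum_squared_error(m, b, x, y):
    """Sum of squared differences between actual y and the line m*x + b."""
    predictions = m * x + b
    return float(np.sum((y - predictions) ** 2))

# Two hypothetical candidate lines (slope, intercept)
error_a = sum_squared_error(100.0, 100000.0, x, y)
error_b = sum_squared_error(90.0, 120000.0, x, y)

print(f"Line A error: {error_a:.0f}")
print(f"Line B error: {error_b:.0f}")
```

The line with the smaller error fits the data better. Training a linear regression model amounts to searching for the slope and intercept that make this error as small as possible.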

Real-life Example:

Suppose you're a data scientist at a real estate company. You want to predict the price of a house based on its size (in square feet). By using previous house sale data (size and price), you can train a linear regression model to predict future prices.


Understanding the Formula

The equation for simple linear regression is:

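y = mx + b

Here y is the predicted target value (for example, the house price), x is the input feature (the house size), m is the slope of the line, and b is the intercept, the value of y when x is 0.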

🧠 Question:

What does the slope tell us in real life?

Answer: It tells us how much the house price increases (or decreases) for each additional square foot in size.
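
For a single feature, the least-squares slope and intercept can also be computed directly from the data. The following is a minimal sketch using the standard closed-form formulas, with the same house sizes and prices that the step-by-step example in this tutorial uses. The variable names are chosen for illustration only.

```python
import numpy as np

# House sizes (sqft) and prices ($)
x = np.array([1000, 1500, 2000, 2500, 3000], dtype=float)
y = np.array([200000, 250000, 300000, 350000, 400000], dtype=float)

x_mean = x.mean()
y_mean = y.mean()

# Slope: how x and y vary together, divided by how much x varies
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)

# Intercept: chosen so the line passes through the point (x_mean, y_mean)
b = y_mean - m * x_mean

print(f"Slope (price change per extra sqft): {m:.2f}")
print(f"Intercept: {b:.2f}")
```

With this data the slope works out to about 100, meaning each additional square foot adds roughly $100 to the predicted price.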


Step-by-step Example

Let’s take a small dataset of house sizes and their prices:


Size (sqft): [1000, 1500, 2000, 2500, 3000]
Price ($):   [200000, 250000, 300000, 350000, 400000]

We want to build a model that can predict the price of a new house, say 2200 sqft.

Python Code

import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Step 1: Prepare the data
X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)  # Feature (2D)
y = np.array([200000, 250000, 300000, 350000, 400000])      # Target (1D)

# Step 2: Create and train the model
model = LinearRegression()
model.fit(X, y)

# Step 3: Make a prediction
predicted_price = model.predict([[2200]])
print(f"Predicted price for 2200 sqft: ${predicted_price[0]:.2f}")

# Step 4: Plot the data and prediction line
plt.scatter(X, y, color='blue', label='Actual Data')
plt.plot(X, model.predict(X), color='red', label='Prediction Line')
plt.xlabel("Size (sqft)")
plt.ylabel("Price ($)")
plt.title("Linear Regression: House Price Prediction")
plt.legend()
plt.grid(True)
plt.show()

Output:

Predicted price for 2200 sqft: $320000.00
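
To connect this result back to y = mx + b, the fitted slope and intercept can be read from the model's coef_ and intercept_ attributes. The sketch below refits the same data and rebuilds the 2200 sqft prediction by hand. The names m, b, manual_price, and model_price are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same data as in the example above
X = np.array([1000, 1500, 2000, 2500, 3000]).reshape(-1, 1)
y = np.array([200000, 250000, 300000, 350000, 400000])

model = LinearRegression()
model.fit(X, y)

m = model.coef_[0]       # fitted slope (one value per feature)
b = model.intercept_     # fitted intercept

# Rebuild the prediction with y = mx + b and compare with model.predict
manual_price = m * 2200 + b
model_price = model.predict([[2200]])[0]

print(f"Slope: {m:.2f}, Intercept: {b:.2f}")
print(f"Manual: {manual_price:.2f}, Model: {model_price:.2f}")
```

Both values should agree, since model.predict applies the same linear formula internally.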

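Code Explanation:

Step 1 stores the house sizes in X and the prices in y. X is reshaped into a 2D array with one column because scikit-learn expects features in the shape (samples, features).
Step 2 creates a LinearRegression model and fits it to the data, which estimates the slope and intercept.
Step 3 asks the fitted model for the price of a 2200 sqft house.
Step 4 plots the actual data points together with the fitted line.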


Another Example: Predicting Student Scores

Suppose you want to predict the final exam score based on the number of study hours:


Hours Studied: [1, 2, 3, 4, 5]
Scores:        [20, 40, 60, 80, 100]

Python Code:

# New example: Study hours vs Scores
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([20, 40, 60, 80, 100])

model = LinearRegression()
model.fit(X, y)

predicted_score = model.predict([[3.5]])
print(f"Predicted score for 3.5 hours study: {predicted_score[0]:.2f}")

Output:

Predicted score for 3.5 hours study: 70.00

🧠 Question:

What would happen if a student studies for 0 hours?

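Answer: At 0 hours, x = 0, so the prediction is just the intercept b. For this dataset the scores fall exactly on the line score = 20 × hours, so the intercept, and therefore the predicted score, is 0 (apart from tiny floating-point rounding).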
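
A quick way to check this is to fit the study-hours data and ask the model for a prediction at 0 hours. This is a minimal sketch; the variable name score_at_zero is illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Study hours vs scores, as in the example above
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
y = np.array([20, 40, 60, 80, 100])

model = LinearRegression()
model.fit(X, y)

# At x = 0 the term m*x vanishes, so the prediction equals the intercept
score_at_zero = model.predict([[0]])[0]

print(f"Slope: {model.coef_[0]:.2f}")
print(f"Intercept: {model.intercept_:.6f}")
print(f"Predicted score for 0 hours: {score_at_zero:.6f}")
```

The intercept may print as a tiny nonzero number because of floating-point arithmetic. Note that 0 hours lies outside the range of the training data (1 to 5 hours), so a prediction there is an extrapolation and should be read with caution.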


When to Use Linear Regression?

⚠️ Limitations

Summary

In the next module, we’ll explore Logistic Regression, which is used for classification problems.


