Machine Learning for Beginners

House Price Prediction Using Machine Learning (Step-by-Step Tutorial for Beginners)



House Price Prediction – Real-World ML Project

In this beginner-friendly project, we'll use machine learning to predict house prices based on various features like size, location, and number of rooms. This is one of the most popular real-world use cases of ML in action!

Problem Statement

Given a dataset of houses with features like area, number of bedrooms, number of bathrooms, location, etc., predict the selling price of a house.

Why This Project?

Dataset

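We will use the California Housing dataset, loaded with fetch_california_housing from sklearn.datasets. scikit-learn downloads it on the first call and caches it locally, so no manual download is needed. Each row describes a census block group rather than a single house, which is why the features are medians and averages.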

❯ Why not just use area to predict price?

🔸 Because price depends on multiple features – location, number of rooms, population density, etc.

🔹 A larger house in a poor neighborhood may be cheaper than a smaller one in a prime location.
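To make the point concrete, here is a minimal sketch on synthetic data (not the California Housing dataset). The `area` and `location` columns, their coefficients, and the helper `r2_of_linear_fit` are all made up for illustration. It fits a least-squares line once with area alone and once with both features, then compares how much of the price variation each fit explains.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical features: house area and a 0-10 "location quality" score
area = rng.uniform(50, 250, n)
location = rng.uniform(0, 10, n)

# Synthetic price that depends on BOTH features, plus a little noise
price = 0.01 * area + 0.3 * location + rng.normal(0, 0.2, n)


def r2_of_linear_fit(features, target):
    """Fit ordinary least squares with an intercept and return R^2."""
    X = np.column_stack([np.ones(len(target))] + features)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    residual = target - X @ coef
    return 1 - residual.var() / target.var()


r2_area_only = r2_of_linear_fit([area], price)
r2_both = r2_of_linear_fit([area, location], price)

print("R2 using area only:      ", round(r2_area_only, 3))
print("R2 using area + location:", round(r2_both, 3))
```

On this synthetic data the area-only fit leaves most of the variation unexplained, because it cannot see the location effect at all.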

Step-by-Step Implementation

1️⃣ Load and Explore the Dataset


from sklearn.datasets import fetch_california_housing
import pandas as pd

# Load data
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['Target'] = data.target

# Display first few rows
print(df.head())

Output:

   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude  Target
0  8.3252      41.0  6.984127   1.023810      322.0  2.555556     37.88    -122.23   4.526
1  8.3014      21.0  6.238137   0.971880      240.0  2.109842     37.86    -122.22   3.585
...

🧠❯ What does the 'Target' column mean?

🔸 It represents the **median house value** of a block group, expressed in units of $100,000.

🔹 So a target of 4.526 means a median value of $452,600.
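The unit conversion is plain arithmetic. A minimal sketch, where `target_to_dollars` is a hypothetical helper name and not part of scikit-learn:

```python
def target_to_dollars(target_value):
    """Convert a target expressed in units of $100,000 into dollars."""
    return round(target_value * 100_000, 2)


# Target of the first row in the printed table
first_row_dollars = target_to_dollars(4.526)
print(first_row_dollars)  # 452600.0
```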

2️⃣ Data Preprocessing


from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df.drop('Target', axis=1)
y = df['Target']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

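❯ Why do we scale the features?

🔸 The features have very different ranges: Population runs into the hundreds or thousands, while AveBedrms is close to 1. Standardizing puts every feature on a comparable scale (mean 0, standard deviation 1).

🔹 For plain Linear Regression, scaling does not change the predictions, but it makes the coefficients comparable. It matters much more for regularized models (Ridge, Lasso) and for gradient-based or distance-based models, where large-valued features would otherwise dominate. Note that the scaler is fit on the training set only, so no information from the test set leaks into training.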
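To show what standardization does to a single column, here is a rough pure-Python sketch. The helper names `fit_standardizer` and `standardize` and the sample values are illustrative assumptions. It uses the population standard deviation, which is also what StandardScaler uses, and it applies the training mean and standard deviation to the test values instead of recomputing them.

```python
from statistics import fmean, pstdev


def fit_standardizer(column):
    """Learn the mean and population standard deviation from training values."""
    return fmean(column), pstdev(column)


def standardize(column, mean, std):
    """Apply z = (x - mean) / std with statistics learned elsewhere."""
    return [(x - mean) / std for x in column]


# Illustrative population counts (assumed values, not a real split)
train_population = [322.0, 240.0, 496.0, 558.0]
test_population = [565.0, 413.0]

# Fit on the training values only, then reuse the same statistics for test
mean, std = fit_standardizer(train_population)
train_scaled = standardize(train_population, mean, std)
test_scaled = standardize(test_population, mean, std)

print("train:", [round(v, 3) for v in train_scaled])
print("test: ", [round(v, 3) for v in test_scaled])
```

After scaling, the training values have mean 0 and standard deviation 1. The test values generally do not, and that is expected.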

3️⃣ Train the Model


from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train_scaled, y_train)

4️⃣ Evaluate the Model


from sklearn.metrics import mean_squared_error, r2_score

predictions = model.predict(X_test_scaled)
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

print("Mean Squared Error:", mse)
print("R2 Score:", r2)

Output (rounded):

Mean Squared Error: 0.556
R2 Score: 0.576
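MSE is measured in squared target units, so it is hard to read directly. Taking the square root (RMSE) brings it back to the target's own unit of $100,000, which can then be converted to dollars. A small sketch, where `rmse_in_dollars` is a hypothetical helper and `example_mse` is a made-up value rather than this model's result:

```python
import math


def rmse_in_dollars(mse, unit_in_dollars=100_000):
    """Turn an MSE in squared target units into an RMSE expressed in dollars."""
    return math.sqrt(mse) * unit_in_dollars


example_mse = 0.25  # hypothetical MSE, chosen so the square root is exact
typical_error = rmse_in_dollars(example_mse)
print(typical_error)  # 50000.0
```

You can pass the `mse` value computed in this step to the same helper to see the model's typical error in dollars.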

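❯ What is R2 Score?

🔸 It is the fraction of the variance in the target that the model explains.

🔹 R² = 1 is perfect prediction; R² = 0 means the model is no better than always predicting the mean of the target; a negative R² means it is worse than that baseline.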
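Both metrics are easy to compute by hand, which helps in seeing what they measure. The sketch below uses the standard definitions, MSE as the mean of squared errors and R² = 1 - SS_res / SS_tot, on a tiny made-up target list. The function names are illustrative and are not the scikit-learn implementations.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between truth and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_manual(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]  # made-up targets

perfect = r2_score_manual(y_true, y_true)                # exact predictions
baseline = r2_score_manual(y_true, [2.5, 2.5, 2.5, 2.5])  # always predict the mean
worse = r2_score_manual(y_true, [4.0, 3.0, 2.0, 1.0])     # reversed predictions

print(perfect, baseline, worse)  # 1.0 0.0 -3.0
print(mean_squared_error_manual(y_true, [2.5, 2.5, 2.5, 2.5]))  # 1.25
```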

Bonus: Predict on New Data


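import pandas as pd

# Use the training column names so the scaler sees matching feature names
sample = pd.DataFrame([[8.0, 30.0, 6.0, 1.0, 300.0, 2.5, 37.85, -122.2]], columns=X.columns)
sample_scaled = scaler.transform(sample)
price = model.predict(sample_scaled)
# The target is in units of $100,000, so multiply to get dollars
print("Predicted Price:", round(price[0] * 100000, 2))

Output:

Predicted Price: 452378.45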
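New samples must go through the same scaler that was fitted on the training data, and that step is easy to forget. One way to avoid the mistake is scikit-learn's `make_pipeline`, which bundles the scaler and the model into a single object. The sketch below is an optional variation and not the tutorial's original code. It uses synthetic data (`X_demo`, `y_demo` are made up) so that it runs without downloading anything, and it checks that the pipeline matches the manual two-step approach.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in data: 200 rows, 3 features, known linear relationship
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=200)

# Pipeline: scaling and regression are fitted and applied together
pipeline = make_pipeline(StandardScaler(), LinearRegression())
pipeline.fit(X_demo, y_demo)

new_rows = np.array([[0.2, -1.0, 0.7]])
pipeline_prediction = pipeline.predict(new_rows)  # scaling happens inside

# Manual equivalent: fit the scaler, transform, fit the model, transform again
manual_scaler = StandardScaler().fit(X_demo)
manual_model = LinearRegression().fit(manual_scaler.transform(X_demo), y_demo)
manual_prediction = manual_model.predict(manual_scaler.transform(new_rows))

print("Pipeline:", pipeline_prediction)
print("Manual:  ", manual_prediction)
```

Both approaches give the same prediction; the pipeline simply removes the chance of passing unscaled data to the model.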

Summary

Further Challenges


