
House Price Prediction Using Machine Learning (Step-by-Step Tutorial for Beginners)



House Price Prediction – Real-World ML Project

In this beginner-friendly project, we'll use machine learning to predict house prices based on various features like size, location, and number of rooms. This is one of the most popular real-world use cases of ML in action!

Problem Statement

Given a dataset of houses with features like area, number of bedrooms, number of bathrooms, location, etc., predict the selling price of a house.

Why This Project?

Dataset

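We will use the California Housing dataset, loaded with fetch_california_housing from sklearn.datasets. scikit-learn downloads it on first use and caches it locally, so no manual download is needed. Each row describes a census block group (a district) rather than an individual house, with features such as median income, average rooms per household, population, and latitude and longitude.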

❯ Why not just use area to predict price?

🔸 Because price depends on multiple features – location, number of rooms, population density, etc.

🔹 A larger house in a poor neighborhood may be cheaper than a smaller one in a prime location.

Step-by-Step Implementation

1️⃣ Load and Explore the Dataset


from sklearn.datasets import fetch_california_housing
import pandas as pd

# Load data
data = fetch_california_housing()
df = pd.DataFrame(data.data, columns=data.feature_names)
df['Target'] = data.target

# Display first few rows
print(df.head())

Output:

   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude  Target
0  8.3252      41.0  6.984127   1.023810      322.0  2.555556     37.88    -122.23   4.526
1  8.3014      21.0  6.238137   0.971880      240.0  2.109842     37.86    -122.22   3.585
...

🧠❯ What does the 'Target' column mean?

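🔸 It represents the **median house value** of a district (census block group), in units of $100,000.

🔹 So a target of 4.526 means $452,600. The data comes from the 1990 U.S. census, so these are 1990 prices.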
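
The unit conversion above can be sketched as a small helper. The function name target_to_dollars is made up for this illustration and is not part of scikit-learn.

```python
def target_to_dollars(target: float) -> float:
    """Convert a target value (units of $100,000) to dollars."""
    # Round to cents to hide tiny floating-point errors
    return round(target * 100_000, 2)


# The first row of the dataset has a target of 4.526
print(target_to_dollars(4.526))
```

The same helper can be applied to model predictions, since the model predicts in the same $100,000 units as the target.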

2️⃣ Data Preprocessing


from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = df.drop('Target', axis=1)
y = df['Target']

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

Why do we scale the features?

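🔸 Scaling puts every feature on a comparable scale (mean 0, standard deviation 1), so the learned coefficients can be compared directly. We fit the scaler on the training set only and reuse it on the test set, so no information from the test data leaks into training.

🔹 Plain Linear Regression gives the same predictions with or without scaling, but many other models are sensitive to scale. Without scaling, features with larger numbers (like Population) can dominate regularized models (Ridge, Lasso), gradient-descent training, and distance-based models like k-NN.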
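
To make the scaling step less of a black box, here is a minimal pure-Python sketch of standardization, z = (x - mean) / std, using the population standard deviation as StandardScaler does. The helper names fit_scaler and transform_values and the sample numbers are assumptions for illustration only.

```python
from statistics import fmean, pstdev


def fit_scaler(train_values):
    """Learn the mean and population standard deviation from training data only."""
    return fmean(train_values), pstdev(train_values)


def transform_values(values, mean, std):
    """Apply z = (x - mean) / std using statistics learned earlier."""
    return [(x - mean) / std for x in values]


# Hypothetical population counts, for illustration only
train_population = [322.0, 240.0, 496.0, 558.0, 565.0]
test_population = [413.0, 1094.0]

mean, std = fit_scaler(train_population)
train_scaled = transform_values(train_population, mean, std)
# The test set reuses the training mean and std; it is never used to fit
test_scaled = transform_values(test_population, mean, std)

print([round(z, 2) for z in train_scaled])
print([round(z, 2) for z in test_scaled])
```

After the transform, the training values have mean 0 and standard deviation 1. The test values generally do not, because they are scaled with the training statistics.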

3️⃣ Train the Model


from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train_scaled, y_train)
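
LinearRegression fits its coefficients by ordinary least squares. As a rough illustration of what that means, the sketch below solves the one-feature case by hand with the closed-form slope and intercept. The function name fit_simple_ols and the data are made up for this example; the real model handles all eight features at once.

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Return (slope, intercept) that minimize squared error for y ≈ slope * x + intercept."""
    x_mean, y_mean = fmean(x), fmean(y)
    # Covariance-like and variance-like sums (the 1/n factors cancel)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up data that follows y = 0.5 * x + 1 exactly
x = [1.0, 2.0, 3.0, 4.0]
y = [1.5, 2.0, 2.5, 3.0]

slope, intercept = fit_simple_ols(x, y)
print("slope:", slope, "intercept:", intercept)
```

In the tutorial's model, the fitted values live in model.coef_ (one coefficient per feature) and model.intercept_.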

4️⃣ Evaluate the Model


from sklearn.metrics import mean_squared_error, r2_score

predictions = model.predict(X_test_scaled)
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)

print("Mean Squared Error:", round(mse, 3))
print("R2 Score:", round(r2, 3))

Output:

Mean Squared Error: 0.556
R2 Score: 0.576

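What is the R² score?

🔸 It is the fraction of the variance in the target that the model explains.

🔹 R² = 1 is perfect prediction; R² = 0 means the model does no better than always predicting the mean of the target. A negative R² means it does worse than that baseline.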
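
Both metrics are simple enough to compute by hand, which may help build intuition. The sketch below uses made-up numbers; the helper names mse_by_hand and r2_by_hand are illustrative and are not scikit-learn functions.

```python
from statistics import fmean


def mse_by_hand(y_true, y_pred):
    """Mean of the squared differences between true and predicted values."""
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def r2_by_hand(y_true, y_pred):
    """R² = 1 - (residual sum of squares) / (total sum of squares)."""
    mean = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
perfect = [1.0, 2.0, 3.0, 4.0]      # matches the targets exactly
baseline = [fmean(y_true)] * 4      # always predicts the mean (2.5)
reversed_pred = [4.0, 3.0, 2.0, 1.0]  # worse than the baseline

print(r2_by_hand(y_true, perfect))        # 1.0
print(r2_by_hand(y_true, baseline))       # 0.0
print(r2_by_hand(y_true, reversed_pred))  # -3.0
print(mse_by_hand(y_true, baseline))      # 1.25
```

The three cases line up with the interpretation above: a perfect model, the mean baseline, and a model that is worse than the baseline.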

Bonus: Predict on New Data


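# Build the sample as a DataFrame with the training column names
# (pandas was imported as pd in step 1), so the scaler does not
# warn about missing feature names
sample = pd.DataFrame(
    [[8.0, 30.0, 6.0, 1.0, 300.0, 2.5, 37.85, -122.2]],
    columns=X.columns,
)
sample_scaled = scaler.transform(sample)
price = model.predict(sample_scaled)
# The model predicts in units of $100,000, so convert to dollars
print("Predicted Price:", round(price[0] * 100000, 2))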

Output:

Predicted Price: 452378.45

Summary

Further Challenges


