
Customer Segmentation Using Clustering - Machine Learning Project for Beginners



Project: Customer Segmentation Using Clustering

Customer segmentation is a fundamental marketing strategy that divides a customer base into groups of individuals with similar characteristics. In this project, we'll use K-Means Clustering—a popular unsupervised machine learning algorithm—to segment customers based on their behavior.

---

Real-Life Problem Statement

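A retail store wants to segment its customers so it can offer personalized marketing. It has a dataset of customer details such as annual income and spending score.

By clustering customers, we can group them into categories such as high-income/high-spend and low-income/high-spend shoppers.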

---

What is Clustering?

Clustering is an unsupervised learning technique where the goal is to group similar data points together.

K-Means Clustering partitions the dataset into k clusters, where each data point belongs to the cluster with the nearest mean.
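
To make the "nearest mean" idea concrete, here is a minimal NumPy sketch of the two steps K-Means alternates between: assigning each point to the closest centroid, then moving each centroid to the mean of its points. The function name `kmeans_sketch` and the toy points are made up for illustration. This is a teaching sketch, not how scikit-learn implements `KMeans`.

```python
import numpy as np

def kmeans_sketch(points, k, n_iters=100, seed=0):
    """Toy K-Means for illustration only (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random as initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster with the nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points
        # (a centroid with no points stays where it is).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical toy data: two well-separated groups of three points each.
points = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
                   [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])
labels, centroids = kmeans_sketch(points, k=2)
print(labels)
print(centroids)
```

The label numbers themselves are arbitrary. What matters is which points share a label.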

---

What kind of learning is this?

It’s unsupervised learning, because we don’t have predefined labels. We’re discovering structure from data.

Why use clustering instead of classification?

Because we don’t have labeled output like “Customer Type A” or “Customer Type B”. We let the algorithm find natural groupings.

---

Dataset Used

We’ll use the Mall_Customers.csv dataset. The code in this project relies on two of its columns: Annual Income (k$) and Spending Score (1-100).

---

Step-by-Step Code with Explanations


# Step 1: Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Step 2: Load the dataset
df = pd.read_csv('Mall_Customers.csv')

# Step 3: Select features for clustering
X = df[['Annual Income (k$)', 'Spending Score (1-100)']]
print('Shape of X:', X.shape)  # Shape of X: (200, 2)

We choose just two features to visualize the clusters easily: Annual Income and Spending Score.
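
Before clustering, it can help to confirm that the selected columns exist and contain no missing values, since scikit-learn's `KMeans` raises an error on NaN input. The sketch below uses a few made-up rows in place of the real file. In the project, `df` would come from `pd.read_csv('Mall_Customers.csv')`.

```python
import pandas as pd

# Hypothetical stand-in rows; the values are invented for illustration.
df = pd.DataFrame({
    'Annual Income (k$)': [15, 16, 54, 60, 120, 126],
    'Spending Score (1-100)': [39, 81, 47, 52, 16, 88],
})

features = ['Annual Income (k$)', 'Spending Score (1-100)']

# Fail early with a clear message if a column name does not match the file.
missing_columns = [col for col in features if col not in df.columns]
if missing_columns:
    raise KeyError(f"Columns not found in the dataset: {missing_columns}")

X = df[features]

# Count missing values per feature before fitting.
missing_values = X.isna().sum()
print(missing_values)

# Summary statistics give a quick feel for the range of each feature.
print(X.describe())
```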

---

How to choose the number of clusters (k)?

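We use the Elbow Method: plot the Within-Cluster Sum of Squares (WCSS) for different values of k. WCSS always decreases as k grows, so we look for the “elbow point” where adding more clusters stops reducing it sharply. That point suggests a good number of clusters.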
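
For reference, WCSS can be written as below. The symbols are notation introduced here rather than taken from the text above: C_j is the set of points assigned to cluster j, and \mu_j is that cluster's centroid (the mean of its points).

```latex
\mathrm{WCSS}(k) = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2
```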


# Step 4: Find the optimal number of clusters using Elbow Method
wcss = []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, random_state=42)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)

# Plot the elbow graph
plt.plot(range(1, 11), wcss, marker='o')
plt.title('Elbow Method')
plt.xlabel('Number of clusters')
plt.ylabel('WCSS')
plt.grid(True)
plt.show()

For this dataset, the "elbow" typically appears at k = 5.

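
The WCSS values in the elbow plot come from the `inertia_` attribute of the fitted model. As a sanity check, the sketch below recomputes that quantity by hand on made-up data and compares it with `inertia_`. The toy array `X_demo` and the explicit `n_init=10` are assumptions for illustration, not part of the original code.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical toy data standing in for the two customer features.
rng = np.random.default_rng(42)
X_demo = np.vstack([
    rng.normal(loc=[20, 20], scale=2.0, size=(30, 2)),
    rng.normal(loc=[60, 50], scale=2.0, size=(30, 2)),
    rng.normal(loc=[100, 80], scale=2.0, size=(30, 2)),
])

model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_demo)

# Look up the centroid of the cluster each point was assigned to.
assigned_centroids = model.cluster_centers_[model.labels_]

# WCSS: sum of squared distances from each point to its own centroid.
manual_wcss = np.sum((X_demo - assigned_centroids) ** 2)

print(manual_wcss, model.inertia_)
```

The two printed numbers should agree up to floating-point rounding.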
---

Apply K-Means with k = 5


# Step 5: Apply K-Means with k = 5
kmeans = KMeans(n_clusters=5, random_state=42)
y_kmeans = kmeans.fit_predict(X)

y_kmeans is an array of cluster labels (0 to 4) assigned to each customer.
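
One way to put these labels to use is to attach them to the DataFrame and summarize each segment. The sketch below does this on a small made-up table. The names `df_demo` and `profile`, the data values, and the choice of 4 clusters are illustrative assumptions, not results from Mall_Customers.csv.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in data with the same column names as the project.
df_demo = pd.DataFrame({
    'Annual Income (k$)': [15, 17, 19, 85, 88, 90, 16, 18, 87, 89],
    'Spending Score (1-100)': [80, 85, 78, 15, 12, 18, 10, 14, 90, 86],
})
X_demo = df_demo[['Annual Income (k$)', 'Spending Score (1-100)']]

# n_init is set explicitly so the behavior does not depend on the library version.
kmeans_demo = KMeans(n_clusters=4, n_init=10, random_state=42)
df_demo['Cluster'] = kmeans_demo.fit_predict(X_demo)

# Size and average feature values of each segment.
profile = df_demo.groupby('Cluster').agg(
    customers=('Annual Income (k$)', 'size'),
    avg_income=('Annual Income (k$)', 'mean'),
    avg_spending=('Spending Score (1-100)', 'mean'),
)
print(profile)
```

In the project itself, the equivalent first step would be `df['Cluster'] = y_kmeans`.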

---

Visualize the Clusters


# Step 6: Visualize the clusters
plt.figure(figsize=(8, 5))
# Draw one scatter series per cluster label (0 to 4)
for cluster in range(5):
    mask = y_kmeans == cluster
    plt.scatter(X.iloc[mask, 0], X.iloc[mask, 1], s=100, label=f'Cluster {cluster + 1}')

# Plot centroids
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], 
            s=300, c='black', marker='X', label='Centroids')

plt.title('Customer Segments')
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score (1-100)')
plt.legend()
plt.grid(True)
plt.show()

This plot shows the 5 clusters with their centroids. You can visually identify groups such as high-income/high-spend and low-income/high-spend customers.

---

Final Summary

In this project, we:

  1. Loaded and prepared customer data
  2. Chose the number of clusters (k) using the elbow method
  3. Used K-Means clustering to group customers
  4. Visualized customer segments for better marketing strategy

---

What can a business do with these segments?

They can create targeted campaigns: offer luxury services to high-income/high-spend customers and loyalty programs for low-income/high-spend customers.
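
To turn cluster numbers into names like these, one option is to compare each centroid against a cutoff for income and for spending score. The helper `describe_segment`, the example centroids, and the midpoint cutoffs below are all hypothetical choices for illustration. In the project, the centroids would come from `kmeans.cluster_centers_`.

```python
import numpy as np

def describe_segment(centroid, income_cutoff, spending_cutoff):
    """Label a centroid as high/low income and high/low spend (hypothetical helper)."""
    income_level = 'high-income' if centroid[0] >= income_cutoff else 'low-income'
    spend_level = 'high-spend' if centroid[1] >= spending_cutoff else 'low-spend'
    return f'{income_level}/{spend_level}'

# Hypothetical centroids in (annual income, spending score) order.
centroids = np.array([[25.0, 80.0], [88.0, 82.0], [26.0, 20.0], [87.0, 17.0]])

# Assumed cutoffs: the midpoint of each feature's range across the centroids.
income_cutoff = (centroids[:, 0].min() + centroids[:, 0].max()) / 2
spending_cutoff = (centroids[:, 1].min() + centroids[:, 1].max()) / 2

segment_names = [describe_segment(c, income_cutoff, spending_cutoff) for c in centroids]
for name in segment_names:
    print(name)
```

A simple high/low split like this forces every cluster into one of four names, so a cluster that sits in the middle of both ranges would need a label of its own.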

Can we use more features in clustering?

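Yes, but more features do not automatically give better clusters. K-Means relies on Euclidean distance, so features on very different scales should be standardized first, and irrelevant features can blur the groupings. With more than two features you’ll also need a dimensionality reduction technique (like PCA) for visualization.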
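
As a rough sketch of that workflow, the example below standardizes four made-up features, clusters them, and projects the result to two dimensions with PCA. The random data, the feature meanings in the comments, and the choice of 3 clusters are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 60 customers, 4 features on different scales.
rng = np.random.default_rng(0)
X_many = np.column_stack([
    rng.normal(40, 12, 60),   # hypothetical feature, e.g. age
    rng.normal(60, 25, 60),   # hypothetical feature, e.g. annual income
    rng.normal(50, 25, 60),   # hypothetical feature, e.g. spending score
    rng.normal(5, 2, 60),     # hypothetical feature, e.g. visits per month
])

# Standardize so every feature contributes comparably to the distances.
X_scaled = StandardScaler().fit_transform(X_many)

# Cluster in the full (scaled) feature space.
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_scaled)

# Project to 2 dimensions purely for plotting; clustering used all 4 features.
X_2d = PCA(n_components=2).fit_transform(X_scaled)
print(X_2d.shape)
```

The two columns of `X_2d` can then be passed to `plt.scatter`, colored by `labels`, in the same way as the income and spending score plot.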

Congratulations! 🎉 You've built a complete unsupervised ML project from scratch!


