K-Means Clustering in Machine Learning
K-Means Clustering is an unsupervised learning algorithm that groups similar data points into k clusters. Unlike supervised learning, K-Means doesn't need labeled data; it finds natural groupings in the dataset based on feature similarity.
Real-Life Analogy
Imagine you run a shopping mall and you want to segment customers into groups based on their shopping habits — like "budget shoppers", "premium buyers", and "window shoppers". You don’t know how many categories there are, but you want the data to tell you. That’s where K-Means helps — it finds these groupings for you.
How K-Means Works: Step-by-Step
1. Choose the number of clusters, k.
2. Randomly initialize k centroids (cluster centers).
3. Assign each data point to the nearest centroid.
4. Recalculate each centroid as the mean of the points in its cluster.
5. Repeat steps 3–4 until the centroids do not change (or change very little).
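The steps above can be sketched in plain NumPy. This is a toy illustration, not the scikit-learn implementation: it initializes centroids at random data points and skips details such as empty-cluster handling and smarter initialization.

```python
import numpy as np

def kmeans_sketch(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: pick k distinct data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two well-separated blobs; points 0-2 and 3-5 end up in different clusters
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
centroids, labels = kmeans_sketch(X, 2)
```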
Example: Clustering 2D Data
Let’s say we have data points representing the income and spending score of mall customers. We want to group similar customers together.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Sample data: [Annual Income, Spending Score]
data = np.array([
[15, 39], [15, 81], [16, 6], [16, 77], [17, 40],
[17, 76], [18, 6], [18, 94], [19, 3], [19, 72],
[20, 14], [20, 99], [21, 15], [21, 77], [23, 35],
[23, 98], [24, 35], [24, 73], [25, 5], [25, 73]
])
# Apply KMeans with 3 clusters
kmeans = KMeans(n_clusters=3, random_state=0)
kmeans.fit(data)
# Get cluster centers and labels
centroids = kmeans.cluster_centers_
labels = kmeans.labels_
# Plot the result
plt.scatter(data[:, 0], data[:, 1], c=labels, cmap='viridis')
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='X', s=200, label='Centroids')
plt.xlabel('Annual Income')
plt.ylabel('Spending Score')
plt.title('Customer Segmentation using K-Means')
plt.legend()
plt.grid(True)
plt.show()
A scatter plot will be shown with 3 colored clusters and red 'X' marks as centroids.
🧠 Intuition Questions
Why do we need to specify the number of clusters k?
➤ K-Means cannot infer the number of groupings on its own, so we must tell it. However, we can try different values of k and evaluate which one works best using the Elbow Method.
How are initial centroids chosen?
➤ Randomly by default (scikit-learn actually uses the smarter k-means++ scheme, which spreads the initial centroids apart). Different runs may still give slightly different results, so setting random_state makes the result reproducible.
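Reproducibility is easy to check: two fits with the same random_state start from the same initial centroids and converge to identical results. A minimal sketch (the small sample array and n_init=10 are illustrative choices):

```python
import numpy as np
from sklearn.cluster import KMeans

data = np.array([[15, 39], [15, 81], [16, 6], [16, 77], [17, 40],
                 [18, 94], [19, 3], [20, 99], [21, 15], [24, 73]], dtype=float)

# Same random_state => same initialization => identical final centroids
a = KMeans(n_clusters=3, random_state=0, n_init=10).fit(data)
b = KMeans(n_clusters=3, random_state=0, n_init=10).fit(data)
print(np.allclose(a.cluster_centers_, b.cluster_centers_))  # True
```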
What is the Elbow Method?
The Elbow Method helps us decide the right value of k (the number of clusters). It plots the Within-Cluster Sum of Squares (WCSS) for different values of k; the point where the curve bends like an elbow is the optimal k.
wcss = []
for k in range(1, 10):
    km = KMeans(n_clusters=k, random_state=0)
    km.fit(data)
    wcss.append(km.inertia_)
plt.plot(range(1, 10), wcss, marker='o')
plt.xlabel('Number of clusters (k)')
plt.ylabel('WCSS')
plt.title('Elbow Method for Optimal k')
plt.grid(True)
plt.show()
A line chart is plotted. The 'elbow' point shows the optimal number of clusters.
🧠 More Intuition Questions
What does WCSS measure?
➤ WCSS is the sum of squared distances from each point to its cluster’s centroid. Lower WCSS means tighter clusters.
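This definition can be verified directly: computing the squared distances by hand matches scikit-learn's inertia_ attribute, which is exactly the WCSS of the fitted model. A small sketch (the sample array and n_init=10 are illustrative choices):

```python
import numpy as np
from sklearn.cluster import KMeans

data = np.array([[15, 39], [15, 81], [16, 6], [16, 77],
                 [18, 94], [19, 3], [20, 99], [24, 73]], dtype=float)

km = KMeans(n_clusters=3, random_state=0, n_init=10).fit(data)

# WCSS by hand: squared distance from each point to its own cluster's centroid
wcss = sum(np.sum((data[km.labels_ == j] - c) ** 2)
           for j, c in enumerate(km.cluster_centers_))
print(np.isclose(wcss, km.inertia_))  # True
```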
Why does the elbow method work?
➤ Initially, adding more clusters improves the model a lot. But after a point, the gain is small. The elbow shows the point of diminishing returns.
Summary
- K-Means is an unsupervised algorithm that groups similar data points into k clusters.
- We must choose k manually, and the Elbow Method can help in selecting the best value.
- Scikit-learn's KMeans makes implementation easy.
Practice Task for You
Use the Iris dataset from sklearn and apply K-Means clustering. Try different values of k and visualize the clusters.
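To get you started, here is a minimal loading-and-fitting skeleton; the choice of k=3 is just a first guess (Iris happens to have three species), and the plotting and elbow analysis are left to you:

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X = load_iris().data  # 150 samples, 4 features
km = KMeans(n_clusters=3, random_state=0, n_init=10).fit(X)
print(km.labels_[:10])  # cluster assignment for the first 10 flowers
```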