What is Hierarchical Clustering?
Hierarchical Clustering is an unsupervised learning technique that groups similar data points into clusters. Unlike K-Means, it does not require us to specify the number of clusters in advance. Instead, it builds a hierarchy of clusters, either bottom-up (agglomerative) or top-down (divisive).
Types of Hierarchical Clustering
- Agglomerative (Bottom-Up): Start with each point as a separate cluster and merge them step-by-step.
- Divisive (Top-Down): Start with all points in one cluster and split them recursively.
Real-Life Analogy
Imagine you are arranging books in a library. You first group them by genre (e.g., Fiction, Science). Then, inside Fiction, you group by author. Then by publication year. This nested grouping is similar to how hierarchical clustering works.
How Does Agglomerative Clustering Work?
1. Treat each data point as an individual cluster.
2. Compute the distance between all clusters.
3. Merge the two closest clusters.
4. Repeat steps 2–3 until all points belong to a single cluster (see the sketch below).
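Here is a minimal sketch of that merge sequence, using SciPy's linkage matrix. The five 2D points are just a toy dataset for illustration; each row of the matrix records one merge.
import numpy as np
from scipy.cluster.hierarchy import linkage

# Toy dataset for illustration only
points = np.array([[1, 2], [2, 3], [3, 3], [8, 7], [8, 8]])

# Each linkage-matrix row records one merge:
# [cluster_i, cluster_j, distance between them, size of the new cluster]
Z = linkage(points, method='ward')
for step, (i, j, dist, size) in enumerate(Z, start=1):
    print(f"Step {step}: merge clusters {int(i)} and {int(j)} "
          f"at distance {dist:.2f} -> new cluster of size {int(size)}")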
But how do we visualize this?
We use something called a Dendrogram — a tree-like diagram that shows the sequence of merges or splits.
Example 1: Simple 2D Data with Dendrogram
Let’s create a small dataset and visualize hierarchical clustering using a dendrogram.
import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
# Sample data: 2D points
X = np.array([[1, 2],
              [2, 3],
              [3, 3],
              [8, 7],
              [8, 8],
              [25, 80]])
# Plotting the Dendrogram
plt.figure(figsize=(8, 5))
dendrogram = sch.dendrogram(sch.linkage(X, method='ward'))
plt.title("Dendrogram")
plt.xlabel("Data Points")
plt.ylabel("Euclidean Distance")
plt.show()
A dendrogram is displayed showing how clusters are merged step-by-step.
🔸 Why use the 'ward' method?
Ward's method merges the pair of clusters whose merge increases the total within-cluster variance the least. This tends to produce compact, similarly sized clusters.
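To see how the choice of linkage affects the merges, here is a quick comparison sketch (reusing X and sch from the example above); the third column of the linkage matrix holds the merge distances:
# Compare merge distances under different linkage methods
for method in ['ward', 'single', 'complete', 'average']:
    Z = sch.linkage(X, method=method)
    print(method, "merge distances:", np.round(Z[:, 2], 2))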
How do we decide the number of clusters from dendrogram?
Draw a horizontal line across the tallest vertical stretch of the dendrogram that is not crossed by any horizontal (merge) line. The number of vertical lines the cut intersects equals the number of clusters.
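The same cut can be made in code with SciPy's fcluster, which turns the hierarchy into flat cluster labels. This is a small sketch reusing X and sch from above; the threshold of 10 is just an illustrative value for this tiny dataset, so read the cut height off your own dendrogram in practice.
from scipy.cluster.hierarchy import fcluster

# Cut the dendrogram at height 10 and return one flat label per point
Z = sch.linkage(X, method='ward')
labels = fcluster(Z, t=10, criterion='distance')
print(labels)  # expected: three groups here (the low points, the points near (8, 8), the outlier)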
🔹 What’s the advantage of using a dendrogram?
It gives you a visual guide to decide the optimal number of clusters — without setting it manually like KMeans.
Example 2: Applying Hierarchical Clustering using scikit-learn
Now let’s apply Agglomerative Clustering and visualize the clusters on a 2D scatter plot.
from sklearn.cluster import AgglomerativeClustering
# Create the clustering model
# Note: `metric` replaced the older `affinity` argument in recent scikit-learn versions
model = AgglomerativeClustering(n_clusters=3, metric='euclidean', linkage='ward')
# Fit the model and predict clusters
y_clusters = model.fit_predict(X)
# Plotting the results
plt.figure(figsize=(8,5))
plt.scatter(X[y_clusters == 0, 0], X[y_clusters == 0, 1], s=100, label='Cluster 1')
plt.scatter(X[y_clusters == 1, 0], X[y_clusters == 1, 1], s=100, label='Cluster 2')
plt.scatter(X[y_clusters == 2, 0], X[y_clusters == 2, 1], s=100, label='Cluster 3')
plt.title("Hierarchical Clustering (Agglomerative)")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()
Three colored clusters are displayed showing grouped data points.
🔸 Explanation
- n_clusters=3: we manually choose to split the data into 3 clusters.
- metric='euclidean': the distance metric used to measure similarity (called affinity in older scikit-learn versions).
- linkage='ward': defines how clusters are merged (minimize within-cluster variance).
Can we avoid specifying the number of clusters?
We can use the dendrogram to pick n_clusters beforehand, or pass n_clusters=None together with a distance_threshold so that scikit-learn cuts the tree at a given height instead of at a fixed number of clusters.
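A brief sketch of the distance_threshold variant (again, the threshold of 10 is an illustrative value for this dataset):
# Cut the tree at a distance threshold instead of fixing the cluster count
model = AgglomerativeClustering(n_clusters=None, distance_threshold=10, linkage='ward')
labels = model.fit_predict(X)
print("Number of clusters found:", model.n_clusters_)  # expected: 3 for this dataset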
🔹 What’s the main difference vs KMeans?
- KMeans requires the number of clusters up front; hierarchical clustering does not.
- Hierarchical clustering produces a hierarchy (tree); KMeans produces a flat partition.
Summary
- Hierarchical Clustering builds nested clusters using distance metrics.
- Agglomerative is most common (bottom-up).
- Dendrogram helps you visually decide cluster count.
- Scikit-learn makes it easy to implement using AgglomerativeClustering.