
Hierarchical Clustering in Machine Learning



What is Hierarchical Clustering?

Hierarchical Clustering is an unsupervised learning technique used to group similar data points into clusters. Unlike K-Means, it doesn't require us to pre-specify the number of clusters. Instead, it builds a hierarchy of clusters either from bottom-up (agglomerative) or top-down (divisive).

Types of Hierarchical Clustering

  1. Agglomerative (bottom-up): every point starts as its own cluster, and the two closest clusters are merged step by step.
  2. Divisive (top-down): all points start in one big cluster, which is repeatedly split into smaller clusters.

Real-Life Analogy

Imagine you are arranging books in a library. You first group them by genre (e.g., Fiction, Science). Then, inside Fiction, you group by author. Then by publication year. This nested grouping is similar to how hierarchical clustering works.

How Does Agglomerative Clustering Work?

  1. Treat each data point as an individual cluster.
  2. Compute the distance between all clusters.
  3. Merge the two closest clusters.
  4. Repeat steps 2–3 until all points belong to a single cluster (a minimal sketch of these steps follows below).
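
The following is a minimal sketch of these four steps in plain Python, using single-linkage distance (smallest pairwise distance between clusters) and a tiny made-up 1D dataset for illustration; libraries such as SciPy implement the same idea far more efficiently.

# Tiny illustrative dataset: five 1D points
points = [1.0, 2.0, 9.0, 10.0, 25.0]

# Step 1: treat each data point as its own cluster
clusters = [[p] for p in points]

while len(clusters) > 1:
    # Step 2: distance between two clusters = smallest pairwise point distance
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    d, i, j = best

    # Step 3: merge the two closest clusters
    print(f"Merging {clusters[i]} and {clusters[j]} at distance {d}")
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]
    # Step 4: the loop repeats until only one cluster remains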

But how do we visualize this?

We use something called a Dendrogram — a tree-like diagram that shows the sequence of merges or splits.


Example 1: Simple 2D Data with Dendrogram

Let’s create a small dataset and visualize hierarchical clustering using a dendrogram.


import numpy as np
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch

# Sample data: 2D points
X = np.array([[1, 2],
              [2, 3],
              [3, 3],
              [8, 7],
              [8, 8],
              [25, 80]])

# Plotting the Dendrogram
plt.figure(figsize=(8, 5))
dendrogram = sch.dendrogram(sch.linkage(X, method='ward'))
plt.title("Dendrogram")
plt.xlabel("Data Points")
plt.ylabel("Euclidean Distance")
plt.show()

A dendrogram is displayed showing how clusters are merged step-by-step.

🔸 Why use the 'ward' method?

At each step, Ward linkage merges the pair of clusters whose union increases the within-cluster variance the least. This tends to produce compact, meaningful clusters.
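
To make this concrete, here is a small sketch of the quantity Ward linkage compares: the increase in total within-cluster sum of squares caused by a candidate merge (the point coordinates are hand-picked from the example data above). The merge with the smallest increase is performed first.

import numpy as np

def within_ss(points):
    # Sum of squared distances of each point to the cluster centroid
    points = np.asarray(points, dtype=float)
    return ((points - points.mean(axis=0)) ** 2).sum()

a = [[1, 2], [2, 3]]   # two nearby points
b = [[3, 3]]           # a nearby singleton
c = [[8, 7], [8, 8]]   # a more distant pair

# Ward's criterion: merge the pair of clusters whose union increases the
# total within-cluster sum of squares the least
cost_ab = within_ss(a + b) - within_ss(a) - within_ss(b)
cost_ac = within_ss(a + c) - within_ss(a) - within_ss(c)

print(f"Cost of merging a and b: {cost_ab:.2f}")   # small increase, merged early
print(f"Cost of merging a and c: {cost_ac:.2f}")   # large increase, merged late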

How do we decide the number of clusters from dendrogram?

Find the longest vertical line in the dendrogram that is not crossed by any horizontal merge line, and draw a horizontal cut through it. The number of vertical lines the cut passes through is the suggested number of clusters.
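
In code, that horizontal cut corresponds to a distance threshold. Here is a minimal sketch using SciPy's fcluster on the same data; the threshold of 10 is an illustrative value read off the dendrogram for this dataset, not a general setting.

import numpy as np
import scipy.cluster.hierarchy as sch

X = np.array([[1, 2],
              [2, 3],
              [3, 3],
              [8, 7],
              [8, 8],
              [25, 80]])

# Build the same linkage matrix used for the dendrogram
Z = sch.linkage(X, method='ward')

# Cut the tree at height 10: merges above that distance are undone
labels = sch.fcluster(Z, t=10, criterion='distance')
print(labels)                                   # one cluster label per point
print(len(set(labels)), "clusters at this cut")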

🔹 What’s the advantage of using a dendrogram?

It gives you a visual guide for choosing the number of clusters, instead of having to fix it in advance as K-Means requires.


Example 2: Applying Hierarchical Clustering using scikit-learn

Now let’s apply Agglomerative Clustering and visualize the clusters on a 2D scatter plot.


from sklearn.cluster import AgglomerativeClustering

# Create the clustering model (Ward linkage uses Euclidean distance;
# newer scikit-learn versions dropped the old 'affinity' parameter)
model = AgglomerativeClustering(n_clusters=3, linkage='ward')

# Fit the model and predict clusters
y_clusters = model.fit_predict(X)

# Plotting the results
plt.figure(figsize=(8,5))
plt.scatter(X[y_clusters == 0, 0], X[y_clusters == 0, 1], s=100, label='Cluster 1')
plt.scatter(X[y_clusters == 1, 0], X[y_clusters == 1, 1], s=100, label='Cluster 2')
plt.scatter(X[y_clusters == 2, 0], X[y_clusters == 2, 1], s=100, label='Cluster 3')
plt.title("Hierarchical Clustering (Agglomerative)")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.legend()
plt.show()

Three colored clusters are displayed showing the grouped data points.

🔸 Explanation

Can we avoid specifying the number of clusters?

By default scikit-learn asks for n_clusters, so one option is to read it off the dendrogram beforehand. Recent scikit-learn versions also accept n_clusters=None together with a distance_threshold, which cuts the merge tree at a chosen height instead.
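
Here is a minimal sketch of that option; the distance_threshold of 10 is an illustrative value matching the dendrogram cut above.

import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.array([[1, 2],
              [2, 3],
              [3, 3],
              [8, 7],
              [8, 8],
              [25, 80]])

# Let the algorithm decide the number of clusters by cutting at height 10
model = AgglomerativeClustering(n_clusters=None, distance_threshold=10,
                                linkage='ward')
labels = model.fit_predict(X)

print(labels)              # cluster label for each point
print(model.n_clusters_)   # number of clusters implied by the cut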

🔹 What’s the main difference vs K-Means?

K-Means needs the number of clusters chosen up front and assigns each point to the nearest of k moving centroids, while hierarchical clustering builds a tree of merges (or splits) that can be cut at any level, so the number of clusters can be decided after inspecting the dendrogram.
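
As a small side-by-side sketch (the cluster label numbers are arbitrary, so compare the groupings rather than the exact values):

import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

X = np.array([[1, 2],
              [2, 3],
              [3, 3],
              [8, 7],
              [8, 8],
              [25, 80]])

# K-Means: k fixed in advance, points assigned to the nearest moving centroid
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Agglomerative: the same k, but reached by merging clusters bottom-up
agglo_labels = AgglomerativeClustering(n_clusters=3, linkage='ward').fit_predict(X)

print("K-Means:      ", kmeans_labels)
print("Agglomerative:", agglo_labels)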
Summary

  1. Hierarchical clustering groups data without requiring the number of clusters to be fixed in advance.
  2. Agglomerative clustering starts with every point as its own cluster and repeatedly merges the closest pair; divisive clustering works top-down by splitting.
  3. A dendrogram visualizes the sequence of merges and helps you decide where to cut, and therefore how many clusters to keep.
  4. Ward linkage merges the pair of clusters whose union adds the least within-cluster variance, giving compact clusters.
  5. scikit-learn's AgglomerativeClustering applies the method with a fixed n_clusters or, in recent versions, a distance_threshold.

