Disjoint Set (Union-Find) - Visualization

Problem Statement

The Disjoint Set (also called Union-Find) is a data structure used to manage a collection of non-overlapping sets. It supports two main operations:

  • Find: Determine which set a particular element belongs to.
  • Union: Merge two sets into a single set.

Disjoint Set is particularly useful in problems involving connected components, cycle detection in graphs, and Kruskal's algorithm for Minimum Spanning Tree.

To improve efficiency, it uses two optimizations:

  • Union by Rank or Union by Size: Always attach the smaller tree under the root of the larger tree.
  • Path Compression: Flatten the tree whenever Find is used, making future queries faster.

Examples

Operation Effect Description
find(3) Returns the representative of the set containing 3 If 3 is part of the set with root 1, find(3) returns 1
union(1, 2) Merges the sets containing 1 and 2 Uses rank/size to decide which root becomes the parent
find(3) after path compression Directly points to root Improves future queries by reducing depth
union(1, 3) May not change structure if both already connected No change if already in same set

Solution

Understanding the Problem

We are given a set of elements, and we want to manage them in groups (or sets) dynamically. Initially, each element is in its own group. Over time, we perform two kinds of operations:

  • Union: Merge two different groups into one.
  • Find: Check which group a particular element belongs to (i.e., find its leader or root).

But if we implement these operations naively, they can become slow. To speed things up, we use two clever strategies:

  • Union by Rank/Size: Always attach the smaller tree under the root of the bigger tree.
  • Path Compression: When finding the root of a node, we flatten the path so that all nodes point directly to the root.

These optimizations help us answer connectivity queries and perform unions in nearly constant time.

Step-by-Step Solution with Example

Step 1: Initialize Disjoint Set

Suppose we have 7 nodes labeled from 0 to 6. Initially, each node is its own parent. So we have:


parent = [0, 1, 2, 3, 4, 5, 6]
rank   = [0, 0, 0, 0, 0, 0, 0]

Step 2: Perform Union(1, 2)

We want to merge the sets containing nodes 1 and 2. Both have the same rank (0), so we can make 1 the parent of 2 and increase the rank of 1:


parent = [0, 1, 1, 3, 4, 5, 6]
rank   = [0, 1, 0, 0, 0, 0, 0]

Step 3: Perform Union(2, 3)

We find the parent of 2 (which is 1) and the parent of 3 (which is 3). Since rank[1] > rank[3], we attach 3 to 1:


parent = [0, 1, 1, 1, 4, 5, 6]
rank   = [0, 1, 0, 0, 0, 0, 0]

Step 4: Perform Find(3) with Path Compression

To find the root of node 3, we trace its parent chain: 3 → 1. Since 1 is the root, we directly connect 3 to 1. The structure is already flat, so no changes, but this would help in a deeper tree.

Step 5: Perform Union(4, 5)

Both are roots, so attach 5 to 4 and increase rank of 4:


parent = [0, 1, 1, 1, 4, 4, 6]
rank   = [0, 1, 0, 0, 1, 0, 0]

Step 6: Perform Union(1, 4)

Now we're merging the sets {1,2,3} and {4,5}. Their ranks are equal (1), so attach 4 under 1 and increase rank of 1:


parent = [0, 1, 1, 1, 1, 4, 6]
rank   = [0, 2, 0, 0, 1, 0, 0]

We also update 5’s parent to 4, but during future finds, it’ll be compressed to 1.

Edge Cases

  • Self-union: Calling Union(x, x) does nothing — it’s a no-op.
  • Multiple unions: Calling Union(x, y) multiple times after they are already in the same set has no effect due to rank comparison.
  • Deep trees: Without path compression, trees can get tall. Always use path compression in Find.
  • Disconnected nodes: Some nodes may never get merged — their parent remains themselves until needed.

Finally

The Disjoint Set Union-Find with path compression and union by rank is an extremely efficient data structure used in various algorithms like Kruskal's MST, cycle detection, and clustering. These optimizations make each operation run in nearly constant time, even on large datasets.

For beginners, remember that path compression flattens the structure for faster future finds, and union by rank ensures minimal tree height. Both ideas help in making this structure scalable and fast.

Algorithm Steps

  1. Initialization: Set each node as its own parent and rank/size as 1.
  2. Find(x):
    1. If x is not the parent of itself, recursively call Find on its parent.
    2. Apply path compression by setting the parent of x to the result of Find.
  3. Union(x, y):
    1. Find roots of both x and y.
    2. If they are the same, no union is needed (already connected).
    3. Else, attach the smaller rank/size tree under the larger one.
    4. If ranks are equal, arbitrarily choose one and increment its rank.

Code

C
C++
Python
Java
JS
Go
Rust
Kotlin
Swift
TS
#include <stdio.h>

#define MAX 100

int parent[MAX];
int rank_arr[MAX];

void makeSet(int n) {
    for (int i = 0; i < n; i++) {
        parent[i] = i;
        rank_arr[i] = 0;
    }
}

int find(int x) {
    if (parent[x] != x) {
        parent[x] = find(parent[x]); // Path Compression
    }
    return parent[x];
}

void unionSets(int x, int y) {
    int rootX = find(x);
    int rootY = find(y);
    if (rootX == rootY) return;

    if (rank_arr[rootX] < rank_arr[rootY]) {
        parent[rootX] = rootY;
    } else if (rank_arr[rootX] > rank_arr[rootY]) {
        parent[rootY] = rootX;
    } else {
        parent[rootY] = rootX;
        rank_arr[rootX]++;
    }
}

int main() {
    int n = 5;
    makeSet(n);

    unionSets(0, 1);
    unionSets(1, 2);

    printf("Find(2): %d\n", find(2)); // Should print 0
    printf("Find(3): %d\n", find(3)); // Should print 3
    return 0;
}

Comments

💬 Please keep your comment relevant and respectful. Avoid spamming, offensive language, or posting promotional/backlink content.
All comments are subject to moderation before being published.


Loading comments...