Understanding the Problem
We are given a set of elements, and we want to manage them in groups (or sets) dynamically. Initially, each element is in its own group. Over time, we perform two kinds of operations:
- Union: Merge two different groups into one.
- Find: Check which group a particular element belongs to (i.e., find its leader or root).
But if we implement these operations naively, they can become slow. To speed things up, we use two clever strategies:
- Union by Rank/Size: Always attach the smaller tree under the root of the bigger tree.
- Path Compression: When finding the root of a node, we flatten the path so that all nodes point directly to the root.
These optimizations help us answer connectivity queries and perform unions in nearly constant time.
Step-by-Step Solution with Example
Step 1: Initialize Disjoint Set
Suppose we have 7 nodes labeled from 0 to 6. Initially, each node is its own parent. So we have:
parent = [0, 1, 2, 3, 4, 5, 6]
rank = [0, 0, 0, 0, 0, 0, 0]
Step 2: Perform Union(1, 2)
We want to merge the sets containing nodes 1 and 2. Both have the same rank (0), so we can make 1 the parent of 2 and increase the rank of 1:
parent = [0, 1, 1, 3, 4, 5, 6]
rank = [0, 1, 0, 0, 0, 0, 0]
Step 3: Perform Union(2, 3)
We find the parent of 2 (which is 1) and the parent of 3 (which is 3). Since rank[1] > rank[3], we attach 3 to 1:
parent = [0, 1, 1, 1, 4, 5, 6]
rank = [0, 1, 0, 0, 0, 0, 0]
Step 4: Perform Find(3) with Path Compression
To find the root of node 3, we trace its parent chain: 3 → 1. Since 1 is the root, we directly connect 3 to 1. The structure is already flat, so no changes, but this would help in a deeper tree.
Step 5: Perform Union(4, 5)
Both are roots, so attach 5 to 4 and increase rank of 4:
parent = [0, 1, 1, 1, 4, 4, 6]
rank = [0, 1, 0, 0, 1, 0, 0]
Step 6: Perform Union(1, 4)
Now we're merging the sets {1,2,3} and {4,5}. Their ranks are equal (1), so attach 4 under 1 and increase rank of 1:
parent = [0, 1, 1, 1, 1, 4, 6]
rank = [0, 2, 0, 0, 1, 0, 0]
We also update 5’s parent to 4, but during future finds, it’ll be compressed to 1.
Edge Cases
- Self-union: Calling Union(x, x) does nothing — it’s a no-op.
- Multiple unions: Calling Union(x, y) multiple times after they are already in the same set has no effect due to rank comparison.
- Deep trees: Without path compression, trees can get tall. Always use path compression in Find.
- Disconnected nodes: Some nodes may never get merged — their parent remains themselves until needed.
Finally
The Disjoint Set Union-Find with path compression and union by rank is an extremely efficient data structure used in various algorithms like Kruskal's MST, cycle detection, and clustering. These optimizations make each operation run in nearly constant time, even on large datasets.
For beginners, remember that path compression flattens the structure for faster future finds, and union by rank ensures minimal tree height. Both ideas help in making this structure scalable and fast.
Comments
Loading comments...