To find the median of two sorted arrays of different sizes, we need to locate the middle value(s) of their combined sorted order—but without actually merging the arrays. Instead, we use a smart technique based on binary search to find the right partitioning between the two arrays.
What is a Median?
The median is the value that separates the lower half and upper half of a dataset. For an odd total number of elements, it's the exact middle. For an even number of elements, it's the average of the two middle values.
Key Idea
We want to split both arrays into two halves—left_half
and right_half
—such that:
- The number of elements in the left half = number of elements in the right half (or one more if the total is odd)
- The maximum of the left half ≤ the minimum of the right half
Once this condition is met, the median will either be:
max(left_half)
if total length is odd
(max(left_half) + min(right_half)) / 2
if total length is even
How It Works
We apply binary search on the shorter array (for efficiency). At each step, we pick a partition index in the first array and calculate the corresponding partition in the second array to keep the total elements in left and right balanced.
We then check whether the largest element in the left part of array 1 and 2 is less than or equal to the smallest in the right part of array 1 and 2. If yes, we’ve found the correct partition.
If not, we adjust our binary search boundaries—moving left or right—until the correct split is found.
Edge Cases to Consider
- If one array is empty, the median is simply the median of the other array.
- If both arrays are empty, there is no median; it should return undefined or a suitable error.
- If arrays contain duplicate values or zeros, the logic still holds since all comparisons are value-based.
Why This Approach?
Instead of merging arrays (which takes O(n + m) time), we use binary search on the smaller array, reducing time complexity to O(log(min(n, m))). This is optimal and works well even with large arrays.
This approach is powerful, elegant, and leverages the sorted nature of both arrays to avoid extra work.