# Median of two arrays

## Getting the median of two sorted arrays in O(log(small_array_length)).

Sat Mar 14 2020

Did you know that it’s possible to get the median of two separate arrays of different lengths, even or odd, without ever merging them or sorting them? It’s something that is often asked in interviews and is also listed on Leetcode.com.

Median = the value that splits the array into two equal length partitions

## The premise

Given two separate arrays that can have different lengths, overlapping values. There exists a solution to find the median across both arrays without sorting or merging the arrays.

### Example 1 - an array that when combined has an odd length of 11

``````small_array = [1, 3, 8, 9, 15]
large_array = [7, 11, 18, 19, 21, 25]
``````

The solution should find the same median `11` as if this array was merged and sorted

``````merged_sorted_array = [1, 3, 7, 8, 9, 11, 15, 18, 19, 21, 25]
``````

Search for the first two partition points which satisfy the following conditions

1. All elements to the left of the partition on `small_array` are smaller or equal to all elements on the right of the partition of `large_array`
2. All elements to the left of the partition on `large_array` are smaller or equal to all elements on the right of the partition of `small_array`

Since both arrays are ordered in ascending order, checking all elements is unnecessary. The partition that satisfies the above on `small_array` and `large_array` is as follows.

``````1 3 8 9 | 15
7 11 | 18 19 21 25``````

`9` is less than `18` and `11` is less than `15`

Since the total number of elements in the combined array is odd we simply use the larger of the two numbers to the left of either partition as the median. This is `11`.

#### But how you do you find the partition that satisfies the above?

Since the goal of the median is to have an equal number of elements of both arrays on the left as on the right and have the median as the splitting value that divides the array equally.

This array is odd, there 6 elements on the left, 5 elements on the right therefore the median has to be one of the numbers on the left. If this number is plucked as the median there will be 5 elements on either side.

``````1 3 8 9 | 15
7 11 | 18 19 21 25``````

It’s impossible to have partitions of both arrays over to the right and find the median. Therefore it makes no sense to look in for a partition in this search space.

``````1 3 8 9 | 15
7 11 18 19 21 | 25``````

With this knowledge, it’s possible to ascertain that if you desire to move the `small_array` partition over to the right. You must move the `large_array` partition in the opposite direction to the same degree.

#### Example iterations

##### Iteration 1 - incorrect partitions
• `small_array` is partitioned at index `2` which is decided by the number of elements in the search divided by 2 as a whole number e.g. `5 / 2 = 2.5` which is just `2` when the decimal is truncated
• `large_array` is partitioned at index `4` which is decided by the total number of elements in both arrays + 1 divided by 2 minus the partition index of the `small_array` e.g. `(11 + 1) / 2 - 2 = 4`.

We compare the numbers around the current partition points to decide whether to change these partition points

``````1 3 | 8 9 15
7 11 18 19 | 21 25``````
• `3` is not less than `21`, this correct partition must be in between the higher numbers
• `19` is not less than `8`, this correct partition is in the earlier numbers
##### Iteration 2 - correct partitions

There were 3 ways of partitioning the `small_array` in this iteration

• `8 | 9`
• `9 | 15`
• `15 | +∞`

`9 | 15` was chosen as it’s in the middle. The partition index here is `4`.

However, since we moved the partition on the `small_array` forward `2` places from index `2` to `4` in this iteration. The partition on the `large_array` moved `2` places to the left, to ensure the balance of elements either side of the median.

``````1 3 8 9 | 15
7 11 | 18 19 21 25``````

### Example 2 - an array that when combined has an even length of 10

In an even array the median is the mean of the two middle numbers

``````small_array = [23, 26, 31, 35]
large_array = [3, 5, 7, 9, 11, 16]
``````

The solution should find the same median `13.5` derived from `(11 + 16) / 2` as if this array was merged and sorted

``````merged_sorted_array = [3, 5, 7, 9, 11,   16, 23, 26, 31, 35]
``````

The partition points of these arrays are

``````-∞ | 23 26 31 35
3 5 7 9 11 | 16``````

Nothing exists on the left of partition in the `small_array`, therefore we have substituted this with negative infinity `-∞`. This effectively means that all values of concern to the median are only in the `large_array` and we need both a left and right value in this array to compare.

We still satisfy the conditions earlier

1. All elements to the left of the partition on `small_array` are smaller or equal to all elements on the right of the partition of `large_array`

1. -∞ is less than all elements
2. All elements to the left of the partition on `large_array` are smaller or equal to all elements on the right of the partition of `small_array`

1. 11 is less than 23
2. 16 is also less than 23 but we satisfied both conditions already

Now the next step is to compare the 4 numbers next to the partition points

left right
`small_array` -∞ 23
`large_array` 11 16

We will use the following formula `(max(small_array_left_val, large_array_right_val) + min(small_array_right_val, large_array_right_val)) / 2` or `(11 + 16) / 2 = 13.15`

### Example 3 - an array that when combined has an odd length of 3

``````small_array = [3]
large_array = [-2, -1]
``````

There are only a few ways that tiny arrays can be partitioned and the first way gets the correct result

left right
small_array -∞ 3
large_array -1 +∞

Since the combined array is of an odd length we just use the larger of the two numbers to the left of either partition as the median this is `-1` as `-1 > -∞`

### Example 4 - an array that when combined is all the same value

``````small_array=[1,1,1,1,1]
large_array=[1,1,1,1,1,1]
``````

The median we know here is `1` just by looking at it. Since LeetCode asked for only the value and not the index this algorithm takes a few shortcuts and therefore partitions incorrectly and still produces the correct result.

``````1 1 | 1 1 1
1 1 1 1 | 1 1
``````
• The top left number is less than or equal to the bottom right number
• The bottom left number is less than or equal to the top right number
• The median is one of these four numbers
• The median is one of the two left numbers as the array is odd and there are more elements on the left
• The median is the greater of the two left numbers
• The median is 1

If the LeetCode exercise asked for both the value and the index e.g. `(value, index)` then this algorithm that we have would fail to deliver the results without adjustments.

### Time complexity

Why is this solution of time complexity of `O(log(small_array_length))`? This is because the smaller array is the only array that we are doing a binary search on and for each search on this array we are halving the search space e.g. `8 -> 4 -> 2`. You can watch a video on logarithmic complexity here

## Python Code

This code passes all tests in LeetCode. Print statements are added for tutorial purposes

``````"""
https://leetcode.com/problems/median-of-two-sorted-arrays/
"""

import sys

class Solution:
"""
Wrapper class for Leetcode solution
"""

def findMedianSortedArrays(self, nums1: [int], nums2: [int]) -> float:
"""
Parameters
----------
nums1: [int]
A sorted array of numbers
nums2: [int]
A sorted array of numbers

Returns
-------
int
The median number which partitions both arrays if they were already merged and sorted
"""

if not nums1 and not nums2:
raise "Invalid input"

# Single array (trivial algorithm)
if not nums1 and nums2:
return self._get_median_single_array(nums2)
if not nums2 and nums1:
return self._get_median_single_array(nums1)

# Get the median for two arrays, smallest array first
if len(nums1) < len(nums2):
return self._find_median_sorted_arrays(small_array=nums1, large_array=nums2)

return self._find_median_sorted_arrays(small_array=nums2, large_array=nums1)

@classmethod
def _find_median_sorted_arrays(cls, small_array: [int], large_array: [int]) -> float:
"""
Time complexity O(log(len(small_array)))

Parameters
----------
small_array: [int]
A sorted list of integers
large_array: [int]
A sorted list of integers that must be larger or the same length as small_array

Returns
-------
float
The median that splits the two arrays equally as if they were sorted and merged
"""

total_elements = len(small_array) + len(large_array)

# Begin the search within the entire range of the smaller array
start_x=0
end_x = len(small_array)

while (start_x <= end_x):

# Try and find the partition points for the x and y array such that an equal number of elements appear on both sides
partition_x = cls._get_partition_x(start_x, end_x)
partition_y = cls._get_partition_y(total_elements, partition_x)

# Find immediate element to the left and to the right of partition_x in the x array
left_val_x = small_array[partition_x-1] if partition_x > 0 else -sys.maxsize
right_val_x = small_array[partition_x] if partition_x < len(small_array) else sys.maxsize

# Find immediate element to the left and to the right of partitionY in the y array
left_val_y = large_array[partition_y-1] if partition_y > 0 else -sys.maxsize
right_val_y = large_array[partition_y] if partition_y < len(large_array) else sys.maxsize

# All values to the left on the x partition are less or equal to all the values on the right of the y partition
left_x_less_eq_to_right_y = left_val_x <= right_val_y

# All values to the left of the y partition are less or equal to all the values to the right of the x partition
left_y_less_eq_to_right_x = left_val_y <= right_val_x

# Print information about the current state
print("-- INFORMATION---")
print(f"start_x={start_x} end_x={end_x}")
print(f"partition_x={partition_x} partitionY={partition_y}")
print("-----------------")
print("The 4 values hugging the 2 partition points")
print(f"maxLeftX={left_val_x}   minRightX={right_val_x}")
print(f"maxLeftY={left_val_y}   minRightY={right_val_y}")
print(f"left_x_less_eq_to_right_y={left_x_less_eq_to_right_y}")
print(f"left_y_less_eq_to_right_x={left_y_less_eq_to_right_x}")

# Partition found where median can be calculated
if left_x_less_eq_to_right_y and left_y_less_eq_to_right_x:
print(f"Found the perfect partition indexes. partition_x={partition_x} partitionY={partition_y}")
print(f"small_array_left={small_array[0:max(0, partition_x)]} small_array_right={small_array[partition_x::]}")
print(f"large_array_left={large_array[0:max(0, partition_y)]} large_array_right={large_array[partition_y::]}")
is_even = total_elements % 2 == 0
if is_even:
median = (max(left_val_x, left_val_y) + min(right_val_x, right_val_y)) / 2
else:
median = int(max(left_val_x, left_val_y))

print(f"Found the perfect median {median}")
return median

# Move search space backward or forward
if left_val_x > right_val_y:
end_x = partition_x - 1
else:
start_x = partition_x + 1

@classmethod
def _get_partition_x(cls, start_x: int, end_x: int):
"""
Parameters
----------
start_x: int
The current start_x
end_x: int
The current end_x

Returns
-------
int
The partition index for the x array, if > len(x) then the median of both arrays is in y
"""

return int((start_x + end_x) / 2)

@classmethod
def _get_partition_y(cls, total_elements: int, partition_x: int):
"""
Parameters
----------
total_elements: int
The total number of elements in both arrays
partition_x: int
The current partition_x

Returns
-------
int
The partition point of y
"""

return int(((total_elements + 1) / 2) - partition_x)

@classmethod
def _get_median_single_array(cls, nums: list) -> float:
"""
Gets the median of the sorted array

Returns
-------
float
The median number which partitions the array such that both sides have an equal number of elements.
"""

if len(nums) == 1:
return nums[0]

median = len(nums) / 2
if not median.is_integer():
# Odd number then return the middle
return nums[int(median)]

# Even so must split
median = int(median) - 1
return (nums[median] + nums[median+1]) / 2
``````

### Sample run code for experimentation and debugging

``````def main():
""" The entry point of the Python script """

# Trying to find the median of the combined arrays
sln = Solution()

# We know the median to this is 11
# print(sln.findMedianSortedArrays(nums1=[1,3,8,9,15], nums2=[7,11,18,19,21,25]))

# We know the median to this is between 11 and 16 so 13.5
print(sln.findMedianSortedArrays(nums1=[23,26,31,35], nums2=[3,5,7,9,11,16]))

# The median here is -1
#print(sln.findMedianSortedArrays([3], [-2,-1]))

# The median here is 3
#print(sln.findMedianSortedArrays([1,2,5],[1,3,5,6]))

if __name__ == "__main__":
main()
``````