Find Median From Data Stream

Question

https://leetcode.com/problems/find-median-from-data-stream/description/

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.

  • For example, for arr = [2,3,4], the median is 3.
  • For example, for arr = [2,3], the median is (2 + 3) / 2 = 2.5.

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far. Answers within 10-5 of the actual answer will be accepted.

Example 1:

Input
["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"]
[[], [1], [2], [], [3], []]
Output
[null, null, null, 1.5, null, 2.0]

Explanation
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.addNum(2);    // arr = [1, 2]
medianFinder.findMedian(); // return 1.5 (i.e., (1 + 2) / 2)
medianFinder.addNum(3);    // arr[1, 2, 3]
medianFinder.findMedian(); // return 2.0

Solution

Intuition

Sorting the array

For this question, it's given a stream (array) of elements and asking us to find the median (the middle number of a sorted array) if the array is even then the median is the average.

So for example if we have in the stream these following numbers

[4,-3,2,0]

Then when it's sorted it becomes

[-3,0,2,4]

Therefore, the median is $\frac{0 + 2}{2} = 1$

In another case when we have 5 numbers

[4,-3,2,0,1]

Then when it's sorted it becomes

[-3,0,1,2,4]

then the median is $1$

Using Heap

Using the heap, we can seperate into 2 sorted array on two side.

When we add into this heap:

  • If both heap is empty or rightHeap is empty, we add into the left heap (just imagine you fill in the array, you gotta fill from the left)
  • If the element is larger than the smallest value of the rightHeap, we add into rightHeap

Pasted image 20221227191913.png

To balance we can do

# if len(largeHeap) > len(smallHeap) + 1
heappush(largeHeap, heappop(smallHeap))

# if len(smallHeap) > len(largeHeap) + 1
heappush(smallHeap, heappop(largeHeap))

Then to find the solution, we have 3 scenario

  1. If len(largeHeap) > len(smallHeap) return largeHeap[0] (smallest element of largeHeap)
  2. if len(smallHeap) > len(largeHeap) return smallHeap[0] (largest element of smallHeap)
  3. if len(smallHeap) == len(largeHeap) return $\frac{smallHeap[0] + largeHeap[0]}{2}$

Implementation

import unittest
from typing import *
from heapq import heappush, heappop

class MedianFinder:
    def __init__(self):
        self.leftHeap = []
        self.rightHeap = []
        
    def addNum(self, num: int) -> None:
        if self.rightHeap and self.rightHeap[0] < num:
            heappush(self.rightHeap, num)
        else:
            # Add to left by default
            heappush(self.leftHeap, num * -1)
        
        self.rebalance()

    def findMedian(self) -> float:
        if len(self.leftHeap) == len(self.rightHeap):
            return (-self.leftHeap[0] + self.rightHeap[0]) / 2
        
        if len(self.rightHeap) > len(self.leftHeap):
            return self.rightHeap[0]
        return -self.leftHeap[0]

    def rebalance(self) -> None:
        if len(self.leftHeap) > len(self.rightHeap) + 1:
            leftValue = heappop(self.leftHeap) * -1
            heappush(self.rightHeap, leftValue)
        
        if len(self.rightHeap) > len(self.leftHeap) + 1:
            rightValue = heappop(self.rightHeap)
            heappush(self.leftHeap, rightValue * -1)

Time complexity: $O(logn)$

  • We do a $logn$ rebalance

Space complexity: $O(n)$ — given $n$ is the number of input

  • We have 2 heaps sum up to $n$ size