Heap & Priority Queue
01 / The shared mechanism
A heap gives you cheap access to the extreme — the min or the max — while everything else stays in a loose, half-sorted shape. Three sub-patterns: one heap for top-k, two heaps for the running median, and a min-heap of source pointers for merging k sorted streams.
01 / The one-sentence essence
Split the stream in half: a max-heap holds the lower half, a min-heap holds the upper half. Keep their sizes balanced, and the median is always the top of one of them.
Problemrunning median of a streamStream[5, 2, 8, 1, 9, 3, 7]
stream
50
21
82
13
94
35
76
low — max-heap
— empty —
low = [ ]
high — min-heap
— empty —
high = [ ]
median—
Two heaps: a max-heap on the left holds the lower half, a min-heap on the right holds the upper half. The median lives on the boundary.
step
0 / 28
|low|
0
|high|
0
median
—
0 / 28
02 / The pattern signature
# two heaps — low is max-heap, high is min-heaplow, high ← max-heap[ ], min-heap[ ]for x in stream:if low.empty() or x ≤ low.top():low.push(x)else: high.push(x)# rebalance so |low| − |high| ∈ {0, 1}if |low| > |high| + 1: high.push(low.pop())elif |high| > |low|: low.push(high.pop())median ← |low| > |high| ? low.top() : (low.top() + high.top()) / 2
03 / When to recognize this pattern
"running / streaming median"
The canonical signal. Numbers arrive one at a time and you must report the median after every insert. Sorting on each query would be O(N) per step; two heaps make it O(log N).
"median of a sliding window"
Same machinery, plus lazy deletion of the element leaving the window — usually with a hash map of pending removals and a delayed cleanup at each heap top.
"partition into two halves"
Any problem where the answer is determined by the boundary between the lower-than-x and higher-than-x parts of the data — schedule a doctor, balance load, find the middle of two sorted arrays.
"sum / mean of the smaller half"
The max-heap of the lower half is itself a useful object: its size and the sum of its elements give you the mean of the bottom k for free.
04 / Common pitfalls
i.
Forgetting to rebalance after every insert.
The size invariant
|low| − |high| ∈ {0, 1} only holds if you fix it immediately. Skip a rebalance and the median formula reads from the wrong heap forever after.ii.
Choosing which heap to insert into by size instead of value.
The comparison against the current low-half top is what keeps the partition ordered. Inserting blindly into the smaller heap will violate the invariant
max(low) ≤ min(high) and your median will be wrong.iii.
Integer overflow / off-by-one on the even-size median.
When both heaps have the same size the median is
(low.top() + high.top()) / 2. In integer types the sum can overflow; compute with the wider type (or low.top() + (high.top() − low.top()) / 2 with a parity correction).05 / Go practice — on LeetCode
Four problems. The first is the canonical formulation. The next two add a window and a sliding average. The last is a classic that's a two-heaps solution in disguise.
— / When to use which
top-k
Use when you need the k most extreme values — not the full sorted order.
stream · O(N log k) · LC 215 / 347 / 703
two heaps
Use when you need the running median — split the data into a low half (max-heap) and a high half (min-heap).
median · max-heap + min-heap · LC 295
k-way merge
Use when you have k sorted streams and need the next-smallest across all of them.
merge · (val, src, idx) · LC 23 / 378