Two specialised data structures that solve specific problems orders of magnitude faster than general alternatives. Heaps keep the minimum or maximum always accessible in O(1). Tries navigate prefix-based searches in O(key length). When a startup built autocomplete using a trie with O(n) traversal per keystroke, typing felt like moving through quicksand — and the feature was removed after three days.
Use heapq for top-K and priority queue problems. Implement a basic Trie with insert, search, and startsWith. Know when each data structure applies and why it is faster than the naive alternative.
Design autocomplete systems with precomputed top-K at each trie node. Know how tries are used in network routing (longest prefix match). Understand heap-based merge of K sorted arrays and its use in external sorting and distributed merge.
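The heap-based merge of K sorted arrays mentioned above can be sketched with Python's built-in `heapq.merge`, which maintains a heap over the current head of each array; the function name `merge_k_sorted` is illustrative:

```python
import heapq

def merge_k_sorted(arrays):
    # heapq.merge keeps a heap of size K over the K array heads,
    # so each output element costs O(log K) instead of re-scanning
    # all K heads. Total: O(n log K) for n elements overall.
    return list(heapq.merge(*arrays))

merge_k_sorted([[1, 4, 7], [2, 5, 8], [3, 6, 9]])
# → [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because `heapq.merge` consumes its inputs lazily, the same pattern extends to external sorting: each "array" can be a generator reading a sorted file from disk.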
Autocomplete launched — trie built correctly but traversal is O(n) per keystroke for popular prefixes
800ms lag between keystrokes — search bar visibly trails the user's typing
CRITICAL: Analytics show users abandoning search after 1-2 keystrokes — engagement drops to near zero
CRITICAL: Feature removed from production — users routed to browse navigation instead
WARNING: Fix identified: store top-K results at each trie node during indexing. Lookup becomes O(prefix length), not O(subtree size)
The question this raises
Using the right data structure is not enough — you also need to use it in a way that takes advantage of its structure. What is the correct way to use a trie for autocomplete?
You are building a search autocomplete. As the user types each letter, you want to show the top 10 suggestions. You built the trie correctly. Which approach gives the best performance per keystroke?
Lesson outline
Heap: the always-ready top-10
Imagine tracking the 10 most expensive orders in a stream of millions. With a sorted list, each new order triggers an O(n log n) re-sort. With a min-heap of size 10: compare the new order to the smallest of the current top-10 (the heap root). If the new order is larger, swap it in and re-heapify in O(log 10) = O(1). The minimum of your top-10 is always at the root, ready to be compared and replaced. Processing a million orders costs O(n log K) where K=10.
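The order-stream example above can be sketched as follows (the function name and sample amounts are illustrative):

```python
import heapq

def top_k_orders(order_amounts, k=10):
    # Track the K largest amounts in a stream using a min-heap of size K.
    # The heap root is always the smallest of the current top-K.
    heap = []
    for amount in order_amounts:
        if len(heap) < k:
            heapq.heappush(heap, amount)
        elif amount > heap[0]:               # beats the current Kth largest
            heapq.heapreplace(heap, amount)  # pop root + push new: O(log K)
    return sorted(heap, reverse=True)

top_k_orders([5, 90, 12, 300, 7, 45, 88], k=3)
# → [300, 90, 88]
```

`heapq.heapreplace` pops and pushes in a single sift, slightly cheaper than a separate `heappop` followed by `heappush`.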
Heap (Priority Queue)
```python
import heapq

heap = []
heapq.heappush(heap, item)  # insert, O(log n)
heapq.heappop(heap)         # removes the minimum, O(log n)
```
Use for: Reach for a heap when you need the minimum or maximum of a dynamic set, especially the top-K elements, or when you need to repeatedly extract the smallest or largest item
The key insight for heaps: you do not need full sorting to get the minimum or maximum — just maintain the heap property. Full sorting is O(n log n). Maintaining a heap where the minimum is always at the root is O(log n) per operation. If you only ever need the min or max, paying for a full sort is wasted work.
How this concept changes your thinking
I need to find the 10 most popular search terms from a stream of 10 million queries
“I collect all 10 million, sort by frequency (O(n log n)), and take the top 10”
“I use a min-heap of size 10. For each new query, if its frequency exceeds the minimum in my heap, I swap it in (O(log 10)). After 10 million queries: O(n log K) where K=10 — almost O(n).”
I need autocomplete suggestions for a search box with 500,000 indexed words
“On each keystroke I scan all words starting with the typed prefix — O(n) per character”
“I build a trie where each node stores the top-10 precomputed suggestions for its prefix. Lookup is O(prefix length) — typing "pro" traverses 3 nodes and returns the answer in 3 steps, not 500,000.”
I need to find a word in a dictionary of 100,000 entries
“Hash table: O(1) average for exact match only”
“Trie: O(word length) for exact match AND prefix search AND "does any word start with this prefix". The trie handles all three queries; the hash table handles only the first.”
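The precomputed-top-K idea behind the autocomplete fix can be sketched like this. This is a minimal illustration, not a production design: class and function names are invented, and it assumes the word list arrives pre-sorted by descending popularity so that appending in order keeps each node's cache ranked.

```python
class SuggestNode:
    """Trie node that also caches the top-K completions for its prefix."""
    def __init__(self):
        self.children = {}  # char -> SuggestNode
        self.top = []       # precomputed top-K words for this prefix

def build(words, k=10):
    # Assumes `words` is sorted by descending popularity, so appending
    # in insertion order keeps every node's cache in ranked order.
    root = SuggestNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, SuggestNode())
            if len(node.top) < k:
                node.top.append(word)
    return root

def suggest(root, prefix):
    # O(len(prefix)): walk the prefix, then read the cached answer.
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []
        node = node.children[ch]
    return node.top

root = build(["car", "card", "care", "cat"], k=3)
suggest(root, "car")
# → ['car', 'card', 'care']
```

The extra memory per node buys the O(prefix length) lookup that the incident timeline identifies as the fix: no subtree traversal ever happens at query time.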
Trie insert and lookup: words "cat", "car", "card", "care"
1. Start with an empty root node. The root has no character — it is the entry point.
2. Insert "cat". From root, follow (or create) the node for "c". From "c", follow "a". From "a", follow "t". Mark "t" as end-of-word. Nodes: root → c → a → t*.
3. Insert "car". From root, "c" exists. From "c", "a" exists. From "a", "r" is new — create it. Mark "r" as end-of-word. Now "a" has two children: "t" and "r".
4. Insert "card". From root, "c"→"a"→"r" all exist. "d" is new — create it. Mark "d" as end-of-word. "r" now has one child: "d".
5. Insert "care". From root, "c"→"a"→"r" exist. "e" is new — create it. Mark "e" as end-of-word. "r" now has two children: "d" and "e".
6. Lookup "car". From root, follow "c"→"a"→"r". "r" is marked end-of-word. Found. Total: 3 steps regardless of how many other words are in the trie.
7. Prefix search "car". Same traversal: "c"→"a"→"r". From "r", collect all end-of-word descendants: "car", "card", "care". The subtree rooted at "r" contains exactly the words with prefix "car".
Heap for top-K elements (Python heapq) and a basic Trie implementation. These are the two patterns that appear most in interviews.
```python
import heapq

# HEAP: find top-K largest numbers.
# A min-heap of size K naturally tracks the K largest elements: any element
# larger than the heap minimum (the Kth largest seen so far) displaces it.
# The minimum of the heap is always the Kth largest.
def top_k_largest(nums: list, k: int) -> list:
    min_heap = []  # min-heap of size k
    for num in nums:
        heapq.heappush(min_heap, num)  # push each number
        # Pop AFTER pushing, not before. Pushing first ensures the new
        # element is considered; if it is too small, popping removes it
        # immediately. The heap never holds more than K elements.
        if len(min_heap) > k:
            heapq.heappop(min_heap)  # remove smallest if over k
    return sorted(min_heap, reverse=True)  # result is top-k largest


# TRIE: insert and search
class TrieNode:
    def __init__(self):
        # A children dict (not a fixed-size array) handles any alphabet
        # without pre-allocating 26 slots. For a fixed 26-character
        # alphabet, an array is faster but uses more memory.
        self.children = {}   # char -> TrieNode
        self.is_end = False  # marks end of a word


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str):
        node = self.root
        for char in word:
            if char not in node.children:
                node.children[char] = TrieNode()
            node = node.children[char]
        # is_end marks where a word terminates. Without it, "car" and
        # "card" would be indistinguishable — traversing to "r" would find
        # the same node whether you inserted "car" or only "card".
        node.is_end = True  # mark end of word

    def search(self, word: str) -> bool:
        node = self.root
        for char in word:
            if char not in node.children:
                return False
            node = node.children[char]
        return node.is_end
```
For top-K problems the naive approach sorts everything. For autocomplete the naive approach scans all words per prefix. Both are correct, but they cost O(n log n) and O(n) per query respectively, where a heap and a trie achieve O(n log K) and O(prefix length).
Sorting for top-K instead of using a heap
```python
# SLOW: sort all elements to find top-K
def top_k_naive(nums, k):
    nums.sort(reverse=True)  # O(n log n)
    return nums[:k]          # O(k)
# Total: O(n log n) for any K.
# At n=10,000,000 and K=10: sorts 10M elements
# to find 10. A heap does it in O(n log 10) = O(n).


# FAST: maintain a heap of size K
import heapq

def top_k_heap(nums, k):
    return heapq.nlargest(k, nums)  # O(n log k)
# Or manually:
#   heap = []
#   for n in nums:
#       heapq.heappush(heap, n)
#       if len(heap) > k:
#           heapq.heappop(heap)
#   return sorted(heap, reverse=True)
```

The naive approach sorts all n elements to find K of them. For K=10 and n=10,000,000, you sort 10 million elements to answer "what are the biggest 10?" The heap approach maintains only K elements at any time, processing each new element in O(log K). For K=10, log K ≈ 3.3, so each element costs about 3 operations. The speedup is proportional to log(n)/log(K) — enormous when K is small.
| Scenario | What happens | Speed |
|---|---|---|
| Find minimum in a heap | Always at root — no traversal needed | Instant (O(1)) |
| Insert into a heap of n elements | Add at bottom, bubble up to correct position | Fast (O(log n)) |
| Top-K from n elements using a heap | Maintain heap of size K, process each element once | Fast (O(n log K)) |
| Insert or lookup in a trie (word length m) | Follow m nodes from root, create if missing | Fast (O(m)) |
| Autocomplete with stored top-K at each trie node | Follow prefix length nodes, read precomputed result | Very fast (O(prefix length)) |
| Autocomplete with full subtree scan per keystroke | Traverse entire subtree of current prefix node | Slow (O(subtree size)) |
Heaps & Tries
📖 What the exam expects
A heap is a complete binary tree satisfying the heap property: every parent is smaller (min-heap) or larger (max-heap) than its children. O(1) min/max access, O(log n) insert and delete. A trie stores strings character by character; common prefixes share nodes. O(m) insert and lookup where m is the key length, independent of dictionary size.
Heap problems: "find the Kth largest element", "merge K sorted lists", "find the median of a data stream", "task scheduler". Trie problems: "implement autocomplete", "word search in a dictionary", "replace words with roots", "design a search autocomplete system". Heaps and tries almost never appear in the same problem — identify which one applies and why.
Strong answer: immediately identifies top-K as a heap problem and justifies the O(n log K) complexity / knows to negate values for max-heap in Python / implements trie with children dict not a fixed 26-element array / mentions storing precomputed top-K in trie nodes for autocomplete without being asked
Red flags: uses sorting for top-K problems (O(n log n) when O(n log K) is achievable) / builds trie but uses full subtree traversal per prefix query / confuses min-heap and max-heap (Python heapq is min-heap by default) / does not know heapq in Python or PriorityQueue in Java
💡 Analogy
A heap is like a tournament bracket where the winner (minimum or maximum) is always at the top and is immediately readable. Inserting a new player puts them at the bottom and they bubble up to their correct position. Removing the winner re-seeds the bracket in O(log n). A trie is like an autocomplete keyboard: each key you press narrows the tree to a smaller subtree. Typing "pr" takes you to a branch containing only words starting with "pr". Typing "pro" narrows further. The tree does the navigation for you based on each character.
⚡ Core Idea
A heap maintains the heap property: every parent is smaller (min-heap) or larger (max-heap) than its children. This guarantees O(1) access to the minimum or maximum at all times. A trie stores strings by their characters, one character per level. All strings sharing a prefix share the same nodes down to where they diverge. This makes prefix lookup O(key length) regardless of how many strings are stored.
🎯 Why It Matters
Heaps power every priority queue in computing: job schedulers, Dijkstra's algorithm, merge of sorted lists, finding the median in a stream of numbers. Tries power every search box you have ever used: Google's autocomplete, IDE code completion, router prefix matching, dictionary spell-check. These two structures appear in the background of almost every interactive system.