The data structure that makes lookup, insertion, and deletion O(1) on average. JavaScript objects, Python dicts, Java HashMaps — all hash tables. The O(1) average hides a dangerous worst case that took down Python web servers in 2012.
Know that hash maps give O(1) average lookup. Implement Two Sum and a word frequency counter. Know the difference between a Map (any key type) and a plain object (string keys only) in JavaScript.
Reach for a hash map instinctively when the brute force has a nested loop. Know the O(n) worst case and what causes it. Understand load factor and rehashing.
Know the security implications of untrusted hash map keys (CVE-2012-1150). Design systems with appropriate input limits when hash maps process user-supplied keys. Understand hash table internals well enough to explain why Python added PYTHONHASHSEED.
Make architectural decisions about which data structures to use for key-value storage at scale (Redis HSET, Memcached, in-process map). Understand sharding and consistent hashing as extensions of the hash table concept.
Crosby & Wallach publish the theoretical hash flooding attack — hash tables degrade to O(n) with crafted keys
Attack demonstrated at CCC conference against PHP, Java, Python web frameworks in real server tests
WARNING: CVE-2012-1150 filed — Python dict confirmed vulnerable. 10,000 crafted POST fields = minutes of CPU.
CRITICAL: Python 3.3 ships with PYTHONHASHSEED: a random per-process salt that makes crafting collisions impractical without knowing the seed
Web frameworks add field-count limits as defence in depth — e.g. Django's DATA_UPLOAD_MAX_NUMBER_FIELDS (default 1000) rejects requests with too many POST fields
The question this raises
If the average-case O(1) of a hash table hides a worst-case O(n) that an attacker can trigger — what does that tell you about using hash tables for untrusted input?
You have a list of 1 million words and need to check if a specific word is in the list. What is the fastest data structure for this check?
Lesson outline
The classroom with 30 numbered seats
Imagine 30 numbered seats (0-29). To find any student's seat, compute: seat = studentID mod 30. Student 47 goes to seat 17. Student 78 goes to seat 18. No searching — one arithmetic operation gives you the exact address. That is a hash table. If two students map to the same seat, you chain them together at that seat (linked list per bucket). Usually seats have 0-1 students. The attacker sends 10,000 students all with IDs that compute to seat 0.
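The seating rule above can be sketched directly in code — a minimal, illustrative model of buckets with chaining, not a production hash table (real ones also resize and use better hash functions than `mod`):

```javascript
// 30 "seats" (buckets); each seat holds a chain (array) of students.
const SEATS = 30;
const seats = Array.from({ length: SEATS }, () => []);

function assign(studentId) {
  const seat = studentId % SEATS; // the "hash": one arithmetic operation
  seats[seat].push(studentId);    // collision? chain at the same seat
  return seat;
}

assign(47); // seat 17
assign(78); // seat 18
assign(77); // seat 17 again — collides with 47, chained together
```

Finding student 77 now means going straight to seat 17 and scanning a chain of two — which is why long chains (the attacker's 10,000 students at seat 0) destroy the O(1) guarantee.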
Hash Table
const map = new Map();
map.set(key, value); // O(1)
map.get(key);        // O(1)
Use for: O(1) lookup by key, counting occurrences, grouping items, or checking "have I seen this before"
The key insight: convert a key to a position. Array access is O(1) because a position IS a memory address. A hash table achieves O(1) key lookup by converting any key (string, number, object) to a position using arithmetic. The hash function turns an unpredictable key into a predictable index. Then it is just array access.
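The key-to-position conversion can be sketched for string keys. This is a djb2-style hash — one of many reasonable choices, shown purely for illustration:

```javascript
// Turn an arbitrary string key into a valid array index.
function hashToIndex(key, buckets) {
  let h = 5381;
  for (let i = 0; i < key.length; i++) {
    h = (h * 33 + key.charCodeAt(i)) >>> 0; // keep h a 32-bit unsigned int
  }
  return h % buckets; // fold the hash into a position in 0..buckets-1
}

const idx = hashToIndex("apple", 16); // deterministic index in 0..15
```

The same key always produces the same index — that determinism is what makes lookup work, and also what the 2012 attackers exploited once the hash function was public and unsalted.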
How this concept changes your thinking
I need to count how many times each word appears in 1 million words
“I would sort the words (O(n log n)), then count consecutive duplicates. Or scan the whole list for each unique word (O(n²)).”
“A hash map: for each word, increment map[word] by 1. O(1) per word. O(n) total. One pass, done.”
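The one-pass count described above, sketched with a Map:

```javascript
// O(n) word frequency: one O(1)-average map update per word.
function wordCounts(words) {
  const counts = new Map();
  for (const w of words) {
    counts.set(w, (counts.get(w) ?? 0) + 1);
  }
  return counts;
}

const c = wordCounts(["the", "cat", "the", "hat", "the"]);
// c.get("the") → 3, c.get("cat") → 1
```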
I need to find if two different words in a list are anagrams of each other
“I would compare every pair — sort each word and check if the sorted versions match. O(n² log n) total.”
“A hash map keyed by sorted-word: group words by their sorted form. Words with the same sorted form are anagrams. O(n · k log k) where k is the average word length — one sort per word, one pass over all words.”
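The sorted-word grouping can be sketched in a few lines:

```javascript
// Group anagrams: anagrams share the same sorted form, so use it as the map key.
function groupAnagrams(words) {
  const groups = new Map();
  for (const w of words) {
    const key = [...w].sort().join(""); // "eat" → "aet"
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(w);
  }
  return [...groups.values()];
}

const groups = groupAnagrams(["eat", "tea", "tan", "ate", "nat"]);
// → [["eat", "tea", "ate"], ["tan", "nat"]]
```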
I receive 10,000 POST form fields from a user
“I parse them all into a dict. It's fast — dicts are O(1).”
“If an attacker crafts the field names to all hash to the same bucket, every lookup is O(n). I should reject requests with more than 1,000 fields (like Django does) before parsing.”
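A sketch of that defence-in-depth check — count fields before hashing anything. `MAX_FIELDS`, `parseFormBody`, and the error message are illustrative names, not any framework's real API:

```javascript
const MAX_FIELDS = 1000; // illustrative limit, mirroring Django's default

// rawPairs: array of [fieldName, value] pairs from the request body.
function parseFormBody(rawPairs) {
  if (rawPairs.length > MAX_FIELDS) {
    throw new Error("Too many form fields"); // reject before any hashing happens
  }
  const fields = new Map();
  for (const [name, value] of rawPairs) {
    fields.set(name, value);
  }
  return fields;
}
```

The point of the ordering: the length check is O(1) and runs before the attacker's crafted keys ever touch the hash table.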
Two Sum: find two numbers that sum to 9 in [2, 7, 11, 15]
1. We want to find indices of two numbers that add to 9. Brute force checks every pair. A hash map does it in one pass.
2. Set up an empty hash map. It will store: { number: index_where_we_saw_it }.
3. See 2 (index 0): we need 9 - 2 = 7 to complete the pair. Is 7 in the map? No. Store 2: map = { 2: 0 }.
4. See 7 (index 1): we need 9 - 7 = 2. Is 2 in the map? Yes — at index 0. Found the pair: [0, 1]. Return.
5. We found the answer in 2 steps instead of checking all pairs. The hash map stores "what I have seen" so each new number can instantly ask "is my complement already here?"
6. The variable name for the hash map in code is usually seen — seen.get(complement) is the one-line check that replaces the inner loop.
Two Sum — the canonical hash map problem in code:
function twoSum(nums, target) {
  // seen stores: { number_value: array_index_where_we_saw_it }.
  // We need the index to return the answer.
  const seen = new Map();

  for (let i = 0; i < nums.length; i++) {
    // What we would need to complete the pair:
    // if nums[i] = 7 and target = 9, complement = 2.
    const complement = target - nums[i];

    // O(1) lookup: has the complement already appeared earlier in the array?
    // This replaces the inner loop.
    if (seen.has(complement)) {
      // Return the index of the complement (stored when we saw it earlier)
      // and our current index i.
      return [seen.get(complement), i];
    }

    // Store AFTER checking — this prevents matching a number with itself
    // (e.g. [3, 5] with target 6 should not match index 0 with itself).
    seen.set(nums[i], i);
  }

  return null;
}
Two Sum without a hash map uses a nested loop — O(n²). The hash map version is O(n) and is the expected solution at senior level.
O(n²) nested loops vs O(n) hash map
// Brute force — O(n²)
// Works but fails timing at n > 10,000
function twoSum_slow(nums, target) {
for (let i = 0; i < nums.length; i++) {
for (let j = i + 1; j < nums.length; j++) {
if (nums[i] + nums[j] === target) {
return [i, j];
}
}
}
return null;
}

// Hash map — O(n) time, O(n) space
function twoSum(nums, target) {
const seen = new Map();
for (let i = 0; i < nums.length; i++) {
const complement = target - nums[i];
if (seen.has(complement)) {
return [seen.get(complement), i];
}
seen.set(nums[i], i);
}
return null;
}

The left version tries every pair: for 10,000 numbers that is roughly 50 million comparisons — it will time out on a LeetCode judge. The right version uses the hash map as a "memory" of what has been seen. For each new number, it asks the map in O(1): "has the complement appeared before?" If yes, the pair is found. One pass. The cost: O(n) extra space for the map. The trade-off (time vs space) is almost always worth it.
Do I need a hash map?
| Operation | What happens | Speed |
|---|---|---|
| Insert a key-value pair | Hash the key, store at computed index | Instant average (O(1)) |
| Look up a key | Hash the key, go directly to that index | Instant average (O(1)) |
| Delete a key | Hash the key, remove from that index | Instant average (O(1)) |
| All keys hash to one bucket (attack) | Every lookup scans the entire bucket | Slow worst case (O(n)) |
| Table resize (rehash) | All items moved to new larger table — triggered when load factor > 0.75 | Slow but rare (O(n)), amortised O(1) |
Hash table complexity
📖 What the exam expects
Hash tables provide O(1) average time for get, set, and delete. Worst case is O(n) when all keys hash to the same bucket. Load factor triggers rehashing when it exceeds approximately 0.75.
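The rehash step the summary describes can be sketched as follows. The hash function and the doubling factor are typical illustrative choices, not the spec of any particular runtime:

```javascript
// Fold a string key into a bucket index for a table of the given size.
function hashToBucket(key, buckets) {
  let h = 0;
  for (let i = 0; i < key.length; i++) {
    h = (h * 31 + key.charCodeAt(i)) >>> 0;
  }
  return h % buckets;
}

// O(n) rehash: allocate a doubled table and re-place every entry,
// because each key's bucket index depends on the table size.
function rehash(oldTable) {
  const newTable = Array.from({ length: oldTable.length * 2 }, () => []);
  for (const bucket of oldTable) {
    for (const [key, value] of bucket) {
      newTable[hashToBucket(key, newTable.length)].push([key, value]);
    }
  }
  return newTable;
}
```

Because capacity doubles each time, this O(n) cost happens geometrically less often as the table grows — which is why inserts are amortised O(1) despite the occasional expensive resize.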
Hash maps appear in nearly every interview: Two Sum (store complements), Group Anagrams (sorted string as key), LRU Cache (hash map + doubly linked list), word frequency, longest consecutive sequence. Any problem that says "have you seen this before" or "find a pair that satisfies a condition" is likely a hash map problem.
Common questions:
Strong answer: identifies "have I seen this before?" as the hash map signal / states both time AND space complexity / knows the map could become large and mentions it / can explain the O(1) average and O(n) worst case
Red flags: reaches for a nested loop (O(n²)) when a hash map gives O(n) / does not think about space complexity of the map / cannot explain what a hash function does / does not know the difference between a hash map and a hash set
💡 Analogy
A classroom with 30 numbered seats. When a student arrives, you calculate their seat: seat = (student ID) mod 30. Student #47 sits at seat 17 (47 mod 30 = 17). No searching — you walk directly there. That is O(1) lookup. If students #47, #77, and #107 all arrive (all hash to seat 17), you have three people at one seat — a collision. Now finding a specific student at seat 17 requires checking all three. That is O(n) for that bucket. Most of the time buckets have 0-1 items. The rare case of many collisions is what the 2012 attackers manufactured deliberately.
⚡ Core Idea
A hash table is an array where keys are converted to indices by a hash function. Good hash functions distribute keys uniformly — each bucket gets roughly items/buckets values on average. Collision resolution (chaining or probing) handles the rare case where two keys map to the same index. The O(1) average assumes uniform distribution — which is guaranteed by a good hash function and broken by an adversary who controls the keys.
🎯 Why It Matters
Hash tables power nearly every O(n) solution to an O(n²) brute-force problem. "Have I seen this value before?" is O(1) with a hash set, O(n) without. "What value corresponds to this key?" is O(1) with a hash map, O(n) with a list. Knowing the hidden worst-case tells you when to add input validation — specifically, when accepting untrusted keys from the network.
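The "have I seen this value before?" pattern, sketched with a Set (JavaScript's hash set):

```javascript
// O(n) duplicate finder: each membership check is O(1) average.
// Without the set, the same check is O(n) per value — O(n²) total.
function firstDuplicate(values) {
  const seen = new Set();
  for (const v of values) {
    if (seen.has(v)) return v; // seen before — found the duplicate
    seen.add(v);
  }
  return null; // no value repeated
}

firstDuplicate([3, 1, 4, 1, 5]); // → 1
```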