How networks of connections are stored and searched. The data structure behind social networks, maps, package dependency resolution, and internet routing — and why forgetting to track where you have been can make a program run forever.
Implement BFS and DFS from scratch. Know when to use each. Always include the visited set. Represent a graph as an adjacency list (a dictionary mapping each node to a list of neighbours).
Recognise graph problems in disguise: number of islands (2D grid), course schedule (directed graph cycle detection), clone a graph. Implement Dijkstra for weighted shortest paths. Know topological sort for dependency ordering.
Design systems where the core data model is a graph. Know when distributed graph databases (Neo4j, Amazon Neptune) make sense. Understand partitioning strategies for graphs that exceed single-machine memory.
How networks of connections are stored and searched. The data structure behind social networks, maps, package dependency resolution, and internet routing — and why forgetting to track where you have been can make a program run forever.
Graph traversal starts for a densely-connected user — no visited set in place
Traversal enters a cycle — re-visiting nodes, growing the queue without bound
WARNINGDatabase CPU climbs, recommendation feature unresponsive for affected users
CRITICALProcess killed and restarted — feature returns empty results rather than hang forever
WARNINGVisited set added: O(n) extra space prevents revisiting, traversal now terminates in milliseconds
The question this raises
Graphs have cycles by nature — social networks, road maps, import graphs all do. What is the one thing every graph traversal must do to terminate?
You are writing a web crawler that follows links from a starting page. Your first version works on small test sites but hangs forever on real websites. What is the most likely cause?
Lesson outline
The friend-of-friend problem
You are at a party. You know 3 people. Each of them knows 3 people. Their friends know 3 more. If you wanted to meet everyone two connections away, you visit the 3 people you know, then their 9 friends, for 12 total. You also need to remember who you have already met — or you spend the night circling back to the same conversations forever. That is graph BFS with a visited set.
BFS / DFS Graph Traversal
BFS: queue + visited set DFS: recursion/stack + visited set
Use for: Reach for BFS when you need the shortest path or nearest neighbours. Reach for DFS when you need to know whether a path exists, detect cycles, or explore all connected components
The key insight: a graph traversal without a visited set is not an algorithm — it is an infinite loop waiting to happen. Every graph that has cycles (and social networks, road maps, and import graphs all do) will trap you unless you mark what you have seen.
How this concept changes your thinking
I need to find all friends-of-friends for a user
“I loop through their friends and then their friends' friends — seems simple”
“I use a visited set. Without it, A→B→A loops forever. With it, once I visit A, I never add it to the queue again.”
I need the shortest path between two nodes
“I try DFS — go deep first until I find the target”
“I use BFS. DFS finds a path, not the shortest path. BFS explores level by level — the first time it reaches the target, that is the shortest path.”
I need to detect circular dependencies in package imports
“I am not sure if this is even a graph problem”
“Package A imports B imports C imports A — that is a cycle in a directed graph. DFS with a "currently on stack" marker detects it in O(V+E).”
BFS on a small social graph: find all friends of Alice within 2 hops
01
Start at Alice. Create a queue with [Alice] and a visited set with {Alice}. The queue tells us who to process next.
02
Process Alice. Dequeue her. Add her friends — Bob, Carol — to the queue. Mark Bob and Carol as visited. Queue: [Bob, Carol]. Visited: {Alice, Bob, Carol}.
03
Process Bob. Dequeue him. His friends are Alice (already visited — skip) and Dave (not visited — add). Queue: [Carol, Dave]. Visited: {Alice, Bob, Carol, Dave}.
04
Process Carol. Dequeue her. Her friends are Alice (visited — skip) and Eve (not visited — add). Queue: [Dave, Eve]. Visited: {Alice, Bob, Carol, Dave, Eve}.
05
Track depth. Dave and Eve are at hop 2 from Alice. We stop adding their neighbours because our limit is 2 hops. Result: {Bob, Carol, Dave, Eve}.
06
Why the visited set is essential. Without it, processing Dave would re-add Alice to the queue. Processing Alice again would re-add Bob and Carol. The queue would grow forever on any graph with cycles.
Start at Alice. Create a queue with [Alice] and a visited set with {Alice}. The queue tells us who to process next.
Process Alice. Dequeue her. Add her friends — Bob, Carol — to the queue. Mark Bob and Carol as visited. Queue: [Bob, Carol]. Visited: {Alice, Bob, Carol}.
Process Bob. Dequeue him. His friends are Alice (already visited — skip) and Dave (not visited — add). Queue: [Carol, Dave]. Visited: {Alice, Bob, Carol, Dave}.
Process Carol. Dequeue her. Her friends are Alice (visited — skip) and Eve (not visited — add). Queue: [Dave, Eve]. Visited: {Alice, Bob, Carol, Dave, Eve}.
Track depth. Dave and Eve are at hop 2 from Alice. We stop adding their neighbours because our limit is 2 hops. Result: {Bob, Carol, Dave, Eve}.
Why the visited set is essential. Without it, processing Dave would re-add Alice to the queue. Processing Alice again would re-add Bob and Carol. The queue would grow forever on any graph with cycles.
BFS in Python — find all nodes within max_depth hops from a starting node. The visited set is the one line that prevents infinite loops on any graph with cycles.
1from collections import deque23def bfs(graph: dict, start: str, max_depth: int) -> set:4visited = {start} # mark start before loopThe visited set must include start before the loop. Adding it inside the loop means start gets re-added when a neighbour points back to it.5queue = deque([(start, 0)]) # (node, current_depth)The queue stores (node, depth) tuples. Tracking depth lets you stop at max_depth without a separate level-counting mechanism.6result = set()78while queue:9node, depth = queue.popleft() # FIFO: process oldest firstdeque.popleft() is O(1). A regular list with list.pop(0) is O(n) — every element shifts. For BFS with millions of nodes, this is the difference between fast and unusable.10if depth > 0:11result.add(node)12if depth >= max_depth:13continue # do not explore beyond limit14for neighbour in graph.get(node, []):15if neighbour not in visited: # the visited checkThis visited check is the one line that prevented LinkedIn's infinite loop. Without it, the algorithm re-enqueues already-processed nodes, growing the queue without bound.16visited.add(neighbour)17queue.append((neighbour, depth + 1))1819return result
The natural first attempt skips the visited set entirely — it feels unnecessary for small test graphs. But any graph with cycles turns this into an infinite loop.
Missing visited set
# BROKEN: no visited tracking
def bfs_broken(graph, start):
queue = [start] # no visited set
result = []
while queue:
node = queue.pop(0) # also O(n) pop
result.append(node)
for n in graph.get(node, []):
queue.append(n) # re-adds already-seen nodes
return result
# On graph A<->B: runs forever# CORRECT: visited set terminates traversal
from collections import deque
def bfs_fixed(graph, start):
visited = {start} # visited before the loop
queue = deque([start]) # O(1) popleft
result = []
while queue:
node = queue.popleft()
result.append(node)
for n in graph.get(node, []):
if n not in visited: # the guard
visited.add(n)
queue.append(n)
return resultThe left version works on trees (no cycles) but hangs forever on any graph where two nodes connect back to each other. The fix is two lines: create the visited set before the loop, and check it before adding any neighbour. The visited set costs O(V) memory — a small price to guarantee termination.
| Scenario | What happens | Speed |
|---|---|---|
| BFS or DFS on a graph with V nodes, E edges | Each node visited once, each edge checked once | Fast (O(V + E)) |
| BFS without visited set on a cyclic graph | Nodes re-enqueued forever — infinite loop | Never terminates |
| Shortest path in unweighted graph | BFS guarantees shortest — first arrival is minimum hops | Fast (O(V + E)) |
| Shortest path in weighted graph (Dijkstra) | Priority queue always expands cheapest edge first | Fast (O((V + E) log V)) |
Graph Traversal
📖 What the exam expects
BFS uses a queue to explore nodes level by level — O(V+E) time, O(V) space. DFS uses a stack or recursion to explore depth-first — O(V+E) time, O(V) stack space. Both require a visited set to prevent infinite loops on cyclic graphs.
Toggle between what certifications teach and what production actually requires
Graph problems appear constantly as problems that look like something else: number of islands (2D grid is a graph), course schedule (directed graph cycle detection), word ladder (implicit graph BFS), clone a graph, network delay time (Dijkstra). The skill is recognising the underlying graph structure in a problem that does not mention "graph".
Common questions:
Strong answer: immediately identifies what the nodes and edges are in the problem / mentions visited set before writing a line of code / states BFS vs DFS choice and justifies it / knows O(V+E) without being asked / connects to real examples: social networks, maps, package dependencies
Red flags: does not mention visited set / confuses BFS (queue, level-by-level) with DFS (stack, depth-first) / cannot represent the graph as an adjacency list / applies tree traversal to a graph problem without noting the difference
💡 Analogy
Imagine a city with roads between buildings. BFS is like ripples in a pond — you explore all buildings one street away, then all buildings two streets away, and so on. DFS is like a maze explorer who always takes the first available turn, goes as deep as they can, and only backtracks when they hit a dead end. The visited set is your map — you mark every building you have been to so you never walk in circles.
⚡ Core Idea
A graph is any collection of things with connections between them. The key operations are: add a node, add an edge, and traverse. Traversal comes in two flavours: BFS (level by level, uses a queue, finds shortest path) and DFS (depth first, uses recursion or a stack, finds whether a path exists). The invariant both must maintain: never visit a node twice.
🎯 Why It Matters
Almost every relationship in computing is a graph: web pages link to each other (PageRank is graph traversal), package dependencies form a graph (npm uses DFS to detect circular imports), social connections are graphs. Understanding graphs means understanding the structure of the internet.
Ready to see how this works in the cloud?
Switch to Career Paths for structured paths (e.g. Developer, DevOps) and provider-specific lessons.
View role-based pathsSign in to track your progress and mark lessons complete.
Questions? Discuss in the community or start a thread below.
Join DiscordSign in to start or join a thread.