How to diagnose performance problems with actual data: profiling, flame graphs, database query analysis, memory leak detection, and the optimization patterns that actually move the needle.
Donald Knuth: "Premature optimization is the root of all evil." Most code paths are not hot paths. Adding complexity to optimize code that runs 0.001% of the time is wasted effort and introduces bugs.
The correct approach: measure first, optimize the actual bottleneck. Tools show you the truth: flame graphs reveal that 80% of CPU time is in one function you never suspected. Database slow query logs show the query that runs 10,000 times/minute. Without measurement, you guess.
The 80/20 rule of performance
80% of latency comes from 20% of the code. Profile your application under realistic load, find the actual hot path, and optimize only that. Everything else is noise.
Flame graphs: Visual representation of sampled call stacks. Each horizontal bar is a function; width = share of CPU time. Read from bottom (outermost call) to top (innermost). Wide plateaus at the top = where time is actually being spent. Run `node --prof` then `node --prof-process` to generate V8 profiles.
APM tools: Application Performance Monitoring (Datadog, New Relic, Sentry Performance) instrument your app automatically and show transaction-level performance data, broken down by function. Essential in production.
Database slow query log: PostgreSQL `log_min_duration_statement = 100` logs all queries taking > 100ms. Analyze with `pg_stat_statements`. Find: which queries run most often? Which are slowest? Which consume most total time (count × duration)?
Memory leak detection: Node.js `--inspect` + Chrome DevTools heap snapshot. Take heap snapshot, perform actions, take another — compare. Objects that grow indicate a leak. Common culprits: global cache without eviction, event listeners never removed, closure capturing large objects.
```typescript
// Node.js Performance Profiling Setup

// 1. Enable V8 CPU profiler (development/staging)
//    node --prof server.js
//    node --prof-process isolate-*.log > profile.txt

// 2. In-process profiling for specific operations
import { Session } from 'inspector';
import fs from 'fs';

async function profileFunction<T>(name: string, fn: () => Promise<T>): Promise<T> {
  const session = new Session();
  session.connect();

  await new Promise<void>(resolve =>
    session.post('Profiler.enable', () => resolve())
  );
  await new Promise<void>(resolve =>
    session.post('Profiler.start', () => resolve())
  );

  const result = await fn();

  const profile = await new Promise<object>((resolve) =>
    session.post('Profiler.stop', (err, { profile }) => resolve(profile))
  );

  fs.writeFileSync(`profile-${name}-${Date.now()}.cpuprofile`, JSON.stringify(profile));
  session.disconnect();
  console.log(`Profile saved. Open in Chrome DevTools > Performance`);
  return result;
}

// 3. PostgreSQL query analysis (db/sql from your query builder)
// Find the 10 most expensive queries by total time.
// Focus on total_exec_time (= calls × avg_ms) — a fast query called 1M times is a bottleneck.
const expensiveQueries = await db.execute(sql`
  SELECT
    query,
    calls,
    mean_exec_time::numeric(10,2) AS avg_ms,
    total_exec_time::numeric(10,2) AS total_ms,
    (total_exec_time / sum(total_exec_time) OVER ()) * 100 AS pct_total
  FROM pg_stat_statements
  ORDER BY total_exec_time DESC
  LIMIT 10
`);
// Focus optimization on queries with the highest total_ms (calls × avg_ms)

// 4. Memory leak detection script
// Log heap before/after a load test — a steadily growing heap indicates a leak.
import v8 from 'v8';

function logHeapUsage(label: string) {
  const stats = v8.getHeapStatistics();
  console.log(`[${label}] heap used: ${(stats.used_heap_size / 1024 / 1024).toFixed(1)} MB`);
}
```
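The snapshot-compare workflow from the memory leak bullet can be sketched without DevTools: measure heap usage, perform the suspect actions, measure again. This is a toy illustration — the `cache` array and `handleRequest` are hypothetical stand-ins for a real leak (a cache with no eviction policy).

```typescript
import * as v8 from 'node:v8';

// Hypothetical leak: a module-level cache that is never evicted
const cache: number[][] = [];

function handleRequest(i: number): void {
  // Each "request" retains ~800KB on the V8 heap forever
  cache.push(new Array(100_000).fill(i));
}

function heapUsedMB(): number {
  return v8.getHeapStatistics().used_heap_size / 1024 / 1024;
}

const before = heapUsedMB();
for (let i = 0; i < 20; i++) handleRequest(i);
const after = heapUsedMB();

// Heap grows roughly linearly with request count — the leak signature
console.log(`before=${before.toFixed(1)} MB, after=${after.toFixed(1)} MB`);
```

In a real investigation you would confirm with two heap snapshots in Chrome DevTools (via `node --inspect`) and compare retained object counts between them.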
N+1 queries: Loop fetches one row, then fetches related rows for each. 100 posts → 1 + 100 queries. Fix: eager loading (JOIN), DataLoader (batch in-app). EXPLAIN ANALYZE reveals N+1: look for many similar queries with high count in pg_stat_statements.
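The DataLoader fix mentioned above can be sketched in a few lines. This is a toy batcher, not the real `dataloader` package (which adds per-key caching and error handling); `queries` and the `user-` values are illustrative stand-ins for a real `WHERE id = ANY($1)` query.

```typescript
type BatchFn<K, V> = (keys: K[]) => Promise<V[]>;

class TinyLoader<K, V> {
  private queue: { key: K; resolve: (v: V) => void }[] = [];
  constructor(private batchFn: BatchFn<K, V>) {}

  load(key: K): Promise<V> {
    return new Promise(resolve => {
      this.queue.push({ key, resolve });
      // Flush once per tick: all loads in this tick become ONE batched query
      if (this.queue.length === 1) process.nextTick(() => this.flush());
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    const values = await this.batchFn(batch.map(item => item.key));
    batch.forEach((item, i) => item.resolve(values[i]));
  }
}

// Usage: 3 author lookups collapse into 1 batched fetch instead of 3
let queries = 0;
const userLoader = new TinyLoader<number, string>(async ids => {
  queries++; // stand-in for: SELECT * FROM users WHERE id = ANY($1)
  return ids.map(id => `user-${id}`);
});

const authors = await Promise.all([1, 2, 3].map(id => userLoader.load(id)));
console.log(authors, `queries=${queries}`); // → ['user-1', 'user-2', 'user-3'] queries=1
```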
Query plan analysis: `EXPLAIN (ANALYZE, BUFFERS)` shows the actual execution plan. Look for: Seq Scan on large tables (missing index), Nested Loop on large result sets (consider Hash Join), high Buffers hit (good — in memory) vs read (bad — hitting disk).
Connection pool tuning: Pool too small → requests queue waiting for connections → latency spike. Pool too large → PostgreSQL OOMs from too many concurrent queries. Formula: (CPU cores × 2) + disk spindles is a starting point.
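The starting-point formula above is a heuristic, not a law — a minimal sketch (treating one SSD as a single "spindle"; the `pg` Pool shown in the comment is where the number would go):

```typescript
// Starting point per the formula: (CPU cores × 2) + disk spindles.
// Measure latency under realistic load and adjust from there.
function suggestedPoolSize(cpuCores: number, diskSpindles = 1): number {
  return cpuCores * 2 + diskSpindles;
}

// e.g. an 8-core database server with one SSD
const poolMax = suggestedPoolSize(8); // → 17

// Where it plugs in (node-postgres, for example):
// const pool = new Pool({ max: poolMax, connectionTimeoutMillis: 2000 });
```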
Materialized views: Pre-compute expensive aggregations. A dashboard query joining 5 tables with GROUP BY might take 10 seconds. Store the result in a materialized view, refresh every 5 minutes. Dashboard query: 1ms.
```sql
-- ✅ EXPLAIN ANALYZE: understand your query plan
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.id, o.total, u.email
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.status = 'pending'
  AND o.created_at > NOW() - INTERVAL '7 days'
ORDER BY o.created_at DESC
LIMIT 100;

-- Look for in the output:
-- ❌ Seq Scan → add an index
-- ❌ Nested Loop with large row estimates → consider a Hash Join
-- ✅ Index Scan → index is being used
-- ✅ Buffers: hit=1000, read=0 → data in memory (good)

-- ✅ Composite index for this query pattern
-- CONCURRENTLY: build the index without locking the table against writes
CREATE INDEX CONCURRENTLY idx_orders_status_created
  ON orders(status, created_at DESC)
  WHERE status = 'pending'; -- Partial index: only pending orders

-- ✅ Materialized view for expensive dashboard aggregation
-- Turns a 10-second aggregation into a ~1ms lookup
CREATE MATERIALIZED VIEW daily_revenue_by_region AS
SELECT
  DATE_TRUNC('day', created_at) AS day,
  region,
  SUM(total) AS revenue,
  COUNT(*) AS order_count
FROM orders
WHERE status = 'completed'
GROUP BY 1, 2
WITH DATA;

-- REFRESH ... CONCURRENTLY requires a UNIQUE index on the view
CREATE UNIQUE INDEX ON daily_revenue_by_region(day, region);

-- Refresh in background; CONCURRENTLY keeps the view readable during refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue_by_region;

-- ✅ pg_stat_statements: find slow queries
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 100 -- Queries averaging > 100ms
ORDER BY total_exec_time DESC
LIMIT 20;
```
Core Web Vitals: Google's user-centric performance metrics. LCP (Largest Contentful Paint < 2.5s), INP (Interaction to Next Paint < 200ms), CLS (Cumulative Layout Shift < 0.1). These directly affect SEO ranking and user satisfaction.
Bundle size: Every KB of JavaScript adds parse and execute time, increasing TBT (Total Blocking Time). Analyze with `webpack-bundle-analyzer` or `vite-bundle-visualizer`. Common culprits: importing all of lodash instead of specific functions, moment instead of date-fns (moment is far larger), large icon libraries (import only the icons you use).
Image optimization: Images are often the bulk of page weight. Use WebP/AVIF (typically 30-50% smaller than JPEG/PNG). Use `<img loading="lazy">` for below-the-fold images. Use `srcset` for responsive images. Next.js `<Image>` handles all of this automatically.
Critical path optimization: Inline critical CSS (above-the-fold styles). Defer non-critical JS (`<script defer>`). Preconnect to third-party origins (`<link rel="preconnect">`). Preload key fonts (`<link rel="preload" as="font">`).
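The hints above can be sketched as a tiny helper that renders them for the `<head>`. The origin and font path are placeholder values, not real assets:

```typescript
type Hint =
  | { rel: 'preconnect'; href: string }
  | { rel: 'preload'; href: string; as: 'font' | 'style' | 'script' };

function renderHint(hint: Hint): string {
  const attrs = Object.entries(hint)
    .map(([key, value]) => `${key}="${value}"`)
    .join(' ');
  // Font preloads must carry crossorigin, even for same-origin fonts
  const crossorigin = hint.rel === 'preload' && hint.as === 'font' ? ' crossorigin' : '';
  return `<link ${attrs}${crossorigin}>`;
}

const headHints = [
  renderHint({ rel: 'preconnect', href: 'https://fonts.gstatic.com' }),
  renderHint({ rel: 'preload', href: '/fonts/inter.woff2', as: 'font' }),
].join('\n');

console.log(headHints);
```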
```typescript
// ✅ Bundle optimization: tree-shakeable imports
const fn = (...args: unknown[]) => {}; // placeholder handler for the examples

// ❌ Bad: pulls in all of lodash (~72KB gzipped)
import _ from 'lodash';
const debouncedBad = _.debounce(fn, 300);

// ✅ Good: import only what you need (~1KB)
import debounce from 'lodash/debounce';
const debouncedGood = debounce(fn, 300);

// ✅ Even better: use native or lighter alternatives
const makeDebounced = (fn: Function, delay: number) => {
  let timer: NodeJS.Timeout;
  return (...args: unknown[]) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delay);
  };
};

// ✅ Image optimization in Next.js
import Image from 'next/image';

function HeroBanner() {
  return (
    <Image
      src="/hero.jpg"
      alt="Hero banner"
      width={1200}
      height={600}
      priority           // Adds <link rel="preload"> — critical for the LCP image
      placeholder="blur" // Show a blurred placeholder while loading
      quality={85}       // WebP, 85% quality — a good size/quality balance
    />
  );
}

// ✅ Measure Core Web Vitals in production
// Real-user (field) data — lab metrics like Lighthouse often disagree with it
import { onLCP, onINP, onCLS, type Metric } from 'web-vitals';

function reportWebVitals(metric: Metric) {
  // Send to your analytics
  fetch('/api/metrics', {
    method: 'POST',
    body: JSON.stringify({
      name: metric.name,
      value: metric.value,
      rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
      url: window.location.href,
    }),
  });
}

onLCP(reportWebVitals);
onINP(reportWebVitals);
onCLS(reportWebVitals);
```
Performance questions test whether you approach problems empirically or by intuition.
Common questions:
Strong answers include:
Red flags:
Quick check · Performance Optimization: Finding and Fixing Real Bottlenecks
Key takeaways
From the books
High Performance Browser Networking — Ilya Grigorik (2013)
Chapter 1: Primer on Latency and Bandwidth
Bandwidth is rarely the bottleneck. Latency — driven by physics (speed of light) and protocol overhead — is. This shapes every optimization: fewer round-trips beats more bandwidth.
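A back-of-envelope illustration of that claim — all numbers here are assumptions for illustration, not measurements:

```typescript
const rttMs = 100;   // assumed round-trip time (e.g. cross-continent)
const pageKB = 1000; // assumed page weight

// Total load time ≈ round-trips × RTT + transfer time.
// (pageKB × 8) kilobits ÷ Mbps conveniently yields milliseconds.
const loadMs = (mbps: number, roundTrips: number): number =>
  roundTrips * rttMs + (pageKB * 8) / mbps;

const baseline      = loadMs(10, 30);  // 30 round-trips at 10 Mbps → 3800 ms
const tenXBandwidth = loadMs(100, 30); // 10× the bandwidth        → 3080 ms
const halfTheRtts   = loadMs(10, 15);  // half the round-trips     → 2300 ms
```

Under these assumptions, halving round-trips cuts load time far more than a 10× bandwidth upgrade — which is why bundling, HTTP/2 multiplexing, and caching target round-trips first.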