The Simplified Tech

© 2026 TheSimplifiedTech. All rights reserved.

Interactive Explainer

DNS & CDN Configuration

How Route 53 routing policies, record types, and TTL strategy work in production -- and how CloudFront distributions, cache behaviours, origin failover, and invalidation costs shape CDN architecture decisions.

Relevant for: Mid-level, Senior, Principal
Why this matters at your level
Mid-level

Configures Route 53 records and CloudFront distributions. Knows the difference between CNAME and ALIAS. Can trace a "DNS change isn't working" issue to TTL or client-side caching. Sets cache-control headers correctly for static vs dynamic content.

Senior

Designs DNS architecture for multi-region systems including failover routing, health check configuration, and TTL strategy for planned changes. Owns CloudFront distribution design with correct cache behaviours, Origin Shield decision, and cost model. Writes runbooks for DNS failover procedures.

Principal

Sets organisation-wide DNS and CDN standards. Decides between CloudFront vs third-party CDN (Fastly, Akamai) based on cost model, feature requirements, and vendor risk. Ensures all customer-facing endpoints have tested failover paths with defined RTO/RPO.

CDN Failure -- Fastly Global Outage -- June 8, 2021

  • 09:47:11 UTC -- A single customer changes a config setting, triggering a latent software bug in Fastly's network.
  • 09:47:52 UTC (WARNING) -- Within 49 seconds, 85% of Fastly's global PoPs begin failing and returning errors.
  • 09:48 UTC (CRITICAL) -- The New York Times, BBC, Reddit, GitHub, Twitch, Amazon UK, and thousands of other sites return 503 errors globally.
  • 09:58 UTC (CRITICAL) -- Fastly engineers identify the root cause and begin deploying the software fix across the global network.
  • 10:35 UTC (WARNING) -- Services restore globally. Total outage duration: 49 minutes.


The question this raises

If your CDN provider's network carried a latent bug that a single customer config change could trigger, would you detect the failure, and have a tested fallback path, before 85% of its edge nodes were returning errors?

Test your assumption first

You want to point your root domain (example.com) directly to an Application Load Balancer. You try to create a CNAME record for example.com pointing to the ALB DNS name, but your DNS provider rejects it. Why?

Why this matters

DNS for Cloud Engineers -- Beyond Name Resolution

DNS for application engineers means "it translates hostnames to IPs." DNS for cloud engineers means controlling how traffic is routed globally -- which region handles a user's request, what happens when an origin fails, and how long clients cache the answer. Route 53 routing policies make DNS a traffic management layer, not just a name lookup service.

Route 53 record types that matter

  • A record -- Maps a hostname to an IPv4 address. Direct mapping, no indirection.
  • CNAME -- Maps a hostname to another hostname. Cannot be used at the zone apex (root domain) -- a critical constraint that trips up almost every engineer the first time they try it.
  • ALIAS -- Route 53-specific extension. Maps a hostname to an AWS resource (ALB, CloudFront, S3 website) and, unlike CNAME, is allowed at the zone apex. It does not incur the extra DNS lookup that a CNAME does. Prefer ALIAS over CNAME for AWS resources.
  • NS and SOA -- NS records delegate the zone to its name servers; the SOA record holds the zone's administrative metadata. When pointing a domain to Route 53, you set the hosted zone's name servers at the registrar.

CNAME at the zone apex is invalid -- use ALIAS

example.com is the zone apex. You cannot create a CNAME for example.com: the DNS specification does not allow a CNAME to coexist with any other record at the same name, and the apex always holds the zone's SOA and NS records. You can create a CNAME for www.example.com. For the root domain pointing to an ALB or CloudFront distribution, use an ALIAS record. Route 53 resolves ALIAS records server-side and returns the target's IP addresses directly, so clients avoid the extra RTT of a CNAME chain.
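
The apex rule can be sketched in a few lines of Python. This is an illustration only: `can_add_cname` and `alias_change_batch` are hypothetical helper names, the zone contents are made up, and the ALB DNS name and hosted zone ID are placeholders. The dictionary follows the shape of a Route 53 `ChangeResourceRecordSets` change batch, but nothing is sent to AWS here.

```python
from typing import Dict, Set


def can_add_cname(existing: Dict[str, Set[str]], name: str) -> bool:
    """A CNAME may only be created at a name that holds no other records."""
    return len(existing.get(name, set())) == 0


def alias_change_batch(name: str, target_dns: str, target_zone_id: str) -> dict:
    """Build an UPSERT for an alias A record (a plain dict, no AWS call)."""
    return {
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "AliasTarget": {
                    "HostedZoneId": target_zone_id,
                    "DNSName": target_dns,
                    "EvaluateTargetHealth": True,
                },
            },
        }]
    }


# Every zone apex already holds SOA and NS records, so a CNAME can never fit there.
zone = {"example.com.": {"SOA", "NS"}, "www.example.com.": set()}

apex_ok = can_add_cname(zone, "example.com.")      # rejected: SOA and NS are present
www_ok = can_add_cname(zone, "www.example.com.")   # allowed: the name is empty

# Placeholder values: substitute your own ALB's DNS name and hosted zone ID.
batch = alias_change_batch(
    "example.com.",
    "my-alb-123.us-east-1.elb.amazonaws.com.",
    "ZEXAMPLE123",
)
```

The resulting dictionary is what you would pass as the change batch when updating the hosted zone; the record type stays `A` because an alias answers with addresses rather than with another name.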

Pattern

Route 53 Routing Policies -- Traffic Management at the DNS Layer

Route 53 transforms DNS from a static name-to-IP mapping into a programmable traffic management layer. The right policy depends on whether you are optimising for latency, cost, resilience, or gradual rollout.

| Policy | How it works | Primary use case | Gotcha |
| --- | --- | --- | --- |
| Simple | Returns a single record (or all values if multiple). No health check by default. | Single-region, single-origin endpoints | No automatic failover -- if the target is unhealthy, clients receive the broken IP. |
| Weighted | Splits traffic by percentage across multiple records. Weights are relative, not absolute. | Canary rollouts and A/B traffic splitting | A weight of 0 on a record removes it from rotation entirely, not just reduces it (unless every record in the group has weight 0, in which case traffic is spread evenly). |
| Latency-based | Routes to the AWS region with the lowest measured latency from the user's resolver. | Multi-region active-active deployments | Measures latency from resolver location, not user location. Corporate DNS proxies can skew results. |
| Failover | Routes to primary unless primary fails health check; then routes to secondary. | Active-passive disaster recovery | The health check must be correctly configured -- failover never triggers unless a health check is associated with the primary record. |
| Geolocation | Routes based on user's geographic location (country or continent). | Data residency compliance and localisation | Requires a default record for locations not explicitly mapped, or DNS queries from those locations get no answer. |
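
To make "weights are relative, not absolute" concrete, here is a small Python sketch. `traffic_shares` is a hypothetical helper, not part of any AWS SDK; it only computes the expected share of DNS responses each weighted record receives, assuming all records are healthy.

```python
from typing import Dict


def traffic_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Return each record's expected share of responses for the given weights."""
    total = sum(weights.values())
    if total == 0:
        # If every record has weight 0, traffic is spread evenly across all of them.
        return {name: 1 / len(weights) for name in weights}
    return {name: weight / total for name, weight in weights.items()}


canary = traffic_shares({"stable": 95, "canary": 5})        # canary gets 5%
same_split = traffic_shares({"stable": 190, "canary": 10})  # same 95/5 split
drained = traffic_shares({"stable": 100, "canary": 0})      # canary gets nothing
```

Doubling every weight leaves the split unchanged, which is why a canary rollout is usually expressed as a ratio (for example 95 to 5) rather than as absolute numbers.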

Failover routing requires a correctly configured health check

A failover record without a health check never fails over -- it always serves the primary. The health check interval (10 or 30 seconds) multiplied by the failure threshold (default 3 consecutive failures) means Route 53 marks the primary unhealthy after roughly 30-90 seconds of failure; resolvers that cached the primary's answer keep using it until the record's TTL expires. Set the health check endpoint to something lightweight that does not call downstream dependencies -- otherwise a slow database triggers DNS failover when the application server is actually healthy.
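
The 30-90 second figure is simple arithmetic, sketched below in Python. This is a simplified model under stated assumptions: `worst_case_failover_seconds` is a hypothetical helper, the 60-second record TTL is an example value, and the model ignores how Route 53 aggregates results from multiple health checkers.

```python
def detection_seconds(interval_s: int, failure_threshold: int) -> int:
    """Time for the health check to register enough consecutive failures."""
    return interval_s * failure_threshold


def worst_case_failover_seconds(interval_s: int, failure_threshold: int, record_ttl_s: int) -> int:
    """Detection time plus one TTL for resolvers that cached the primary's answer."""
    return detection_seconds(interval_s, failure_threshold) + record_ttl_s


fast_detect = detection_seconds(10, 3)       # 10-second interval
standard_detect = detection_seconds(30, 3)   # 30-second interval

# Assumed example: the failover record has a 60-second TTL.
fast_total = worst_case_failover_seconds(10, 3, 60)
standard_total = worst_case_failover_seconds(30, 3, 60)
```

Under these assumptions the client-visible failover time is dominated by the record TTL as soon as the TTL exceeds the detection window, which is one reason failover records tend to carry short TTLs.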

Pattern

CloudFront: Cache Behaviour Design

A CloudFront distribution sits in front of one or more origins (S3, ALB, API Gateway, custom HTTP server). Requests hit one of 600+ global PoPs; if the content is cached, it is served immediately. If not, the request is forwarded to the origin. The cache hit rate determines how much origin load and cost is saved.

Cache behaviour key decisions

  • Cache-Control headers -- Set max-age on the origin response. CloudFront honours these within the minimum and maximum TTL bounds of the behaviour's cache policy. Static assets should use max-age=31536000 (1 year) with versioned filenames; HTML pages should use max-age=0, no-cache (always revalidate).
  • Separate behaviours for static vs dynamic -- Create one behaviour for /static/* (long cache TTL, compress, no forwarding of cookies/query strings) and a separate behaviour for /api/* (no caching, forward all headers to origin). Mixing them costs either freshness or origin load.
  • Cache invalidations cost money -- $0.005 per path after the first 1,000 paths per month. A wildcard invalidation (/*) counts as one path but invalidates everything, takes time to reach every PoP, and does not purge copies already held in browsers or downstream proxies. Design for versioned filenames instead of invalidations.
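
The three decisions above can be sketched in Python. All helper names here (`versioned_name`, `cache_control_for`, `invalidation_cost`) are hypothetical, the path prefixes follow the /static/* and /api/* examples in the list, and `no-store` for API responses is an assumption about how "no caching" would be expressed. The pricing constants are the figures quoted above.

```python
import hashlib
from pathlib import PurePosixPath

FREE_INVALIDATION_PATHS = 1000   # free paths per month, as quoted above
PRICE_PER_EXTRA_PATH = 0.005     # USD per path beyond the free tier


def versioned_name(filename: str, content: bytes) -> str:
    """Embed a short content hash in a bare filename, e.g. app.js -> app.<hash>.js."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    path = PurePosixPath(filename)
    return f"{path.stem}.{digest}{path.suffix}"


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value by URL path prefix."""
    if path.startswith("/static/"):
        return "max-age=31536000"   # safe only because filenames are versioned
    if path.startswith("/api/"):
        return "no-store"           # assumption: API responses are never cached
    return "max-age=0, no-cache"    # HTML: always revalidate


def invalidation_cost(paths_this_month: int) -> float:
    """Monthly invalidation cost in USD for the given number of paths."""
    billable = max(0, paths_this_month - FREE_INVALIDATION_PATHS)
    return billable * PRICE_PER_EXTRA_PATH


old_name = versioned_name("app.js", b"console.log('v1')")
new_name = versioned_name("app.js", b"console.log('v2')")
monthly_cost = invalidation_cost(3000)   # 2,000 billable paths
```

Because a changed file gets a new name, a deployment only needs the HTML to reference the new asset; the old object can expire on its own and no invalidation is required.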

Origin Shield adds a caching layer between PoPs and your origin

Without Origin Shield, each of CloudFront's 600+ PoPs can independently query your origin on a cache miss. For a low-traffic origin, this means up to 600 simultaneous requests for the same uncached object during a traffic spike. Origin Shield collapses all PoP cache misses to a single request to the shield region, then distributes the response. It is billed as an additional per-request charge (priced per 10,000 requests, varying by region) rather than per GB, and it can dramatically reduce origin load and prevent origin overload during traffic spikes.
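
A toy Python model of the request-collapsing effect follows. It is a deliberately simplified sketch: the `Origin` and `Cache` classes are hypothetical, requests are processed one at a time, and the model ignores regional edge caches and concurrent request coalescing. It only counts how many times the origin is asked for one uncached object.

```python
from typing import Dict


class Origin:
    """Stands in for the origin server and counts how often it is fetched."""

    def __init__(self) -> None:
        self.fetches = 0

    def get(self, key: str) -> str:
        self.fetches += 1
        return f"body-of-{key}"


class Cache:
    """A cache layer that asks its upstream (another cache or the origin) on a miss."""

    def __init__(self, upstream) -> None:
        self.upstream = upstream
        self.store: Dict[str, str] = {}

    def get(self, key: str) -> str:
        if key not in self.store:
            self.store[key] = self.upstream.get(key)
        return self.store[key]


def origin_fetches(pop_count: int, use_shield: bool) -> int:
    """Request one object through every PoP and report origin fetches."""
    origin = Origin()
    upstream = Cache(origin) if use_shield else origin
    pops = [Cache(upstream) for _ in range(pop_count)]
    for pop in pops:
        pop.get("/static/app.js")
    return origin.fetches


without_shield = origin_fetches(600, use_shield=False)
with_shield = origin_fetches(600, use_shield=True)
```

In this model every PoP misses once, so the origin sees 600 fetches without the shield layer and a single fetch with it.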

Common Mistake

TTL Strategy -- The Hidden Cost of DNS Changes

DNS TTL (Time To Live) tells resolvers how long to cache a record. A 300-second TTL means resolvers cache the answer for 5 minutes. An 86400-second TTL means 24 hours. Most engineers set and forget TTLs at creation time. This causes "why isn't the DNS change working?" incidents.

How to execute a DNS change safely

  1. One week before: reduce the record's TTL to 60 seconds. Once the old TTL has elapsed, any cached copy expires within 1 minute.
  2. Day of change: verify the low TTL has propagated (use dig @8.8.8.8 example.com and check the TTL in the response).
  3. Make the change: update the DNS record to the new value.
  4. Monitor: within 60-90 seconds, traffic should shift. Watch error rates. Be ready to revert.
  5. After confirming stable: raise the TTL back to its normal value (300-3600 seconds) to reduce DNS query load.

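
The timing behind the steps above can be sketched in Python. The helper names and the example dates are hypothetical; the model assumes resolvers honour TTLs, so it gives the earliest moment the change is safe rather than a guarantee about every client.

```python
from datetime import datetime, timedelta, timezone


def earliest_safe_change(ttl_lowered_at: datetime, old_ttl_s: int) -> datetime:
    """Resolvers may still hold the record with its old TTL until this moment."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_s)


def expected_cutover_complete(changed_at: datetime, low_ttl_s: int) -> datetime:
    """After the change, well-behaved resolvers refresh within one low TTL."""
    return changed_at + timedelta(seconds=low_ttl_s)


# Assumed example: TTL lowered from 86400 seconds to 60 seconds at 09:00 UTC.
lowered = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
safe_from = earliest_safe_change(lowered, old_ttl_s=86400)
done_by = expected_cutover_complete(safe_from, low_ttl_s=60)
```

Lowering the TTL a week ahead, as the steps recommend, leaves a wide margin over this minimum of one old TTL period.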

Many clients do not respect TTL

DNS TTL is the maximum time a resolver should cache a record, but not every client honours it. The JVM caches successful lookups in InetAddress according to networkaddress.cache.ttl rather than the record's TTL: forever when a security manager is installed, and for an implementation-specific period otherwise (30 seconds in OpenJDK), so set it explicitly. Some ISP resolvers also ignore TTL and cache longer than specified. When changing a critical record, plan as a rule of thumb for 10-15% of traffic to still hit the old address for 30+ minutes after the TTL expires. This is why blue/green deployments keep the old origin alive until you are confident traffic has fully shifted.

Exam Answer vs. Production Reality

CNAME vs ALIAS

📖 What the exam expects

Both CNAME and ALIAS records map one name to another name or resource. Use CNAME for subdomains and ALIAS for root domains.

How this might come up in interviews

Mid-level and senior cloud engineer interviews, solutions architect interviews, and production incident investigations. Often framed as "why did this DNS change break things?" or "design a globally distributed static site."

Common questions:

  • Why can't you use a CNAME record at the zone apex? What do you use instead?
  • Walk me through how you would safely change a DNS record in production without downtime.
  • Your CloudFront cache hit rate is 20%. What are the most likely causes and how do you fix them?
  • What is the difference between Route 53 latency routing and geolocation routing? When would you use each?
  • How does Origin Shield work and when is it worth the extra cost?

Try this question: What is your current CDN cache hit rate? Do you use versioned asset filenames or rely on invalidations for deployments? Have you tested your DNS failover routing recently?

Strong answer: Reduces TTL one week before a planned DNS change. Uses versioned filenames for all static assets. Knows the difference between the 300-second and 86400-second TTL implications. Mentions Origin Shield unprompted when discussing CloudFront.

Red flags: Uses CNAME at the zone apex without knowing why it fails. Believes DNS changes propagate instantly. Treats cache invalidation as the primary deployment mechanism. Cannot explain what a health check must test to trigger DNS failover.
