
DNS & Service Discovery: CoreDNS and the 5x Query Problem

CoreDNS provides Kubernetes service discovery via predictable DNS names. The default ndots:5 setting expands any name with fewer than five dots through the pod's search domains, so a single lookup of an external hostname can trigger five sequential DNS queries. At scale, this multiplies DNS traffic dramatically and introduces latency in surprising places.

Relevant for: Mid-level, Senior, Staff
Why this matters at your level
Mid-level

Know the Kubernetes DNS naming convention (service.namespace.svc.cluster.local) and how pods resolve service names using CoreDNS.

Senior

Understand ndots and the search path. Tune CoreDNS replicas and resource limits. Use autopath plugin to reduce DNS amplification. Monitor CoreDNS latency and error rates.

Staff

Design DNS strategy for high-throughput clusters: NodeLocal DNSCache to move DNS resolution off the network, FQDN usage in service code, CoreDNS custom zones for split-horizon DNS.

Incident timeline: CoreDNS overload -- ndots:5 multiplication in a high-throughput cluster (2021)
T+0

Traffic scales to 50k req/s; CoreDNS hits 100% CPU on all 3 pods

T+30m

DNS query latency spikes to 500ms; service-to-service latency follows

T+2h

Analysis reveals 250k DNS queries/s = 5x the actual request rate

T+3h

ndots:5 identified as root cause; 5 search domains tried per bare hostname

T+4h

Services updated to use FQDNs; DNS traffic drops to 50k/s; CoreDNS CPU normalizes


The question this raises

What does ndots:5 actually do to DNS queries, and when does it become a performance bottleneck for high-throughput services?

Test your assumption first

A pod in namespace "payments" tries to reach a service named "auth" in namespace "identity" by calling http://auth. What happens?

Lesson outline

What CoreDNS Solves

Predictable Names for Ephemeral IPs

Pod IPs change constantly. Service ClusterIPs are stable but numeric. CoreDNS provides human-readable, predictable DNS names for Services and Pods. It watches the Kubernetes API for Service and Endpoint changes and updates its DNS records in real-time -- no static configuration required.

Service DNS

api-service.default.svc.cluster.local resolves to the Service ClusterIP. Pods in the same namespace can use the short name api-service. Cross-namespace calls need the full name or a namespace qualifier: api-service.other-ns.

Headless Service DNS

For StatefulSets with clusterIP: None, DNS returns individual pod IPs. kafka-0.kafka-headless.default.svc.cluster.local resolves directly to kafka-0's pod IP. Used for direct pod addressing by distributed systems.
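As a concrete illustration, a headless Service of this kind might look like the following sketch (the kafka names and labels are assumptions for illustration, not part of the incident above):

```yaml
# Hedged sketch: a headless Service for a StatefulSet.
# clusterIP: None means DNS returns individual pod IPs
# (kafka-0.kafka-headless.default.svc.cluster.local -> pod IP)
# instead of a single virtual ClusterIP.
apiVersion: v1
kind: Service
metadata:
  name: kafka-headless
spec:
  clusterIP: None       # headless: no virtual IP, no kube-proxy load balancing
  selector:
    app: kafka          # assumed label on the StatefulSet's pods
  ports:
  - name: broker
    port: 9092
```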

NodeLocal DNSCache

A DaemonSet running a DNS cache on every node. Pods hit 169.254.20.10 (link-local) for DNS instead of reaching CoreDNS pods over the network. Eliminates network latency for cached lookups and reduces CoreDNS load by up to 90% on busy clusters.
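Many NodeLocal DNSCache installations intercept the cluster DNS IP transparently, but a pod can also be pointed at the node-local cache explicitly. A hedged sketch (169.254.20.10 is the conventional default address; verify it against your deployment):

```yaml
# Hedged sketch: route a pod's DNS through the node-local cache explicitly.
# dnsPolicy: "None" tells kubelet not to generate resolv.conf; we supply it.
spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
    - 169.254.20.10            # link-local address served by node-local-dns
    searches:
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    options:
    - name: ndots
      value: "2"               # pairs well with the cache: fewer search attempts
```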

The System View: DNS Resolution Path

Pod /etc/resolv.conf:
  nameserver 10.96.0.10      <- CoreDNS ClusterIP
  search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
  options ndots:5

Call: http://api-service (has 0 dots < ndots:5)
  Query 1: api-service.default.svc.cluster.local -> HIT (ClusterIP returned)
  [Queries 2-5 skipped because query 1 hit]

Call: http://api.external.com (has 2 dots < ndots:5)
  Query 1: api.external.com.default.svc.cluster.local -> MISS
  Query 2: api.external.com.svc.cluster.local -> MISS
  Query 3: api.external.com.cluster.local -> MISS
  Query 4: api.external.com.ec2.internal -> MISS
  Query 5: api.external.com. -> HIT (external DNS)
  5 queries for every external hostname!

Fix: use FQDN with trailing dot
  http://api-service.default.svc.cluster.local.  <- 1 query, no search
  OR: reduce ndots: 2 in pod dnsConfig

ndots:5 causes up to 5 sequential DNS queries per lookup for short names; FQDNs with trailing dot skip the search path entirely

DNS Optimization Strategies

Situation: 50k req/s cluster with default ndots:5 and short service names
Before: 250k DNS queries/s; CoreDNS at 100% CPU; 500ms DNS latency; all service calls delayed
After: Services use FQDNs; NodeLocal DNSCache deployed; DNS traffic drops 5x to 50k/s; latency sub-millisecond

Situation: Cross-namespace service call using a short name
Before: http://auth fails or hits the wrong service -- the search path appends the caller's namespace, not the target's
After: http://auth.identity.svc.cluster.local -- explicit namespace in the DNS name; always resolves correctly

How CoreDNS Works

DNS resolution chain for a service call

1. App calls http://api-service; the OS queries the /etc/resolv.conf nameserver (CoreDNS or NodeLocal cache)
2. CoreDNS receives the query; the kubernetes plugin (which watches Service/Endpoint objects) checks for a match
3. Match found for api-service.default.svc.cluster.local; CoreDNS returns the ClusterIP (e.g., 10.96.100.1)
4. The OS returns the ClusterIP to the application; the HTTP client connects to 10.96.100.1
5. kube-proxy iptables rules DNAT 10.96.100.1 to a ready pod IP
6. The request reaches the pod; the response returns along the same path


pod-dns-config.yaml

  spec:
    dnsConfig:
      options:
      - name: ndots
        value: "2"   # reduce from 5 to 2 -- fewer search domain attempts
      - name: single-request-reopen
        value: ""    # avoid the parallel A/AAAA query race condition
    containers:
    - name: app
      env:
      - name: API_URL
        # Use FQDN with namespace for cross-namespace calls
        value: "http://auth-service.identity.svc.cluster.local"

ndots: 2 -- only try search domains if the hostname has fewer than 2 dots; reduces amplification. Always use an FQDN for cross-namespace service URLs -- it prevents namespace resolution ambiguity.
What Breaks in Production: Blast Radius

DNS failure modes

  • ndots:5 amplification at scale — Default ndots:5 causes 5 sequential DNS queries per bare hostname. At 50k req/s, this generates 250k DNS queries/s. CoreDNS overloads; DNS latency spikes; all service calls slow. Fix: FQDNs or reduce ndots.
  • Cross-namespace short names resolve wrong — "http://auth" in namespace "payments" resolves auth.payments.svc.cluster.local -- potentially finding a different service or nothing. Always use full names for cross-namespace calls.
  • CoreDNS restarts cause DNS blackout — CoreDNS pods crash or restart during rolling update; DNS queries fail while no ready CoreDNS pod exists. Run at least 2-3 CoreDNS replicas with PodDisruptionBudget minAvailable: 1.
  • NXDOMAIN cached too long — Negative caching: if a Service is temporarily missing, DNS returns NXDOMAIN and the negative response is cached. Even after the Service is recreated, clients hit the cache. Default negative TTL is 30s. For dynamic services, keep the Service object persistent.
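For the CoreDNS-restart blackout above, a PodDisruptionBudget keeps at least one replica serving during voluntary disruptions. A minimal sketch, assuming the default k8s-app: kube-dns label that most distributions apply to CoreDNS pods:

```yaml
# Hedged sketch: prevent node drains and other voluntary disruptions
# from evicting all CoreDNS replicas at once.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: coredns-pdb
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      k8s-app: kube-dns   # CoreDNS pods carry this label in most distributions
```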

Cross-namespace call using short service name resolves to wrong namespace

Bug
# Pod in namespace: payments
# Trying to call auth service in namespace: identity

env:
- name: AUTH_SERVICE_URL
  value: "http://auth-service:8080"
  # Resolves: auth-service.payments.svc.cluster.local
  # If no auth-service in 'payments': NXDOMAIN -> connection error
  # If there IS an auth-service in 'payments': hits WRONG service!
Fix
# Always use FQDN for cross-namespace service calls
env:
- name: AUTH_SERVICE_URL
  value: "http://auth-service.identity.svc.cluster.local:8080"
  # Explicit namespace: always resolves to the right service
  # Works from any namespace regardless of caller's DNS search path

DNS search domains are scoped to the pod's namespace. Cross-namespace service calls must use the full DNS name including the target namespace. Short names only work reliably within the same namespace.

Decision Guide: DNS Optimization

Are services making more than 1000 req/s to other services?
  Yes: Deploy the NodeLocal DNSCache DaemonSet; set ndots: 2 in pod dnsConfig; use FQDNs
  No: Default CoreDNS with 2-3 replicas is sufficient

Are services calling external DNS names frequently?
  Yes: Reduce ndots to 2; external names with 1 dot currently trigger 4 search-domain lookups before resolving
  No: Default ndots:5 is harmless if external DNS calls are infrequent

Do you have services in multiple namespaces that call each other?
  Yes: Enforce FQDN usage in all inter-service URLs; document namespace-qualified names in service contracts
  No: Short names within a single namespace are fine

Cost and Complexity: DNS Architecture Options

Configuration                  | DNS queries per request | CoreDNS load  | Latency                     | Complexity
Default (ndots:5, short names) | 1-5 per lookup          | High at scale | 5-50ms (network round-trip) | None
FQDNs + ndots:2                | 1 per lookup            | Low           | 5-10ms (network round-trip) | Low (URL naming discipline)
NodeLocal DNSCache             | 1 per lookup (cached)   | Very low      | <1ms (node-local)           | Medium (DaemonSet deploy)
NodeLocal + FQDNs + ndots:2    | 1 per lookup (cached)   | Minimal       | <1ms                        | Medium
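The autopath plugin mentioned for the Senior level is a server-side alternative: CoreDNS recognizes search-path expansions and answers the first query with the final record, collapsing the amplification to one round trip. A hedged Corefile sketch (autopath requires the kubernetes plugin to run with pods verified, which increases CoreDNS memory use):

```
.:53 {
    errors
    health
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods verified              # required by autopath; watches Pod objects
        fallthrough in-addr.arpa ip6.arpa
    }
    autopath @kubernetes           # answer search-path expansions server-side
    cache 30                       # cache responses for 30 seconds
    forward . /etc/resolv.conf     # upstream resolver for external names
}
```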

Exam Answer vs. Production Reality


Kubernetes DNS naming scheme

📖 What the exam expects

Services: <service>.<namespace>.svc.cluster.local. Pods: <pod-ip-dashed>.<namespace>.pod.cluster.local. Short names (service) only resolve within the same namespace using search domains.


How this might come up in interviews

Performance debugging questions about DNS latency and architecture questions about service discovery at scale.

Common questions:

  • How does a pod resolve the DNS name of a Service?
  • What is ndots and why does it cause extra DNS queries?
  • How would you call a service in a different namespace?
  • How would you reduce DNS load in a high-throughput Kubernetes cluster?

Strong answer: Mentions NodeLocal DNSCache for high-throughput clusters, reducing ndots in pod dnsConfig, and monitoring CoreDNS with coredns_dns_request_duration_seconds histogram.

Red flags: Not knowing that cross-namespace service calls require the full DNS name, or not being aware of the ndots amplification effect.

Related concepts

Explore topics that connect to this one.

  • Services & Endpoints: Stable Networking for Ephemeral Pods
  • kube-proxy & Service Networking
  • ServiceEntry & Egress Control
