Resource requests guide scheduling. Resource limits enforce runtime caps. Misconfiguring them leads to CPU throttling under load, OOM kills that bypass graceful shutdown, and bin-packing failures that leave nodes half-empty.
Understand requests (scheduling) vs limits (enforcement). Know CPU-throttle vs OOM-kill behavior and the associated exit codes.
Tune requests and limits based on profiled p95/p99 usage. Configure QoS classes. Use LimitRange for namespace defaults and ResourceQuota for namespace caps.
Design resource quota strategies for multi-tenant clusters. Use VPA for automated right-sizing. Understand CPU manager policies for latency-sensitive workloads requiring exclusive CPUs.
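As a concrete starting point, here is a minimal sketch of namespace defaults and caps using LimitRange and ResourceQuota (the names, namespace, and values are illustrative, not prescriptive):

```yaml
# LimitRange: per-container defaults applied when a pod spec omits them
apiVersion: v1
kind: LimitRange
metadata:
  name: container-defaults      # illustrative name
  namespace: team-a             # illustrative namespace
spec:
  limits:
  - type: Container
    defaultRequest:             # applied when requests are omitted
      cpu: "100m"
      memory: "128Mi"
    default:                    # applied when limits are omitted
      cpu: "500m"
      memory: "512Mi"
---
# ResourceQuota: hard cap on the namespace's aggregate requests and limits
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
```

LimitRange keeps individual pods from landing in BestEffort QoS by accident; ResourceQuota keeps one tenant from consuming the whole cluster.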
CPU limits set to 250m uniformly for all Java services to "limit noisy neighbors"
Market open; request rate 10x; GC frequency increases; throttle cascade begins
p99 latency spikes to 800ms (SLO: 100ms); on-call paged
Profiling identifies CPU throttle rate at 42% for all Java services
CPU limits raised to 2000m; throttle rate drops below 1%; latency restored
The question this raises
How does the Linux CFS CPU throttle mechanism work, and why does setting CPU limits too low punish bursty workloads like JVM GC even when the node has free cores?
A pod has cpu request=100m, limit=200m. The node has 2 free cores. Under high load, CPU usage hits 200m consistently. A JVM GC burst needs 500m for 50ms. What happens?
Lesson outline
Scheduling vs Runtime Enforcement
Requests answer the scheduling question: "where can this pod fit?" Limits answer the runtime question: "how much resource can this pod consume before we intervene?" They serve different purposes and should be set independently based on profiled usage patterns -- not set equal by default.
Guaranteed QoS
Requests == limits for both CPU and memory. The pod is the last to be evicted under node pressure. Use for databases, Kafka brokers, and any workload where eviction causes data loss or long recovery time.
Burstable QoS
Requests < limits. The pod can burst above its request when the node has spare capacity, but is evicted before Guaranteed pods under pressure. Use for web servers and APIs that need burst headroom for traffic spikes.
No CPU limit (with request)
Set a CPU request for scheduling but omit the limit so GC can burst without throttling. Use for latency-sensitive JVM or Go services with bursty GC. Requires monitoring to prevent runaway consumption.
```
CPU limit  = 250m (0.25 cores)
CFS period = 100ms
CFS quota  = 25ms (250m * 100ms / 1000m)

Timeline for a JVM GC burst (needs 500m for 50ms):

[  0ms] Container starts GC; kernel tracks CPU usage
[ 25ms] CPU quota exhausted (25ms of CPU time used)
[ 25ms] CFS SUSPENDS container  <- all threads frozen
[100ms] New 100ms period; quota refilled to 25ms
[100ms] Container resumes GC
[125ms] Quota exhausted again; suspended again
[200ms] GC finally completes (50ms of work took 200ms elapsed)
```

Node has 2 free cores? Irrelevant. The CPU limit is enforced unconditionally by the cgroup's `cpu.cfs_quota_us`; the kernel does not check node utilization when throttling.

Detect: `container_cpu_cfs_throttled_seconds_total` (Prometheus). Alert when the throttle rate exceeds 5% for latency-sensitive services.
CFS enforces CPU limits per 100ms period; bursty workloads pay a suspension tax regardless of available node capacity
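The suspension tax can be illustrated with a toy simulation of CFS bandwidth accounting (a sketch under simplifying assumptions: a single burst consuming one full core while runnable, no other threads, instant quota refills at period boundaries; this is not kernel code):

```python
def burst_elapsed_ms(work_ms, quota_ms, period_ms=100):
    """Wall-clock time for a CPU burst under CFS bandwidth control.

    work_ms:  CPU time the burst needs, consuming one full core while running
    quota_ms: CPU time allowed per period (limit_in_cores * period_ms)
    """
    elapsed = 0.0
    remaining = work_ms
    while True:
        usable = min(quota_ms, remaining)   # CPU time granted this period
        remaining -= usable
        if remaining <= 0:
            return elapsed + usable         # burst finishes mid-period
        elapsed += period_ms                # throttled: wait for quota refill

# 50ms of GC work under a 25ms-per-100ms quota spans two periods,
# so the work completes 125ms after it started instead of 50ms;
# with an effectively unlimited quota, the same burst takes 50ms.
```

The step function is the point: below the throttle threshold latency is unchanged, and past it every extra quantum of work costs a full period of suspension.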
Right-Sizing Resource Configuration
Java service with 250m CPU limit and frequent GC
“42% CPU throttle rate; 85ms forced suspension per GC; p99 latency 800ms during market open”
“CPU limit raised to 2000m (matching GC burst need); throttle rate below 1%; p99 latency 12ms”
Memory limit set below observed peak
“Pod OOM killed (exit code 137) during traffic spike; no graceful shutdown; in-flight requests dropped”
“Memory limit set to p99 usage + 20% headroom; OOMKilled events drop to zero; limit based on profiling not guessing”
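The "p99 usage + 20% headroom" rule is simple to compute once you have working-set samples (a sketch; in practice the samples would come from a metric such as `container_memory_working_set_bytes`, and the sample window should cover peak traffic):

```python
import math
from statistics import quantiles

def memory_limit_mib(samples_mib, headroom=0.20):
    """Suggest a memory limit: p99 of observed working-set usage plus headroom."""
    p99 = quantiles(samples_mib, n=100)[98]   # 99th-percentile cut point
    return math.ceil(p99 * (1 + headroom))

# e.g. samples spanning 1..100 MiB -> p99 ~= 100 MiB -> suggested limit 120 MiB
```

Rounding up and adding headroom errs on the side of capacity: an over-generous limit wastes a little allocatable memory, while an under-sized one produces SIGKILL with no grace period.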
From resource spec to cgroup enforcement
1. Pod spec declares cpu request=100m limit=500m, memory request=256Mi limit=512Mi
2. Scheduler sums all pod requests on each node; places pod on node with sufficient allocatable capacity
3. kubelet creates cgroup for the container: cpu.cfs_quota_us = 50000 (500m * 100ms)
4. Container runs; kernel tracks CPU usage against quota; throttles when quota exhausted in a period
5. If container memory exceeds limit: kernel OOM killer sends SIGKILL -- no SIGTERM, no grace period
6. kubelet reports OOMKilled; pod restarts per restartPolicy; kubectl describe shows OOMKilled
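Step 3's conversion from millicores to `cpu.cfs_quota_us` is plain arithmetic (a sketch; the real kubelet also handles cgroup v2, where the same values land in the `cpu.max` file):

```python
CFS_PERIOD_US = 100_000  # default CFS period: 100ms, in microseconds

def cfs_quota_us(limit_millicores, period_us=CFS_PERIOD_US):
    """cgroup v1 cpu.cfs_quota_us for a given CPU limit; -1 means unlimited."""
    if limit_millicores is None:
        return -1                              # no CPU limit: never throttled
    return limit_millicores * period_us // 1000

# 500m limit -> 50000us of CPU time allowed per 100000us period
```

This is why a 250m limit becomes a 25ms-per-100ms budget: the millicore value scales the period directly, independent of how many cores the node has free.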
```yaml
spec:
  containers:
  - name: java-api
    resources:
      requests:
        cpu: "500m"       # scheduling: reserve 0.5 core
        memory: "1Gi"     # scheduling: reserve 1Gi
      limits:
        cpu: "2000m"      # allows 2-core GC bursts without throttle
        memory: "2Gi"     # OOM kill if exceeded; 2x request = headroom
  - name: sidecar
    resources:
      requests:
        cpu: "50m"
        memory: "64Mi"
      limits:
        cpu: "100m"
        memory: "128Mi"
```

- CPU limit at 4x the request: JVM GC needs burst headroom; this prevents a throttle cascade.
- Memory limit set to the p99 peak + 20%: an OOM kill has no grace period -- be generous.
Resource misconfiguration failure modes
CPU limit equal to request throttles bursty JVM workloads
Anti-pattern -- limit equal to request:

```yaml
resources:
  requests:
    cpu: "250m"
  limits:
    cpu: "250m"        # limit == request: no burst allowed
    memory: "512Mi"
# Result: JVM GC needs a 1-2 core burst; throttled to 250m
# 85ms forced suspension per GC cycle; p99 latency spikes
```

Fix -- request for scheduling, no CPU limit (or a high ceiling):

```yaml
resources:
  requests:
    cpu: "250m"        # scheduling reservation
    memory: "512Mi"
  limits:
    # cpu limit omitted -- allow GC to burst unrestricted
    # OR: cpu: "2000m" for an explicit high ceiling
    memory: "768Mi"    # 50% headroom above request
# Monitor: container_cpu_cfs_throttled_seconds_total
```

For JVM services, CPU limit == request is the worst configuration: GC bursts are throttled unconditionally. Either remove the CPU limit (and monitor for runaway consumption) or set it 4-8x higher than the request to allow burst headroom.
| QoS Class | How to achieve | Eviction order | Best for | Risk |
|---|---|---|---|---|
| Guaranteed | requests == limits (CPU + memory) | Last evicted | Databases, critical stateful services | Over-provisioning wastes cluster capacity |
| Burstable | requests < limits (or only one set) | Middle -- by usage/request ratio | APIs, web servers with traffic bursts | Evicted under node memory pressure |
| BestEffort | No requests or limits | First evicted (no warning) | Batch jobs on idle cluster capacity | Evicted instantly under any pressure |
Request vs Limit
📖 What the exam expects
Request: amount of resource guaranteed to the container (used for scheduling). Limit: maximum the container can use (enforced at runtime by cgroups).
Where this comes up: performance debugging questions, capacity planning, and multi-tenant cluster design.
Strong answer: Mentions VPA for automated right-sizing, LimitRange for namespace defaults, and removing CPU limits for latency-sensitive services that need to burst without throttle.
Red flags: Setting CPU limits == requests uniformly for all services, or not knowing what BestEffort QoS means for eviction behavior.