Autoscaling adjusts replica counts and node counts based on demand: the Horizontal Pod Autoscaler (HPA) scales the number of Pods, the Vertical Pod Autoscaler (VPA) adjusts resource requests, and the Cluster Autoscaler adds or removes nodes.
Lesson outline
HPA watches a metric (CPU, memory, custom) and scales the number of replicas up or down.
Example: "if avg CPU > 70%, add replicas (up to a max of 10); if avg CPU < 30%, remove replicas (down to a min of 2)."
HPA re-evaluates its metrics on a short interval (every 15 seconds by default). It requires the Metrics Server to be installed for CPU/memory metrics.
Custom metrics (e.g. from Prometheus, via a metrics adapter) enable scaling on application-specific signals such as requests per second or queue depth.
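A minimal HPA manifest targeting 70% average CPU might look like the sketch below. The Deployment name `web` and the replica bounds are illustrative, not from this lesson:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # assumed Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when avg CPU exceeds 70% of requests
```

Note that the utilization percentage is measured against each container's CPU *request*, which is one reason accurate requests matter.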
VPA adjusts CPU/memory resource requests (not limits) based on historical usage.
Useful if you don't know what resources your app needs. VPA recommends values, or can auto-apply them (requires Pod restart).
VPA is less commonly used than HPA; most teams prefer manual tuning or HPA with appropriate requests.
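For comparison, a VPA object (assuming the VPA add-on is installed in the cluster; it is not built in) could be sketched as:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # assumed Deployment to observe
  updatePolicy:
    updateMode: "Off"      # recommend only; "Auto" applies values by restarting Pods
```

With `updateMode: "Off"`, VPA only publishes recommendations, which suits the manual-tuning workflow most teams prefer.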
Cluster autoscaler watches for unschedulable Pods (stuck in Pending). If a Pod can't fit on any node, it adds nodes.
Works with cloud providers (AWS, Azure, GCP) to scale node groups up or down.
Paired with HPA: HPA scales Pods, and the cluster autoscaler scales nodes to accommodate them.
Request: the amount of resources the scheduler reserves for a Pod when placing it on a node. HPA's CPU utilization percentage is computed relative to the request.
Limit: the maximum resources allowed. If a container exceeds its limit, it is throttled (CPU) or OOM-killed (memory).
Proper requests are critical for HPA and cluster autoscaler to work correctly.
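Requests and limits are set per container in the Pod spec. A minimal sketch (the image and values are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
  - name: app
    image: nginx:1.25      # illustrative image
    resources:
      requests:
        cpu: "250m"        # scheduler reserves a quarter of a CPU core
        memory: "128Mi"
      limits:
        cpu: "500m"        # CPU usage above this is throttled
        memory: "256Mi"    # memory usage above this gets the container OOM-killed
```

If a Pod's requests cannot fit on any node, it stays Pending, which is exactly the signal the cluster autoscaler reacts to.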
Key takeaways
💡 Analogy
HPA is like a restaurant that hires waiters based on table occupancy (Pods = waiters, load = tables). VPA is like a tailor adjusting uniform sizes based on employee measurements. Cluster autoscaler is like the chain opening new branches when all existing branches are full.
⚡ Core Idea
HPA scales replicas based on metrics. VPA adjusts resource requests. Cluster autoscaler adds nodes. All three work together to auto-scale capacity.
🎯 Why It Matters
Autoscaling reduces costs (scale down when traffic drops) and improves availability (scale up when traffic spikes). Proper resource requests are the foundation.