A practical guide to choosing between virtual machines, containers, and serverless functions — covering cost, isolation, cold starts, density, and the real-world trade-offs that determine the right model for each workload.
Every workload you deploy sits on one of three compute abstractions: a virtual machine, a container, or a serverless function. Each trades off isolation, density, operational burden, and cost differently. Choosing wrong costs real money and causes real pain — either over-paying for idle capacity or hitting cold-start latency walls at 3 AM.
| Primitive | AWS | GCP | Azure | Unit of billing |
|---|---|---|---|---|
| Virtual Machine | EC2 | Compute Engine (GCE) | Azure VMs | Per-hour/second (instance running) |
| Container (managed) | ECS / EKS | GKE / Cloud Run | AKS / Container Apps | Per vCPU-second or instance-hour |
| Serverless (FaaS) | Lambda | Cloud Functions / Cloud Run | Azure Functions | Per invocation + GB-second |
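To make the billing differences concrete, here is a rough back-of-envelope cost model. All rates below are illustrative assumptions in the ballpark of us-east-1 on-demand pricing, not quoted list prices — the point is the shape of each bill, not the exact dollars.

```python
# Back-of-envelope monthly cost for one workload under each billing model.
# All rates are illustrative assumptions, not current list prices.

VM_HOURLY = 0.34                   # one c5.2xlarge-class instance
CONTAINER_VCPU_SECOND = 0.000024   # managed-container vCPU-second rate
FAAS_GB_SECOND = 0.0000166667      # FaaS GB-second rate
FAAS_PER_REQUEST = 0.0000002       # FaaS per-invocation rate

def vm_monthly(hours: float = 730) -> float:
    """VMs bill for every hour the instance runs, busy or idle."""
    return VM_HOURLY * hours

def container_monthly(vcpus: float, up_hours: float) -> float:
    """Managed containers bill per vCPU-second while instances are up."""
    return vcpus * up_hours * 3600 * CONTAINER_VCPU_SECOND

def faas_monthly(requests: int, avg_ms: float, memory_gb: float) -> float:
    """FaaS bills per invocation plus GB-seconds of actual execution."""
    gb_seconds = requests * (avg_ms / 1000) * memory_gb
    return requests * FAAS_PER_REQUEST + gb_seconds * FAAS_GB_SECOND

# A low-traffic API: 100k requests/month, 120 ms each, 0.5 GB memory.
print(f"VM:        ${vm_monthly():.2f}")
print(f"Container: ${container_monthly(vcpus=2, up_hours=730):.2f}")
print(f"FaaS:      ${faas_monthly(100_000, 120, 0.5):.2f}")
```

At this traffic level the FaaS bill is effectively pocket change, while the VM bills for a full month of idle capacity — the gap inverts as sustained throughput grows, which is exactly the trade-off the table above summarises.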
The accommodation analogy
VMs are like leasing a full house — you pay 24/7 whether you are home or not, but nothing is shared and you can hang pictures wherever you like. Containers are like renting an apartment — you share the building structure (kernel) but have your own space. Serverless is like booking a hotel room — you pay only for the nights you stay, the hotel manages everything, but you cannot leave heavy furniture there.
When each model wins
Every compute model has failure modes that only appear at scale or under specific traffic patterns. Understanding these ahead of time prevents expensive migrations.
| Trade-off | VMs | Containers | Serverless |
|---|---|---|---|
| Cold start latency | Minutes (AMI boot) | Seconds (image pull + start) | 100ms–3s (Lambda), 1–10s (Cloud Run) |
| Max execution time | Unlimited | Unlimited | 15 min (Lambda), 60 min (Cloud Run) |
| State persistence | Full local disk | Ephemeral (must use volumes) | None — stateless by design |
| Networking model | Full VPC control | Kubernetes networking layer | VPC optional; cold starts worse in VPC |
| Cost at low traffic | Expensive (idle billing) | Moderate (pod idle cost) | Near zero (pay per call) |
| Cost at high traffic | Predictable, flat | Linear, predictable | Can spike to 10x expected (surprise bills) |
| Operational burden | High (OS, patches, scaling) | Medium (K8s, image builds) | Low (runtime managed by cloud) |
| Observability | Full access (OS metrics) | Container-level + K8s metrics | Limited (function logs + X-Ray/Trace) |
Serverless concurrency limits are per-account, not per-function
AWS Lambda has a default concurrent execution limit of 1,000 per region per account. If one function has a traffic spike and consumes all 1,000 slots, every other Lambda in that account gets throttled. This is the "noisy neighbour" problem inside your own account. Use reserved concurrency to protect critical functions and request limit increases before you need them.
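The account-wide pool can be sketched as a toy model — this is an illustration of the throttling behaviour, not AWS's actual scheduler. Reserved concurrency carves guaranteed slots out of the shared pool, capping the protected function but shielding it from noisy neighbours:

```python
# Toy model of per-account FaaS concurrency (not AWS's real scheduler).
# Reserved concurrency carves guaranteed slots out of the shared pool.

class AccountPool:
    def __init__(self, account_limit: int = 1000):
        self.account_limit = account_limit
        self.reserved = {}     # function name -> reserved slots
        self.in_flight = {}    # function name -> running invocations

    def reserve(self, fn: str, slots: int) -> None:
        self.reserved[fn] = slots

    def _shared_capacity(self) -> int:
        # The unreserved pool is whatever is left after all reservations.
        return self.account_limit - sum(self.reserved.values())

    def try_invoke(self, fn: str) -> bool:
        running = self.in_flight.get(fn, 0)
        if fn in self.reserved:
            # Reserved functions are capped at, but guaranteed, their slots.
            allowed = running < self.reserved[fn]
        else:
            shared_running = sum(n for f, n in self.in_flight.items()
                                 if f not in self.reserved)
            allowed = shared_running < self._shared_capacity()
        if allowed:
            self.in_flight[fn] = running + 1
        return allowed  # False == throttled

pool = AccountPool(account_limit=1000)
pool.reserve("payment-processor", 200)

# A spiking function eats the entire shared pool (1000 - 200 = 800 slots)...
for _ in range(800):
    assert pool.try_invoke("batch-report")
assert not pool.try_invoke("other-api")        # throttled: shared pool is gone
assert pool.try_invoke("payment-processor")    # protected by its reservation
```

The function names here are illustrative. The behaviour the model captures is the real one: without a reservation, "other-api" is throttled by someone else's spike.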
Container density math
A c5.2xlarge EC2 instance (8 vCPU, 16 GB RAM, ~$0.34/hr on-demand in us-east-1) can run roughly 30 Node.js containers, each allocated 0.25 vCPU and 256 MB — CPU is the binding constraint (8 / 0.25 = 32 slots, minus system overhead). Giving each of those workloads its own t3.small VM (2 vCPU, 2 GB, ~$0.021/hr) would cost ~$0.63/hr for 30 of them. Containers deliver roughly 45–50% cost savings at this scale purely through bin-packing efficiency.
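The bin-packing arithmetic can be checked in a few lines. The prices are illustrative us-east-1 on-demand ballparks, and real density will be a little lower once the OS and container runtime take their share:

```python
# Bin-packing density math for the example above.
# Prices are illustrative us-east-1 on-demand ballparks, not quoted rates.

HOST_VCPU, HOST_RAM_GB, HOST_HOURLY = 8, 16, 0.34   # c5.2xlarge-class host
CTR_VCPU, CTR_RAM_GB = 0.25, 0.256                  # per-container request
SMALL_VM_HOURLY = 0.021                             # t3.small-class VM

by_cpu = int(HOST_VCPU / CTR_VCPU)      # containers if CPU-bound
by_ram = int(HOST_RAM_GB / CTR_RAM_GB)  # containers if RAM-bound
density = min(by_cpu, by_ram)           # CPU is the binding constraint
                                        # (before system overhead)

packed_cost = HOST_HOURLY / density     # per-workload cost when bin-packed
dedicated_cost = SMALL_VM_HOURLY        # per-workload cost with one VM each
savings = 1 - packed_cost / dedicated_cost

print(f"{density} containers/host; packed ${packed_cost:.4f}/hr vs "
      f"dedicated ${dedicated_cost:.4f}/hr -> {savings:.0%} savings")
```

The same three constants are all you need to re-run the math for your own instance types and container sizes.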
```shell
# Check Lambda concurrency usage across all functions in a region:
# list all functions and check their reserved concurrency settings.
aws lambda list-functions --query 'Functions[].FunctionName' --output text | \
  tr '\t' '\n' | \
  xargs -I{} aws lambda get-function-concurrency --function-name {}

# Set reserved concurrency for a critical payment function.
aws lambda put-function-concurrency \
  --function-name payment-processor \
  --reserved-concurrent-executions 200

# View cold start duration in CloudWatch Logs Insights
# (@initDuration > 0 means the invocation was a cold start).
# Run in the CloudWatch console:
#   fields @timestamp, @duration, @initDuration
#   | filter @initDuration > 0
#   | sort @timestamp desc
#   | limit 50
```
The right compute model depends on four questions: How predictable is your traffic? How long does each unit of work run? Do you need persistent local state? What is your tolerance for operational complexity?
Compute model selection decision tree
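One way to encode those four questions as a first-pass heuristic — a deliberate simplification, not a substitute for cost modelling, team-skill, or migration considerations:

```python
# First-pass compute-model heuristic based on the four questions above.
# A deliberate simplification: real decisions also weigh cost modelling,
# team skills, and existing infrastructure.

def pick_compute_model(traffic: str,          # "steady" | "bursty" | "batch"
                       max_runtime_min: int,  # longest unit of work
                       needs_local_state: bool,
                       ops_tolerance: str     # "low" | "medium" | "high"
                       ) -> str:
    if needs_local_state:
        return "vm"               # databases, caches, GPU inference
    if max_runtime_min > 15:      # past typical FaaS execution limits
        return "container" if ops_tolerance != "high" else "vm"
    if traffic in ("bursty", "batch") and ops_tolerance == "low":
        return "serverless"       # zero idle cost, managed scaling
    if traffic == "steady":
        return "container"        # sustained load bin-packs efficiently
    return "serverless"

print(pick_compute_model("bursty", 2, False, "low"))     # serverless
print(pick_compute_model("steady", 5, False, "medium"))  # container
print(pick_compute_model("steady", 60, True, "high"))    # vm
```

The thresholds (15 minutes, the traffic labels) mirror the limits discussed earlier; tune them to your provider's actual quotas.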
Use serverless for the edges, containers for the core
A mature cloud architecture often uses all three models together: Lambda for event-driven glue (S3 trigger → resize image → store result), containers (ECS/GKE) for stateless HTTP APIs serving sustained load, and VMs for stateful services (PostgreSQL, Redis, ML inference with GPU). Each model handles the workload it is best suited for. Fighting the model — such as running a long-running job on Lambda — creates fragile hacks.
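The event-driven glue pattern can be sketched as a minimal Lambda-style handler. Bucket and key names are illustrative, and the resize step is stubbed out so the control flow stays visible — a real function would pull in an image library and write the result to an output bucket:

```python
# Minimal sketch of event-driven glue: an S3 event triggers a handler that
# processes each new object. Names are illustrative; the actual image
# resize is stubbed so the control flow stays visible.

def resize_and_store(bucket: str, key: str) -> str:
    # A real implementation would fetch the object, resize it (e.g. with
    # Pillow), and write the result to an output location.
    return f"thumbnails/{key}"

def handler(event: dict, context=None) -> list[str]:
    outputs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        outputs.append(resize_and_store(s3["bucket"]["name"],
                                        s3["object"]["key"]))
    return outputs

# Heavily trimmed shape of an s3:ObjectCreated:* notification event:
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "uploads"},
                "object": {"key": "cat.jpg"}}}
    ]
}
print(handler(sample_event))   # ['thumbnails/cat.jpg']
```

Because the handler is short-lived, stateless, and idle between uploads, it fits the serverless billing model exactly — which is why this glue work belongs on FaaS rather than on an always-on container or VM.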
This topic comes up in system design rounds for cloud engineer, platform engineer, and senior software engineer roles. Expect questions like "design a scalable image processing pipeline" or "migrate this monolith to microservices — what compute would you use?" that require you to justify the model choice with cost and trade-off reasoning.
Common questions to ask before recommending a model: What is the expected traffic pattern (steady, bursty, or batch)? What are the latency SLOs? Is there a hard budget ceiling? Are there existing VPC dependencies that affect serverless cold-start behaviour?
Strong answer: Immediately asks about traffic pattern and execution duration before recommending a model. Mentions RDS Proxy unprompted when discussing Lambda + database architectures. Distinguishes between reserved and provisioned concurrency.
Red flags: Recommends serverless for everything because "it scales automatically" without discussing cold starts, execution limits, or cost at high throughput. Cannot explain why connection pooling is different in Lambda vs a long-running service.
Key takeaways
💡 Analogy
Think of compute models as accommodation options. A VM is a full house you lease: you control every room, but you pay the mortgage whether or not you are home. A container is an apartment: you share the building infrastructure with neighbours (the host kernel) but have your own front door and keys. Serverless is a hotel room: you pay only for the nights you stay, housekeeping handles everything, but you cannot install a new bathtub and your luggage must fit in a carry-on (stateless, short-lived).
⚡ Core Idea
Each compute primitive trades off control and cost differently: VMs give full control at high idle cost; containers give good density at medium operational overhead; serverless gives zero idle cost at the expense of execution time limits, cold starts, and reduced observability.
🎯 Why It Matters
Choosing the wrong compute model is one of the most expensive architectural mistakes a cloud team can make. Going all-in on serverless for a latency-sensitive API causes cold-start incidents. Going all-in on VMs for a bursty ETL pipeline means paying for 100 instances 24/7 when you only need them for 30 minutes at midnight. Getting this decision right at design time avoids painful — and costly — migrations later.