Run your container without provisioning a single server or babysitting a cluster. Here is how Fargate and Cloud Run give you scale-to-zero, per-request billing, and autoscaling, and exactly when they beat both Lambda-style functions and full Kubernetes.
You built a Docker image. It runs perfectly on your laptop. Now you need it on the internet, and suddenly you are staring at a fork in the road. Option A: rent a VM, SSH in, install Docker, run the container, then wake up at 3am when the disk fills or the instance dies. Option B: stand up a Kubernetes cluster, learn Deployments, Services, and Ingress, and pay for a control plane and nodes that sit idle at night. Both feel like enormous overhead just to run one image.
There is a third option that most people skip past: serverless containers. You hand the platform your image, it runs it, scales it from zero to thousands of copies, bills you only while requests are flowing, and you never touch a server or a node. That is what AWS Fargate and Google Cloud Run do.
Who this is for
Developers who can build a Docker image and want it live without learning cluster operations. If you have ever thought *I just want to run this container and not think about it*, this is for you. Coming from functions? You will see why a container with cold starts and concurrency might fit better. Coming from Kubernetes? You will see what you can stop paying for.
The one-sentence definition
A serverless container platform runs your container image on demand, scales the number of running copies to match traffic, including all the way down to zero, and charges you by the time and resources you actually use.
The word that trips people up is *serverless*. There are still servers; you just never see, choose, or patch them, and you never pay for the idle ones. The platform owns the fleet; you rent slices of it per request.
| Ride-hailing | Serverless containers |
| --- | --- |
| You request a ride and one shows up; you do not own the car | You send a request and the platform starts a container; you do not own the server |
| At 3am with no riders, no cars are circling burning fuel | Scale-to-zero: with no traffic, nothing runs and you pay nothing |
| Friday rush: dozens of cars dispatch automatically | Autoscaling: traffic spikes spin up many identical container copies |
| You pay per trip, not a monthly car payment | Per-request / per-second billing instead of a fixed instance bill |
| First car after a quiet night takes a moment to arrive | Cold start: the first request after idle waits for a container to boot |

Serverless containers map cleanly onto how a good ride-hailing service works.
The picture: image to autoscaled instances
The mental model is short. Your image lives in a registry. You point the platform at it. The platform pulls it, runs it, and replicates it up and down based on incoming requests, bottoming out at zero copies when nobody is calling.
A request arrives, the platform routes it to a running copy of your image, spinning up more under load and scaling all the way to zero when idle.
1. **Build and push your image.** Package your app into a Docker image and push it to a registry the platform can read: Amazon ECR for Fargate, Artifact Registry for Cloud Run.
2. **Point the platform at the image.** Create a Cloud Run service or a Fargate task definition that references the image URI, the CPU and memory, and the port your app listens on.
3. **The platform pulls and starts a copy.** On the first request it pulls the image and boots one container instance. That first boot is the cold start.
4. **Traffic scales the instance count.** More concurrent requests than one copy can handle? The platform launches more identical copies. Traffic drops? It tears them down.
5. **Idle scales to zero.** With Cloud Run (and Fargate behind the right autoscaler), a sustained period without traffic means zero running copies and a near-zero bill.
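Steps 4 and 5 can be pictured as a single scaling rule. The sketch below is a toy model, not how either platform actually decides: real autoscalers also weigh CPU, request rates, and smoothing windows. The function name `desired_instances` and the bounds of 0 and 50 are illustrative assumptions.

```python
import math


def desired_instances(concurrent_requests: int,
                      concurrency_per_instance: int,
                      min_instances: int = 0,
                      max_instances: int = 50) -> int:
    """Toy scaling rule: enough copies for the load, clamped to [min, max]."""
    needed = math.ceil(concurrent_requests / concurrency_per_instance)
    return max(min_instances, min(needed, max_instances))


# Hypothetical loads, each served at 80 requests per instance.
for load in (0, 40, 800, 10_000):
    print(load, "concurrent requests ->", desired_instances(load, 80), "instances")
```

With a minimum of 0 the idle case drops to no instances at all, and the maximum caps the count however large the spike. Raising the minimum to 1 is what keeping a copy warm amounts to in this model.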
Functions vs serverless containers vs managed Kubernetes
These three sit on a spectrum from *least to most control*, and the right choice is rarely the most powerful one. Functions give you the least to manage; managed Kubernetes gives you the most knobs. Serverless containers sit deliberately in the middle.
| Dimension | Functions (Lambda-style) | Serverless containers | Managed Kubernetes |
| --- | --- | --- | --- |
| Scaling | Automatic, per-invocation | Automatic, scale-to-zero by request | You configure HPA + node pools |
| Cold start | Milliseconds to ~1s | ~1–10s (image + app boot) | None (pods stay warm), you pay for warm |
| Control | Lowest, runtime + limits fixed | Medium, any image, your own runtime | Highest, full cluster, networking, CRDs |
| Cost when idle | Zero | Zero (scales to zero) | Pay for nodes + control plane 24/7 |
| Best for | Short event-driven tasks | HTTP apps, APIs, batch jobs | Many services, complex networking, scale |
Pick the column that matches the trade-off you actually want.
The honest default
If your workload is an HTTP service or a containerized job and you do not yet have a fleet of services that need to talk to each other over a service mesh, start with serverless containers. You can graduate to [managed Kubernetes](/blog/managed-kubernetes-eks-aks-gke) later; your image does not change.
Deploy it: Cloud Run and a Fargate task definition
Cloud Run is the fastest way to feel this. One command takes a local source directory or an image straight to a public HTTPS URL.
deploy-cloud-run.sh
bash
# Deploy an existing image to Cloud Run (fully managed).
#   --min-instances=0   scale to zero when idle
#   --max-instances=50  cap the autoscaler
#   --concurrency=80    requests served per instance
gcloud run deploy my-api \
  --image=us-docker.pkg.dev/my-project/apps/my-api:1.0 \
  --region=europe-west4 \
  --platform=managed \
  --allow-unauthenticated \
  --port=8080 \
  --cpu=1 --memory=512Mi \
  --min-instances=0 \
  --max-instances=50 \
  --concurrency=80

# Output ends with the public URL:
# Service [my-api] revision [my-api-00001] has been deployed
# and is serving traffic at https://my-api-xxxx.a.run.app
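On AWS, the unit is a task definition: a spec describing the container, which you run on Fargate. Task definitions are natively JSON; the example below uses the same structure in YAML, which AWS CLI v2 accepts through --cli-input-yaml. The platform provisions the compute; you never pick an instance type.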
task-definition.yaml
yaml
# ECS task definition for Fargate (register, then run as a service)
family: my-api
requiresCompatibilities:
  - FARGATE              # serverless launch type, no EC2 to manage
networkMode: awsvpc
cpu: "256"               # 0.25 vCPU
memory: "512"            # 512 MB
executionRoleArn: arn:aws:iam::111122223333:role/ecsTaskExecutionRole
containerDefinitions:
  - name: my-api
    image: 111122223333.dkr.ecr.eu-west-1.amazonaws.com/my-api:1.0
    essential: true
    portMappings:
      - containerPort: 8080
        protocol: tcp
    logConfiguration:
      logDriver: awslogs
      options:
        awslogs-group: /ecs/my-api
        awslogs-region: eu-west-1
        awslogs-stream-prefix: ecs
run-fargate.sh
bash
# Register the task definition, then run it on Fargate
aws ecs register-task-definition --cli-input-yaml file://task-definition.yaml
aws ecs create-service \
  --cluster my-cluster \
  --service-name my-api \
  --task-definition my-api \
  --launch-type FARGATE \
  --desired-count 1 \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-abc],securityGroups=[sg-123],assignPublicIp=ENABLED}"

# Confirm it is running:
aws ecs describe-services --cluster my-cluster --services my-api \
  --query "services[0].runningCount"
Your app must listen on the right port
The single most common first-deploy failure: the platform sends traffic to a port your app is not listening on. Cloud Run injects a **PORT** env var (default 8080); bind to it. Fargate routes to the **containerPort** you declared. On a mismatch, health checks fail and the container is restarted in a loop.
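As a minimal sketch of what binding to the right port can look like, here is a tiny server built on Python's standard library. The names `bind_address`, `Hello`, and `serve` are made up for illustration, and the fallback of 8080 simply mirrors the port used in the examples above; your framework will have its own way to set host and port.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def bind_address(env=None):
    """Return the (host, port) pair the server should listen on."""
    env = os.environ if env is None else env
    # 0.0.0.0 accepts traffic arriving from outside the container.
    # PORT is read from the environment; 8080 is the fallback when it is unset.
    return "0.0.0.0", int(env.get("PORT", "8080"))


class Hello(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve():
    """Start the server; call this from the container's entrypoint."""
    ThreadingHTTPServer(bind_address(), Hello).serve_forever()
```

Reading the port from the environment rather than hard-coding it means the same image works whether the platform injects a value or you declare one yourself.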
Cold starts and concurrency, the two things that bite
Scale-to-zero is wonderful for your bill and brutal for your tail latency, because the saving and the cost are the same mechanism. When there are zero copies running, the next request cannot be served until a copy exists, and creating one takes time.
What a cold start actually is
A cold start is the delay between a request arriving and a brand-new container instance being ready to serve it. It is the sum of: pulling the image (if not cached), starting the container, and your app's own boot time (loading frameworks, opening DB pools, JIT warmup). A 600 MB image with a slow-booting framework can cold-start in 5–10 seconds; a lean image with a fast runtime can do it in under a second.
Shrink the image. Smaller layers pull faster. Use slim or distroless base images and multi-stage builds.
Speed up app boot. Lazy-load what you can; defer non-critical connections until after the server is listening.
Keep one copy warm. Set --min-instances=1 (Cloud Run) or desiredCount: 1 (Fargate) so there is always a ready instance. You trade a small always-on cost for no cold start on the first request after a quiet period.
Concurrency: how many requests one copy handles
This is the lever functions do not give you. Concurrency is the number of simultaneous requests a single container instance serves before the platform starts another copy. Functions are effectively concurrency-1: one invocation per instance. Serverless containers let one instance handle many requests at once; Cloud Run defaults to 80.
Higher concurrency means fewer instances for the same traffic, which means lower cost and fewer cold starts, but only if your app is genuinely concurrent (async I/O, a real thread pool). If each request pins a CPU core, high concurrency just makes every request slow. Match concurrency to what your code can actually overlap.
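The difference between work that overlaps and work that does not can be seen in a few lines of asyncio. This is a simplified model under stated assumptions: `asyncio.sleep` stands in for an I/O wait such as a database call, and a blocking `time.sleep` stands in for work that holds the instance's only worker, as CPU-bound code would. All names here are illustrative.

```python
import asyncio
import time


async def io_bound_request(delay: float = 0.05) -> None:
    # Stand-in for a database or HTTP call: the event loop is free while waiting.
    await asyncio.sleep(delay)


async def blocking_request(delay: float = 0.05) -> None:
    # Stand-in for CPU-heavy work: a blocking call stalls the event loop,
    # so no other request on this instance makes progress in the meantime.
    time.sleep(delay)


async def serve_concurrently(handler, n: int) -> float:
    """Run n requests at once on one 'instance' and return elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(n)))
    return time.perf_counter() - start


def measure(n: int = 20):
    io_time = asyncio.run(serve_concurrently(io_bound_request, n))
    blocking_time = asyncio.run(serve_concurrently(blocking_request, n))
    return io_time, blocking_time
```

Calling `measure()` should show the I/O-bound batch finishing in roughly one delay while the blocking batch takes about n delays, which is why a high concurrency setting only pays off for the first kind of workload.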
A quick way to reason about it
Instances needed ≈ peak concurrent requests ÷ concurrency-per-instance. Serving 800 concurrent requests at concurrency 80 needs ~10 instances. Drop concurrency to 1 (function-style) and you need ~800. That ratio is the whole cost-and-cold-start story.
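The same estimate as a small helper, rounding up because a partly filled instance is still a whole instance. The function name is an assumption for illustration, and the result is a rough sizing figure, not a prediction of what a platform's autoscaler will actually launch.

```python
import math


def instances_needed(peak_concurrent_requests: int, concurrency_per_instance: int) -> int:
    """Estimate instance count: peak concurrent requests / per-instance concurrency, rounded up."""
    if concurrency_per_instance < 1:
        raise ValueError("concurrency_per_instance must be at least 1")
    return math.ceil(peak_concurrent_requests / concurrency_per_instance)


# Same 800 concurrent requests at three different concurrency settings.
for concurrency in (1, 10, 80):
    print(f"concurrency={concurrency}: {instances_needed(800, concurrency)} instances")
```

For 800 concurrent requests this gives 800, 80, and 10 instances respectively.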
Common mistakes that cost hours
Binding to localhost instead of 0.0.0.0. Inside a container, 127.0.0.1 is unreachable from the platform's router. Bind to 0.0.0.0:$PORT or every request times out.
Assuming local state survives. Instances are ephemeral and can vanish between requests. Anything you write to local disk or hold in memory is gone after scale-down. Use object storage or a database.
Setting concurrency to 1 out of habit. Carrying over the function mindset multiplies your instance count and your cold starts. Start at the default and tune down only if your code is CPU-bound.
Forgetting min-instances on a latency-sensitive path. A login or checkout endpoint behind scale-to-zero will hand a multi-second cold start to a real user. Keep one warm.
Ignoring startup CPU and timeouts. A heavy boot can exceed the platform's startup probe window; the instance is killed mid-boot and retried forever. Profile your cold start before you ship.
Leaving `--allow-unauthenticated` on a private API. Convenient for a demo, an open door in production. Lock it down with IAM or an identity-aware proxy.
Takeaways
The whole article in seven lines
Serverless containers run your image with no servers, no cluster, and no idle nodes to pay for.
Scale-to-zero means zero cost when idle, at the price of a cold start on the next request.
Cold start = image pull + container start + your app's boot time. Shrink the image and speed up boot.
Concurrency lets one instance serve many requests, the lever functions never gave you. Tune it to your workload.
Use **Cloud Run** on GCP and **Fargate** on AWS; you describe an image and a port, not a machine.
They beat functions when you need a real container, longer runtimes, or many requests per instance.
They beat Kubernetes until you have a fleet of services that need cluster-level networking and control.
Where to go next
You now have a container running on the internet without owning a server. To place this in the bigger compute picture and to know when to graduate to a cluster, follow these next: