Serverless Containers: Fargate & Cloud Run

On this page

You have a container. Now where does it run?
The one-sentence definition
The picture: image to autoscaled instances
Functions vs serverless containers vs managed Kubernetes
Deploy it: Cloud Run and a Fargate task definition
Cold starts and concurrency, the two things that bite
Common mistakes that cost hours
Takeaways
Where to go next

TL;DR

You will know when to reach for serverless containers over Lambda-style functions or full Kubernetes, and how to deploy on Cloud Run or Fargate while sidestepping the cold-start and concurrency traps that quietly hurt you.

You have a container. Now where does it run?

You built a Docker image. It runs perfectly on your laptop. Now you need it on the internet, and suddenly you are staring at a fork in the road. Option A: rent a VM, SSH in, install Docker, run the container, then wake up at 3am when the disk fills or the instance dies. Option B: stand up a Kubernetes cluster, learn Deployments, Services, Ingress, and pay for control-plane nodes that sit idle at night. Both feel like enormous overhead just to run one image.

There is a third option that most people skip past: serverless containers. You hand the platform your image; it runs it, scales it from zero to thousands of copies, and bills you only for the compute you use. You never touch a server or a node. That is what AWS Fargate and Google Cloud Run do.

Who this is for

Developers who can build a Docker image and want it live without learning cluster operations. If you have ever thought *I just want to run this container and not think about it*, this is for you. Coming from functions? You will see why a container, with its own cold-start and concurrency controls, might fit better. Coming from Kubernetes? You will see what you can stop paying for.

The one-sentence definition

A serverless container platform runs your container image on demand, scales the number of running copies to match traffic, including all the way down to zero, and charges you by the time and resources you actually use.
The whole article in one line

The word that trips people up is *serverless*. There are still servers; you just never see, choose, patch, or pay for the idle ones. The platform owns the fleet; you rent slices of it per request.

Ride-hailing	Serverless containers
You request a ride and one shows up; you do not own the car	You send a request and the platform starts a container; you do not own the server
At 3am with no riders, no cars are circling burning fuel	Scale-to-zero: with no traffic, nothing runs and you pay nothing
Friday rush: dozens of cars dispatch automatically	Autoscaling: traffic spikes spin up many identical container copies
You pay per trip, not a monthly car payment	Per-request / per-second billing instead of a fixed instance bill
First car after a quiet night takes a moment to arrive	Cold start: the first request after idle waits for a container to boot

Serverless containers map cleanly onto how a good ride-hailing service works.

The picture: image to autoscaled instances

The mental model is short. Your image lives in a registry. You point the platform at it. The platform pulls it, runs it, and replicates it up and down based on incoming requests, bottoming out at zero copies when nobody is calling.

A request arrives, the platform routes it to a running copy of your image, spinning up more under load and scaling all the way to zero when idle.

1. Build and push your image
Package your app into a Docker image and push it to a registry the platform can read: Amazon ECR for Fargate, Artifact Registry for Cloud Run.
2. Point the platform at the image
Create a Cloud Run service or a Fargate task definition that references the image URI plus CPU/memory and the port your app listens on.
3. The platform pulls and starts a copy
On the first request it pulls the image and boots one container instance; that first boot is the cold start.
4. Traffic scales the instance count
More concurrent requests than one copy can handle? The platform launches more identical copies. Traffic drops? It tears them down.
5. Idle scales to zero
On Cloud Run, sustained no-traffic means zero running copies and a near-zero bill. On Fargate you get there only if an autoscaler or schedule you configure sets the desired task count to zero.
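As a rough sketch of steps 1 and 2, the image URI you hand to each platform follows a fixed pattern per registry. The helper names below are hypothetical, and the account ID, project, and repository values are the same placeholders used in the deploy examples in this article.

```bash
# Hypothetical helpers that assemble the image URI each registry expects.

# Amazon ECR: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
# Arguments: account region repo tag
ecr_image_uri() {
  echo "$1.dkr.ecr.$2.amazonaws.com/$3:$4"
}

# Artifact Registry: <location>-docker.pkg.dev/<project>/<repo>/<image>:<tag>
# Arguments: location project repo image tag
artifact_registry_image_uri() {
  echo "$1-docker.pkg.dev/$2/$3/$4:$5"
}

# Typical use once you are logged in to the registry (not run here):
#   uri="$(ecr_image_uri 111122223333 eu-west-1 my-api 1.0)"
#   docker build -t "$uri" .
#   docker push "$uri"
```

Tagging the image with the full registry URI at build time means the same string can be pasted into the Cloud Run `--image` flag or the task definition's `image` field.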

Functions vs serverless containers vs managed Kubernetes

These three sit on a spectrum from *least to most control*, and the right choice is rarely the most powerful one. Functions give you the least to manage; managed Kubernetes gives you the most knobs. Serverless containers sit deliberately in the middle.

Dimension	Functions (Lambda-style)	Serverless containers	Managed Kubernetes
Scaling	Automatic, per-invocation	Automatic, scale-to-zero by request	You configure HPA + node pools
Cold start	Milliseconds to ~1s	~1–10s (image + app boot)	None (pods stay warm), you pay for warm
Control	Lowest, runtime + limits fixed	Medium, any image, your own runtime	Highest, full cluster, networking, CRDs
Cost when idle	Zero	Zero on Cloud Run; on Fargate, zero only once tasks are scaled to zero	Pay for nodes + control plane 24/7
Best for	Short event-driven tasks	HTTP apps, APIs, batch jobs	Many services, complex networking, scale

Pick the column that matches the trade-off you actually want.

The honest default

If your workload is an HTTP service or a containerized job and you do not yet have a fleet of services that need to talk to each other over a service mesh, start with serverless containers. You can graduate to managed Kubernetes later; your image does not change.

Deploy it: Cloud Run and a Fargate task definition

Cloud Run is the fastest way to feel this. One command takes a local source directory or an image straight to a public HTTPS URL.

deploy-cloud-run.sh

bash

# Deploy an existing image to Cloud Run (fully managed)
# Comments sit above the command: a comment after a trailing backslash
# breaks the line continuation.
#   --min-instances=0   scale to zero when idle
#   --max-instances=50  cap the autoscaler
#   --concurrency=80    requests served at once per instance
gcloud run deploy my-api \
  --image=us-docker.pkg.dev/my-project/apps/my-api:1.0 \
  --region=europe-west4 \
  --platform=managed \
  --allow-unauthenticated \
  --port=8080 \
  --cpu=1 --memory=512Mi \
  --min-instances=0 \
  --max-instances=50 \
  --concurrency=80

# Output ends with the public URL:
# Service [my-api] revision [my-api-00001] has been deployed
# and is serving traffic at https://my-api-xxxx.a.run.app

On AWS, the unit is a task definition: a spec describing the container, which you run on Fargate. It is JSON by default; AWS CLI v2 also accepts the YAML form shown here. The platform provisions the compute; you never pick an instance type.

task-definition.yaml

yaml

# ECS task definition for Fargate (register, then run as a service)
family: my-api
requiresCompatibilities:
  - FARGATE                  # serverless launch type, no EC2 to manage
networkMode: awsvpc
cpu: "256"                   # 0.25 vCPU
memory: "512"               # 512 MB
executionRoleArn: arn:aws:iam::111122223333:role/ecsTaskExecutionRole
containerDefinitions:
  - name: my-api
    image: 111122223333.dkr.ecr.eu-west-1.amazonaws.com/my-api:1.0
    essential: true
    portMappings:
      - containerPort: 8080
        protocol: tcp
    logConfiguration:
      logDriver: awslogs
      options:
        awslogs-group: /ecs/my-api
        awslogs-region: eu-west-1
        awslogs-stream-prefix: ecs

run-fargate.sh

bash

# Register the task definition, then run it on Fargate
aws ecs register-task-definition --cli-input-yaml file://task-definition.yaml

aws ecs create-service \
  --cluster my-cluster \
  --service-name my-api \
  --task-definition my-api \
  --launch-type FARGATE \
  --desired-count 1 \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-abc],securityGroups=[sg-123],assignPublicIp=ENABLED}"

# Confirm it is running:
aws ecs describe-services --cluster my-cluster --services my-api \
  --query "services[0].runningCount"

Your app must listen on the right port

The single most common first-deploy failure: the platform sends traffic to a port your app is not listening on. Cloud Run injects a PORT env var (default 8080); bind to it. Fargate routes to the containerPort you declared. On a mismatch, health checks fail and the container never becomes ready: the deploy fails or the task restarts in a loop.
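A minimal sketch of an entrypoint helper that resolves the bind address from the environment, assuming your server accepts a host:port argument. The function name and the `my-server` binary with its `--listen` flag are hypothetical.

```bash
# Hypothetical entrypoint helper: print the address the app should bind to.
# PORT is injected by Cloud Run; the 8080 fallback covers local runs and
# matches the containerPort declared in the Fargate task definition above.
listen_addr() {
  echo "0.0.0.0:${PORT:-8080}"
}

# Usage inside an entrypoint script (my-server is a placeholder binary):
#   exec my-server --listen "$(listen_addr)"
```

Binding to 0.0.0.0 rather than 127.0.0.1 is what makes the port reachable from outside the container.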

Cold starts and concurrency, the two things that bite

Scale-to-zero is wonderful for your bill and brutal for your tail latency, because the saving and the cost are the same mechanism. When there are zero copies running, the next request cannot be served until a copy exists, and creating one takes time.

What a cold start actually is

A cold start is the delay between a request arriving and a brand-new container instance being ready to serve it. It is the sum of: pulling the image (if not cached), starting the container, and your app's own boot time (loading frameworks, opening DB pools, JIT warmup). A 600 MB image with a slow-booting framework can cold-start in 5–10 seconds; a lean image with a fast runtime can do it in under a second.

Shrink the image. Smaller layers pull faster. Use slim or distroless base images and multi-stage builds.
Speed up app boot. Lazy-load what you can; defer non-critical connections until after the server is listening.
Keep one copy warm. Set --min-instances=1 (Cloud Run) or desiredCount: 1 (Fargate) so there is always a ready instance. You trade a small always-on cost for no cold start on the first request after idle.
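To see whether any of these fixes helped, one rough approach is to time the first request after an idle period against an immediate second request. The sketch below assumes `curl` is installed; the function names are hypothetical, and the gap between the two numbers is only an approximation of the cold start because it also includes network time.

```bash
# Print the total time, in seconds, that curl spent on one request.
request_seconds() {
  curl --silent --output /dev/null --write-out '%{time_total}\n' "$1"
}

# Time a first (possibly cold) request, then a second (warm) one.
compare_cold_warm() {
  echo "first:  $(request_seconds "$1")s"
  echo "second: $(request_seconds "$1")s"
}

# Usage, after the service has sat idle long enough to scale to zero:
#   compare_cold_warm https://my-api-xxxx.a.run.app
```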

Concurrency: how many requests one copy handles

This is the lever functions do not give you. Concurrency is the number of simultaneous requests a single container instance serves before the platform starts another copy. Functions are effectively concurrency-1: one invocation per instance. Serverless containers let one instance handle many requests at once; Cloud Run defaults to 80 per instance.

Higher concurrency means fewer instances for the same traffic, which means lower cost and fewer cold starts, but only if your app is genuinely concurrent (async I/O, a real thread pool). If each request pins a CPU core, high concurrency just makes every request slow. Match concurrency to what your code can actually overlap.

A quick way to reason about it

Instances needed ≈ peak concurrent requests ÷ concurrency-per-instance. Serving 800 concurrent requests at concurrency 80 needs ~10 instances. Drop concurrency to 1 (function-style) and you need ~800. That ratio is the whole cost-and-cold-start story.
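The rule of thumb above can be written as a ceiling division. This is a sketch with a hypothetical function name; it rounds up because a partly full instance still has to exist.

```bash
# Estimate instances needed: ceil(peak concurrent requests / concurrency).
# Arguments: peak_concurrent_requests concurrency_per_instance
instances_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

instances_needed 800 80   # prints 10
instances_needed 800 1    # prints 800
```

It ignores autoscaler headroom and uneven request distribution, so treat the result as a lower bound when choosing a max-instances cap.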

Common mistakes that cost hours

1. Binding to localhost instead of 0.0.0.0. A server bound to 127.0.0.1 inside the container is unreachable from the platform's router. Bind to 0.0.0.0:$PORT or every request times out.
2. Assuming local state survives. Instances are ephemeral and can vanish between requests. Anything you write to local disk or hold in memory is gone after scale-down. Use object storage or a database.
3. Setting concurrency to 1 out of habit. Carrying over the function mindset multiplies your instance count and your cold starts. Start at the default and tune down only if your code is CPU-bound.
4. Forgetting min-instances on a latency-sensitive path. A login or checkout endpoint behind scale-to-zero will hand a multi-second cold start to a real user. Keep one warm.
5. Ignoring startup CPU and timeouts. A heavy boot can exceed the platform's startup probe window; the instance is killed mid-boot and retried, over and over. Profile your cold start before you ship.
6. Leaving `--allow-unauthenticated` on a private API. Convenient for a demo, an open door in production. Lock it down with IAM or an identity-aware proxy.
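One way to close that door on Cloud Run, sketched under the assumption that the service was deployed with public access. The function names and the example service account are hypothetical; the commands they wrap are the standard `gcloud run services` IAM policy-binding subcommands.

```bash
# Remove public access: drop the allUsers invoker binding.
# Arguments: service region
make_private() {
  gcloud run services remove-iam-policy-binding "$1" \
    --region="$2" --member="allUsers" --role="roles/run.invoker"
}

# Allow one specific caller to invoke the service.
# Arguments: service region member
grant_invoker() {
  gcloud run services add-iam-policy-binding "$1" \
    --region="$2" --member="$3" --role="roles/run.invoker"
}

# Usage (the service account below is a placeholder):
#   make_private my-api europe-west4
#   grant_invoker my-api europe-west4 \
#     "serviceAccount:caller@my-project.iam.gserviceaccount.com"
```

After this, callers need to send an identity token, for example `curl -H "Authorization: Bearer $(gcloud auth print-identity-token)"` followed by the service URL.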

Takeaways

The whole article in seven lines

Serverless containers run your image with no servers, no cluster, and no idle nodes to pay for.
Scale-to-zero means zero cost when idle, at the price of a cold start on the next request.
Cold start = image pull + container start + your app's boot time. Shrink the image and speed up boot.
Concurrency lets one instance serve many requests, the lever functions never gave you. Tune it to your workload.
Use Cloud Run on GCP and Fargate on AWS; you describe an image and a port, not a machine.
They beat functions when you need a real container, longer runtimes, or many requests per instance.
They beat Kubernetes until you have a fleet of services that need cluster-level networking and control.

Where to go next

You now have a container running on the internet without owning a server. To place this in the bigger compute picture and to know when to graduate to a cluster, follow these next:

Zoom out: Compute Models: VMs vs Containers vs Serverless, where serverless containers sit among all your options.
Graduate up: Managed Kubernetes: EKS, AKS & GKE, for when one service becomes many.
Get hands-on with images first: the Docker lab, build the image you will deploy here.
Go end-to-end: the Cloud Engineer career path, where containers, networking, and IaC come together.


Frequently asked questions

What is a serverless container platform?

It runs your container image on demand, scales the number of running copies to match traffic including all the way down to zero, and charges you only for the time and resources you actually use. AWS Fargate and Google Cloud Run are the two main examples.

If it's serverless, are there really no servers?

There are still servers; you just never see, choose, patch, or pay for the idle ones. The platform owns the fleet and you rent slices of it per request.

How do serverless containers compare to functions and managed Kubernetes?

The three sit on a spectrum from least to most control: functions give you the least to manage, managed Kubernetes gives you the most knobs, and serverless containers sit deliberately in the middle. The right choice is rarely the most powerful one.

What are the two things that bite most often with serverless containers?

Cold starts and concurrency. Cold starts happen when the platform spins up a copy from zero, and concurrency governs how many requests each instance handles, so both shape latency and cost and deserve tuning.
