Async Processing: Queues & Workers

On this page

The checkout that times out
The principle: acknowledge fast, do the slow work later
The picture: API to queue to worker pool
Sync vs async: what you are actually trading
Enqueue a job, write an idempotent worker
Idempotency, retries, and the dead-letter queue
Common mistakes that cost hours
Takeaways
Where to go next

TL;DR

Move slow work off the request path with queues and workers so APIs stay fast. The catch: at-least-once delivery means you must make workers idempotent and add retries plus a dead-letter queue, or duplicates and lost jobs will bite.

The checkout that times out

A customer clicks Place order. Your handler charges the card, writes the order row, then, still inside the same HTTP request, renders a receipt, calls the email provider, pings the warehouse webhook, and updates the analytics warehouse. The card charge took 200ms. Everything after it took 4 seconds. The mobile client gave up at 3 seconds, showed an error, and the customer tapped Place order again.

Now you have two charges, one angry customer, and a support ticket. Nothing was broken; the code did exactly what you wrote. The mistake was *where* you ran the slow work: inline, in the request path, holding the user hostage while you talked to three external systems that have nothing to do with whether the order succeeded.

The order was *done* the moment the charge cleared and the row was written. The email, the webhook, and the analytics are follow-up work. They should not be able to slow down, or fail, the thing the user is actually waiting for. This is the problem async processing solves.

Who this is for

Backend engineers who can build a CRUD API and now need to make it fast and resilient under real load. You should be comfortable with HTTP request/response and a database. You do not need prior queue experience; we build the mental model from zero. If you have shipped a feature that "sometimes sends two emails," this article is for you.

The principle: acknowledge fast, do the slow work later

The async processing rule

Do the minimum work needed to keep your promise to the user, acknowledge immediately, and push everything else onto a queue for a worker to finish in the background.

The request path should only contain work that is required for correctness and whose result the user is waiting on. Charging the card qualifies. Sending a confirmation email does not: the user does not stare at the screen until the email lands. So the handler does the essential bit, drops a message describing the rest onto a queue, and returns 202 Accepted in milliseconds. A separate process picks the message up later and does the slow part.

This splits one fragile synchronous chain into two independent halves. The producer (your API) only needs the queue to be up. The consumer (your worker) can be slow, can restart, can fail and retry, and the user never sees any of it. You have traded a tiny bit of immediacy (the email arrives in 2 seconds, not 0) for much lower request latency and much better reliability.

In the kitchen	In your system
Waiter clips the order ticket to the rail and walks away	Producer enqueues a job and returns immediately
The ticket rail holding orders in line	The message queue buffering jobs
Cooks pull the next ticket when they are free	Worker pool pulling jobs at its own pace
A rush doesn't stop waiters taking orders; tickets just pile up	Load spike doesn't slow the API; the queue absorbs the backlog
A dropped dish gets re-fired from the same ticket	A failed job is retried from the same message

A busy restaurant kitchen is a producer/consumer system you have already seen work.
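The producer/consumer split described above can be sketched in a single process. The sketch below uses Python's standard-library `queue.Queue` and a thread as stand-ins for a real broker and worker pool. `handle_checkout`, the 0.5 s sleep standing in for a slow email provider, and the `None` stop sentinel are all hypothetical. In production the queue would be a separate broker and the worker a separate process.

```python
import queue
import threading
import time

jobs = queue.Queue()  # in-process stand-in for a real message broker
emails_sent = []      # records which receipts the worker has "sent"


def handle_checkout(order_id):
    """Hypothetical API handler: do the essential work, enqueue the rest, return."""
    # ...charge the card and write the order row here (omitted)...
    jobs.put({"type": "send_receipt", "order_id": order_id})
    return 202  # respond without waiting for the email


def worker():
    """Pull jobs at the worker's own pace until it sees the None sentinel."""
    while True:
        job = jobs.get()
        if job is None:
            break
        time.sleep(0.5)  # simulate a slow email provider
        emails_sent.append(job["order_id"])


consumer = threading.Thread(target=worker, daemon=True)
consumer.start()

start = time.perf_counter()
status = handle_checkout(8842)
request_ms = (time.perf_counter() - start) * 1000
sent_at_response = list(emails_sent)  # snapshot taken when the response goes out

jobs.put(None)   # ask the worker to stop once earlier jobs are done
consumer.join()
print(status, f"{request_ms:.2f} ms", sent_at_response, emails_sent)
```

The handler returns almost instantly, with `sent_at_response` still empty. The receipt shows up in `emails_sent` about half a second later, without the request ever waiting for it.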

The picture: API to queue to worker pool

The API enqueues a job and returns instantly. A pool of workers pulls jobs from the queue and processes them. Jobs that keep failing branch off to a dead-letter queue for humans to inspect.

1. Client calls the API: POST /orders. The user is now waiting on this single request, so it must be fast.
2. API does the essential work: charge the card and write the order row in the database. This is the part the user actually cares about.
3. API enqueues a job: it writes a small message, {"type": "send_receipt", "order_id": 8842}, onto the queue. This is a single small write to the broker, far cheaper than the follow-up work itself.
4. API returns 202 immediately. The user sees a confirmation in under 300ms. The slow follow-up work hasn't happened yet, and that's fine.
5. A worker pulls the job. Whenever a worker in the pool is free, it grabs the next message and processes it: sends the email, calls the webhook.
6. Worker acknowledges success. On success it tells the queue to delete the message. On repeated failure, the message is routed to the dead-letter queue for a human.
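Steps 3 and 5 imply a small contract between the two sides. The producer serializes a compact message, and the worker decodes it and routes it by its `type` field. The sketch below assumes JSON message bodies and a hypothetical `HANDLERS` registry. It is one common way to structure a worker, not a specific queue library's API.

```python
import json


def send_receipt(msg):
    # Hypothetical handler; a real one would load the order and call the email provider.
    return f"receipt for order {msg['order_id']}"


# Message type -> handler function. New job types register here.
HANDLERS = {"send_receipt": send_receipt}


def encode(msg):
    """Producer side: serialize a small message (IDs, not whole objects)."""
    return json.dumps(msg).encode("utf-8")


def dispatch(body):
    """Worker side: decode the message and route it by its "type" field."""
    msg = json.loads(body)
    handler = HANDLERS.get(msg.get("type"))
    if handler is None:
        # Fail loudly instead of acking: the message gets retried and,
        # eventually, dead-lettered where a human can see it.
        raise ValueError(f"unknown message type: {msg.get('type')!r}")
    return handler(msg)


body = encode({"type": "send_receipt", "order_id": 8842})
print(dispatch(body))  # receipt for order 8842
```

Raising on an unknown type, rather than silently acknowledging, keeps a malformed message visible until it lands in the dead-letter queue.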

Sync vs async: what you are actually trading

Async is not free, and it is not always right. A read that returns data the user needs *now* must stay synchronous. The trade is concrete: you spend complexity to buy lower latency and better reliability. Know what you are buying.

Dimension	Synchronous (inline)	Asynchronous (queue + worker)
User-perceived latency	Sum of every step, as slow as the slowest dependency	Only the essential work; follow-up runs later
Reliability	One downstream failure fails the whole request	Downstream failure is isolated, retried, and invisible to the user
Load spikes	Each request holds resources until done; pile-ups cascade	Queue absorbs the burst; workers drain it at a steady rate
Complexity	Low: one process, easy to reason about	Higher: a broker, workers, retries, idempotency, monitoring
Failure visibility	Immediate: the user sees the error	Deferred: you need dashboards and a DLQ to see failures
Best for	Reads the user is waiting on; work required for the response	Emails, webhooks, thumbnails, exports, anything fire-and-forget

Doing slow work inline vs. handing it to a queue and worker.
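The latency row can be made concrete with the numbers from the checkout story. The 200 ms charge and the 4 seconds of follow-up work come from the story. The 5 ms enqueue cost is an assumption for illustration, and the order-row write is folded into the essential work.

```python
CHARGE_MS = 200            # essential work the user waits on (from the story)
FOLLOW_UP_MS = 4_000       # receipt, email, webhook, analytics (from the story)
ENQUEUE_MS = 5             # assumed cost of one small broker write
CLIENT_TIMEOUT_MS = 3_000  # the mobile client gave up at 3 seconds

sync_ms = CHARGE_MS + FOLLOW_UP_MS  # inline: user waits for every step
async_ms = CHARGE_MS + ENQUEUE_MS   # queued: user waits for essential work only

print(sync_ms, sync_ms > CLIENT_TIMEOUT_MS)    # 4200 True
print(async_ms, async_ms > CLIENT_TIMEOUT_MS)  # 205 False
```

The follow-up work still takes 4 seconds either way. Async only moves it out of the window the user is waiting in.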

Enqueue a job, write an idempotent worker

Here is the producer side. The API charges the card, writes the order, then enqueues a tiny message. Notice that the message carries an idempotency key; we will use it to make the worker safe to run more than once.

api/orders.py

```python
# payments, db, queue and Response are placeholders for your own payment
# client, database layer, queue client and web framework, not a real library.

def place_order(request):
    # 1. Essential work the user is waiting on.
    charge = payments.charge(request.card, request.amount)
    order = db.orders.insert(
        user_id=request.user_id,
        amount=request.amount,
        charge_id=charge.id,
    )

    # 2. Enqueue the slow follow-up work: one small write to the broker.
    queue.send({
        "type": "send_receipt",
        "order_id": order.id,
        # A stable key so retries don't duplicate side effects.
        "idempotency_key": f"receipt:{order.id}",
    })

    # 3. Return immediately. The email hasn't been sent yet, and that's fine.
    return Response(status=202, body={"order_id": order.id})
```

And the consumer. The queue guarantees the message is delivered at least once, which means it may be delivered *twice*. The worker must produce the same result whether it runs once or five times. We enforce that by inserting the idempotency key into a table with a unique constraint before doing the side effect.

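worker/handlers.py

```python
# db, email, ack and UniqueViolation are placeholders for your database layer,
# email client, queue ack call and your driver's unique-violation error
# (for example psycopg2.errors.UniqueViolation).

def handle_send_receipt(msg):
    key = msg["idempotency_key"]

    # Claim the work atomically. If this key was already processed,
    # the unique index raises and we skip the side effect entirely.
    try:
        db.processed_jobs.insert(key=key)  # UNIQUE(key)
    except UniqueViolation:
        return ack(msg)  # already done, safe to drop the duplicate

    # Only now do the non-repeatable side effect.
    try:
        order = db.orders.get(msg["order_id"])
        email.send_receipt(order.user_email, order)
    except Exception:
        # Release the claim (hypothetical delete call) and don't ack, so the
        # redelivered message is retried instead of skipped as a "duplicate".
        db.processed_jobs.delete(key=key)
        raise

    return ack(msg)  # tell the queue to delete the message
```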

Ack means "done", not "received"

A worker should acknowledge a message only after the work has fully succeeded. If the worker crashes before acking, the queue redelivers the message to another worker; that is exactly how at-least-once delivery recovers from crashes. Ack too early and a crash silently loses the job.
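Here is a toy, in-memory model of that rule; it is not any real broker's API. Received messages stay "in flight" until acked. A simulated crash puts unacked messages back on the queue, roughly the way a visibility timeout does. `Broker` and `run` are hypothetical names. The model runs the same crash twice: once with an early ack and once with an ack after success.

```python
from collections import deque


class Broker:
    """Toy broker: received messages stay in flight until acked."""

    def __init__(self, messages):
        self.ready = deque(messages)
        self.in_flight = {}

    def receive(self):
        msg = self.ready.popleft()
        self.in_flight[msg["id"]] = msg
        return msg

    def ack(self, msg):
        del self.in_flight[msg["id"]]  # done: the message is gone for good

    def redeliver_unacked(self):
        # Stands in for a visibility timeout expiring after a worker dies.
        self.ready.extend(self.in_flight.values())
        self.in_flight.clear()


def run(ack_early):
    broker = Broker([{"id": 1, "body": "send_receipt:8842"}])
    done = []
    for attempt in range(2):
        if not broker.ready:
            break
        msg = broker.receive()
        if ack_early:
            broker.ack(msg)  # WRONG: "received" is not "done"
        if attempt == 0:
            broker.redeliver_unacked()  # the first worker crashes mid-job
            continue
        done.append(msg["body"])
        if not ack_early:
            broker.ack(msg)  # RIGHT: ack only after the work succeeded
    return done


print(run(ack_early=True))   # []  (the job was lost)
print(run(ack_early=False))  # ['send_receipt:8842']  (redelivered and completed)
```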

Idempotency, retries, and the dead-letter queue

At-least-once delivery is why idempotency is mandatory

Most production queues, including Amazon SQS standard queues, RabbitMQ, and Kafka consumers in their typical configurations, deliver at least once, not exactly once. "Exactly once" across a network and a crash is famously close to impossible. So you must assume any message can arrive more than once. Here is how: a worker processes it, sends the email, then crashes before acking; the queue redelivers, and a second worker sends the email *again*. Idempotency is the property that processing the same message twice has the same effect as processing it once. The unique-key check above is how we get it: the first run claims the key, and every duplicate hits the constraint and short-circuits.

Retries with backoff

When a job fails (the email provider is down, the network blipped), you do not drop it; you retry. But retry *with exponential backoff and jitter*: wait 1s, then 2s, then 4s, then 8s, with a little randomness so a thousand failed jobs don't all retry in lockstep and hammer the recovering dependency. Crucially, retries must be bounded. An infinite retry loop on a permanently bad message (a "poison message") will spin forever, burn CPU, and block the queue.
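The sketch below computes a backoff schedule of that shape. Three parameters are illustrative assumptions, not values from any specific broker: the `cap` of 60 s, the ±10% jitter band, and the `MAX_RETRIES` of 5. A common alternative is "full jitter", which picks uniformly between 0 and the exponential delay and spreads retries out more aggressively.

```python
import random


def retry_delay(attempt, base=1.0, cap=60.0, jitter=0.1, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    Exponential: base * 2**attempt, i.e. 1s, 2s, 4s, 8s, ..., capped at `cap`
    before jitter is applied. Jitter: scale by a random factor in
    [1 - jitter, 1 + jitter] so many failed jobs don't retry in lockstep.
    """
    delay = min(cap, base * 2 ** attempt)
    return delay * rng.uniform(1 - jitter, 1 + jitter)


MAX_RETRIES = 5  # bounded: after this many, dead-letter instead of retrying

demo_rng = random.Random(42)  # seeded only so the demo is repeatable
for attempt in range(MAX_RETRIES):
    print(f"retry {attempt + 1}: wait {retry_delay(attempt, rng=demo_rng):.2f}s")
```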

The dead-letter queue catches what can't succeed

After N failed attempts (you configure N; a small number such as 5 is typical), the broker, or your worker code, moves the message to a dead-letter queue: a separate queue for messages that could not be processed. The DLQ is your safety net. Nothing is silently lost, the bad message stops blocking the main queue, and a human (or an alert) can inspect it, fix the root cause, and replay it. A queue without a DLQ either loses messages or jams forever. Always wire one up, and always alert on DLQ depth greater than zero.
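An in-memory simulation of that flow follows. A message is re-queued on failure until it has been received `MAX_RECEIVES` times, then parked in a DLQ list. `drain`, `handler`, and the limit of 5 are hypothetical. Real brokers do this counting for you once configured, and a real worker would also apply backoff between attempts.

```python
from collections import deque

MAX_RECEIVES = 5  # assumed limit; real brokers let you configure this


def drain(pending, handler, dlq):
    """Process messages; requeue failures until MAX_RECEIVES, then dead-letter."""
    processed = []
    while pending:
        msg = pending.popleft()
        msg["receives"] = msg.get("receives", 0) + 1
        try:
            handler(msg)
        except Exception as exc:
            if msg["receives"] >= MAX_RECEIVES:
                msg["last_error"] = repr(exc)
                dlq.append(msg)      # park it for a human; stop retrying
            else:
                pending.append(msg)  # retry later (backoff omitted for brevity)
            continue
        processed.append(msg["id"])  # success: this is where the worker would ack
    return processed


def handler(msg):
    if msg["order_id"] is None:  # a poison message: it can never succeed
        raise ValueError("missing order_id")


main_queue = deque([
    {"id": 1, "order_id": 8842},
    {"id": 2, "order_id": None},
    {"id": 3, "order_id": 8843},
])
dlq = []
processed = drain(main_queue, handler, dlq)
dlq_alert = len(dlq) > 0  # wire this to your alerting: DLQ depth > 0

print(processed)                                # [1, 3]
print([(m["id"], m["receives"]) for m in dlq])  # [(2, 5)]
```

Healthy messages 1 and 3 complete despite the poison message between them. Message 2 stops consuming work after its fifth attempt and stays inspectable, with its last error attached.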

Common mistakes that cost hours

1. Non-idempotent handlers. Assuming each message arrives exactly once. Under at-least-once delivery, a duplicate means a double charge or a double email. Make every handler safe to re-run before you ship it.
2. No dead-letter queue. Without a DLQ, a poison message either gets lost on the last failed attempt or retries forever and blocks everything behind it. Configure a DLQ and alert when it is non-empty.
3. Infinite retries. Retrying a permanently bad message forever wastes resources and starves healthy jobs. Bound retries, use exponential backoff with jitter, then dead-letter.
4. Assuming message ordering. Most queues do not guarantee order across messages, and a worker pool processes them in parallel anyway. If "updated" can be processed before "created", your data gets corrupted. Don't depend on order, or use an ordered/FIFO queue with a partition key and accept the throughput cost.
5. Acking before the work is done. Acknowledging on receipt instead of on success turns every worker crash into a silently dropped job.
6. Fat messages. Putting a 5MB payload on the queue instead of an ID the worker re-fetches. Keep messages small; pass references, not data.
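For mistake 4, one way to avoid depending on order is to have the producer stamp each event with an increasing version number and have the worker ignore anything older than what it already holds. The sketch below assumes each event carries the full current state (last write wins). The `orders` dict and `apply_event` are hypothetical. Events that carry deltas instead would need a different approach, such as buffering until the gap is filled. Against a real database, the same check is usually a conditional UPDATE that only matches rows with a lower version.

```python
# Hypothetical in-memory store: order_id -> {"version": int, "status": str}
orders = {}


def apply_event(event):
    """Apply an order event only if it is newer than what we already hold.

    Events may arrive in any order (and more than once); each carries a
    version number that the producer increments on every change.
    """
    current = orders.get(event["order_id"])
    if current is not None and current["version"] >= event["version"]:
        return False  # stale or duplicate: ignore it
    orders[event["order_id"]] = {"version": event["version"], "status": event["status"]}
    return True


# The "updated" event (version 2) is delivered before "created" (version 1).
events = [
    {"order_id": 8842, "version": 2, "status": "paid"},
    {"order_id": 8842, "version": 1, "status": "created"},
]
applied = [apply_event(e) for e in events]
print(applied, orders[8842])  # [True, False] {'version': 2, 'status': 'paid'}
```

The version check also makes the handler idempotent for free: replaying version 2 a second time returns False and changes nothing.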

Takeaways

The whole article in seven lines

Keep the request path to work the user is waiting on; acknowledge fast and push the rest onto a queue.
Producer enqueues and returns 202; a worker pool consumes and does the slow side effects later.
Queues deliver at-least-once, so every message can arrive twice. Idempotency is mandatory, not optional.
Make handlers idempotent by claiming a stable idempotency key under a unique constraint before the side effect.
Ack only after success; a crash before ack triggers redelivery, which is how the system self-heals.
Retry with bounded exponential backoff and jitter; route exhausted messages to a dead-letter queue and alert on it.
Never assume ordering across messages unless you explicitly use a FIFO queue with a partition key.

Where to go next

Async processing is one pillar of building backends that stay fast and reliable under load. Pair it with the broader scaling patterns and the data-consistency rules that decide what the producer must do *before* it enqueues.

Scalability Principles, where queues fit among caching, load balancing, and horizontal scaling.
Database Transactions & Consistency, why the charge and the order row must commit together before you enqueue follow-up work.
Follow the full Backend Engineer path to see how async processing connects to the rest of the curriculum.


Check your understanding

1. According to the article, what work belongs in the synchronous request path?

2. What does splitting the work into a producer and a consumer trade away, and what does it buy?

Frequently asked questions

What work should stay in the request path and what should move to a queue?

Keep only the work that is required for correctness and that the user is actively waiting on the result of, like charging the card and writing the order row. Everything else, such as sending a confirmation email, calling a webhook, or updating analytics, is follow-up work that belongs on a queue for a background worker.

Why does async processing require idempotency?

Queues typically use at-least-once delivery, so the same job can be delivered more than once, and workers can fail and retry. An idempotent worker produces the same result no matter how many times it processes the same message, which prevents duplicate side effects like sending two emails for one order.

What does a dead-letter queue do?

A dead-letter queue holds messages that keep failing after their retries are exhausted, so a poisoned or broken job does not block the rest of the queue or get lost silently. It gives you a place to inspect and reprocess those failures rather than retrying forever.

When should work stay synchronous instead of going async?

A read that returns data the user needs right now must stay synchronous, since the user is staring at the screen waiting for it. Async buys lower latency and better reliability at the cost of added complexity, so only move work off the request path when the user is not waiting on its result.
