Backend · 13 min read · Jun 2026

Async Processing: Queues & Workers

Slow work in the request path kills latency and reliability. Learn how message queues, background workers, and at-least-once delivery move heavy work off the critical path, and why that demands idempotency, retries, and dead-letter queues.

Backend · Queues · Async · Idempotency

Sri Balaji

Founder · TheSimplifiedTech

The checkout that times out

A customer clicks Place order. Your handler charges the card, writes the order row, and then, still inside the same HTTP request, renders a receipt, calls the email provider, pings the warehouse webhook, and updates the analytics warehouse. The card charge took 200ms. Everything after it took 4 seconds. The mobile client gave up after 3 seconds and showed an error, so the customer tapped Place order again.

Now you have two charges, one angry customer, and a support ticket. Nothing was broken; the code did exactly what you wrote. The mistake was *where* you ran the slow work: inline, in the request path, holding the user hostage while you talked to three external systems that have nothing to do with whether the order succeeded.

The order was *done* the moment the charge cleared and the row was written. The email, the webhook, and the analytics update are follow-up work. They should not be able to slow down, or fail, the thing the user is actually waiting for. This is the problem async processing solves.

Who this is for

Backend engineers who can build a CRUD API and now need to make it fast and resilient under real load. You should be comfortable with HTTP request/response and a database. You do not need prior queue experience; we build the mental model from zero. If you have shipped a feature that "sometimes sends two emails," this article is for you.

The principle: acknowledge fast, do the slow work later

Do the minimum work needed to keep your promise to the user, acknowledge immediately, and push everything else onto a queue for a worker to finish in the background.
The async processing rule

The request path should contain only work that is required for correctness and whose result the user is waiting on. Charging the card qualifies. Sending a confirmation email does not: the user does not stare at the screen until the email lands. So the handler does the essential bit, drops a message describing the rest onto a queue, and returns 202 Accepted in milliseconds. A separate process picks the message up later and does the slow part.

This splits one fragile synchronous chain into two independent halves. The producer (your API) only needs the queue to be up. The consumer (your worker) can be slow, can restart, can fail and retry, and the user never sees any of it. You have traded a tiny bit of immediacy (the email arrives in 2 seconds, not 0) for a large improvement in latency and reliability.

| Restaurant kitchen | Async system |
| --- | --- |
| Waiter clips the order ticket to the rail and walks away | Producer enqueues a job and returns immediately |
| The ticket rail holding orders in line | The message queue buffering jobs |
| Cooks pull the next ticket when they are free | Worker pool pulling jobs at its own pace |
| A rush doesn't stop waiters taking orders; tickets just pile up | Load spike doesn't slow the API; the queue absorbs the backlog |
| A dropped dish gets re-fired from the same ticket | A failed job is retried from the same message |
A busy restaurant kitchen is a producer/consumer system you have already seen work.

The picture: API to queue to worker pool

[Diagram: Client (mobile / web) → POST /orders → API (producer). API → charge + write → Database (order row + charge). API → enqueue job → Message Queue (SQS / RabbitMQ) → deliver → Worker Pool (consumers) → send → Email Provider (side effect). Max retries exceeded → Dead-Letter Queue (poison messages).]

The API enqueues a job and returns instantly. A pool of workers pulls jobs from the queue and processes them. Jobs that keep failing branch off to a dead-letter queue for humans to inspect.

  1. Client calls the API. POST /orders. The user is now waiting on this single request, so it must be fast.
  2. API does the essential work. Charge the card, write the order row in the database. This is the part the user actually cares about.
  3. API enqueues a job. It writes a small message, { type: "send_receipt", orderId: 8842 }, onto the queue. With a networked broker this is one short round trip, typically a few milliseconds, which is tiny next to the seconds of work it defers.
  4. API returns 202 immediately. The user sees a confirmation in under 300ms. The slow follow-up work hasn't happened yet, and that's fine.
  5. A worker pulls the job. Whenever a worker in the pool is free, it grabs the next message and processes it: sends the email, calls the webhook.
  6. Worker acknowledges success. On success it tells the queue to delete the message. On repeated failure, the message is routed to the dead-letter queue for a human.
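
The steps above can be sketched in a single process using only the Python standard library. This is an illustration, not a production setup: `queue.Queue` stands in for a real broker such as SQS or RabbitMQ, one thread stands in for the worker pool, and the `place_order` and `worker` functions here are simplified stand-ins written for this sketch.

```python
import queue
import threading
import time

jobs = queue.Queue()   # stand-in for SQS / RabbitMQ
sent = []              # records side effects so we can inspect them

def place_order(order_id):
    # Steps 2-4: the essential work would happen here, then enqueue and return.
    jobs.put({"type": "send_receipt", "order_id": order_id})
    return 202

def worker():
    # Steps 5-6: pull a job, do the slow work, then mark it done.
    while True:
        msg = jobs.get()
        if msg is None:          # sentinel: shut the worker down
            jobs.task_done()
            break
        time.sleep(0.2)          # pretend the email provider is slow
        sent.append(msg["order_id"])
        jobs.task_done()         # the in-process analogue of an ack

t = threading.Thread(target=worker, daemon=True)
t.start()

start = time.perf_counter()
status = place_order(8842)
handler_seconds = time.perf_counter() - start

jobs.join()      # demo only: wait for the worker to drain the queue
jobs.put(None)
t.join()
```

The handler returns long before the simulated email is sent; the final `jobs.join()` exists only so the demo can inspect `sent` afterwards.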

Sync vs async: what you are actually trading

Async is not free, and it is not always right. A read that returns data the user needs *now* must stay synchronous. The trade is concrete: you spend complexity to buy latency and reliability. Know what you are buying.

| Dimension | Synchronous (inline) | Asynchronous (queue + worker) |
| --- | --- | --- |
| User-perceived latency | Sum of every step; never faster than the slowest dependency | Only the essential work; follow-up runs later |
| Reliability | One downstream failure fails the whole request | Downstream failure is isolated, retried, and invisible to the user |
| Load spikes | Each request holds resources until done; pile-ups cascade | Queue absorbs the burst; workers drain it at a steady rate |
| Complexity | Low: one process, easy to reason about | Higher: a broker, workers, retries, idempotency, monitoring |
| Failure visibility | Immediate: the user sees the error | Deferred: you need dashboards and a DLQ to see failures |
| Best for | Reads the user is waiting on; work required for the response | Emails, webhooks, thumbnails, exports, anything fire-and-forget |
Doing slow work inline vs. handing it to a queue and worker.
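
To make the latency row concrete, here is a small back-of-the-envelope calculation. Only the 200 ms charge and the 4 seconds of follow-up work come from the checkout story; the per-step split, the 20 ms database write, and the 5 ms enqueue cost are assumed figures chosen for illustration.

```python
# Step durations in milliseconds. Only the 200 ms charge and the 4 s
# follow-up total come from the article; the rest are assumed.
ESSENTIAL = {"charge_card": 200, "write_order_row": 20}
FOLLOW_UP = {
    "render_receipt": 500,
    "send_email": 1500,
    "warehouse_webhook": 1200,
    "analytics_update": 800,
}
ENQUEUE_MS = 5  # assumed cost of one call to the broker

def sync_latency_ms():
    # Inline: the user waits for every step.
    return sum(ESSENTIAL.values()) + sum(FOLLOW_UP.values())

def async_latency_ms():
    # Queue + worker: the user waits for the essential work plus one enqueue.
    return sum(ESSENTIAL.values()) + ENQUEUE_MS

print(sync_latency_ms(), async_latency_ms())  # prints "4220 225"
```

Under these assumptions the inline handler blows through a 3-second client timeout, while the async handler answers in well under 300ms.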

Enqueue a job, write an idempotent worker

Here is the producer side. The API charges the card, writes the order, then enqueues a tiny message. Notice that the message carries an idempotency key; we will use it to make the worker safe to run more than once.

api/orders.py
```python
# `payments`, `db`, `queue`, and `Response` stand for your application's
# payment client, data layer, queue client, and HTTP response type.

def place_order(request):
    # 1. Essential work the user is waiting on.
    charge = payments.charge(request.card, request.amount)
    order = db.orders.insert(
        user_id=request.user_id,
        amount=request.amount,
        charge_id=charge.id,
    )

    # 2. Enqueue the slow follow-up work. One short call to the broker.
    queue.send({
        "type": "send_receipt",
        "order_id": order.id,
        # A stable key so retries don't duplicate side effects.
        "idempotency_key": f"receipt:{order.id}",
    })

    # 3. Return immediately. The email hasn't been sent yet; that's fine.
    return Response(status=202, body={"order_id": order.id})
```

And the consumer. The queue guarantees the message is delivered at least once, which means it may be delivered *twice*. The worker must produce the same result whether it runs once or five times. We enforce that by recording the idempotency key, under a unique constraint, before doing the side effect.

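worker/handlers.py
```python
# `db`, `email`, `ack`, and `UniqueViolation` stand for your application's
# data layer, email client, queue ack call, and database driver error.

def handle_send_receipt(msg):
    key = msg["idempotency_key"]

    # Claim the work atomically. If this key was already claimed,
    # the unique index raises and we skip the side effect entirely.
    try:
        db.processed_jobs.insert(key=key)  # UNIQUE(key)
    except UniqueViolation:
        return ack(msg)  # already done, safe to drop the duplicate

    try:
        # Only now do the non-repeatable side effect.
        order = db.orders.get(msg["order_id"])
        email.send_receipt(order.user_email, order)
    except Exception:
        # Release the claim so a retry is not mistaken for a duplicate,
        # then re-raise so the message is NOT acked and gets redelivered.
        # (`delete` is a hypothetical data-layer call, like `insert`.)
        db.processed_jobs.delete(key=key)
        raise

    return ack(msg)  # tell the queue to delete the message
```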

Ack means "done", not "received"

A worker should acknowledge a message only after the work has fully succeeded. If the worker crashes before acking, the queue redelivers the message to another worker. That is exactly how at-least-once delivery recovers from crashes. Ack too early and a crash silently loses the job.
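
The difference between acking early and acking late can be shown with a toy in-memory queue. This is a sketch for illustration: `InMemoryQueue`, `run_once`, and `redeliver_unacked` are names invented here, and `redeliver_unacked` stands in for whatever a real broker does when an unacked message times out or its consumer disconnects.

```python
class InMemoryQueue:
    """Toy at-least-once queue: a message stays in flight until acked."""

    def __init__(self):
        self.ready = []        # messages waiting for a worker
        self.in_flight = {}    # receipt id -> message handed to a worker
        self._next_receipt = 0

    def send(self, msg):
        self.ready.append(msg)

    def receive(self):
        msg = self.ready.pop(0)
        self._next_receipt += 1
        self.in_flight[self._next_receipt] = msg
        return self._next_receipt, msg

    def ack(self, receipt):
        del self.in_flight[receipt]   # the job is finished for good

    def redeliver_unacked(self):
        # Stand-in for a visibility timeout expiring after a crash.
        self.ready.extend(self.in_flight.values())
        self.in_flight.clear()


def run_once(q, handler, ack_early):
    """Process one message; return True if the handler finished."""
    receipt, msg = q.receive()
    if ack_early:
        q.ack(receipt)        # "received", not "done"
    try:
        handler(msg)
    except RuntimeError:
        return False          # simulated crash: nothing after this runs
    if not ack_early:
        q.ack(receipt)        # "done"
    return True


def crashing_handler(msg):
    raise RuntimeError("worker crashed mid-job")


def surviving_jobs(ack_early):
    q = InMemoryQueue()
    q.send({"order_id": 8842})
    run_once(q, crashing_handler, ack_early)
    q.redeliver_unacked()
    return len(q.ready)       # jobs still available for a retry
```

With `ack_early=True` the crashed job is gone and `surviving_jobs` returns 0; with `ack_early=False` the job is back in the queue and it returns 1.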

Idempotency, retries, and the dead-letter queue

At-least-once delivery is why idempotency is mandatory

Most production queues (SQS standard queues, RabbitMQ, Kafka) are typically run with at-least-once delivery, not exactly once. Exactly-once delivery across a network and a crash cannot be guaranteed in general; features that advertise it rely on deduplication or transactions within a limited scope. So you must assume any message can arrive more than once: a worker processed it, sent the email, then crashed before acking; the queue redelivers, and a second worker sends the email *again*. Idempotency is the property that processing the same message twice has the same effect as processing it once. The unique-key check above is how we get it: the first run claims the key, and every duplicate hits the constraint and short-circuits.
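
Here is a runnable version of the unique-key claim, using the standard library's `sqlite3` as a stand-in for your production database. The `processed_jobs` table mirrors the worker above; the `emails_sent` list is an assumption of this sketch and replaces the real email provider so the effect of duplicates is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed_jobs (key TEXT PRIMARY KEY)")

emails_sent = []  # stand-in for the real email provider

def handle_send_receipt(msg):
    """Return True if the side effect ran, False for a duplicate."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO processed_jobs (key) VALUES (?)",
                (msg["idempotency_key"],),
            )
    except sqlite3.IntegrityError:
        return False  # key already claimed: skip the side effect
    emails_sent.append(msg["order_id"])
    return True

msg = {
    "type": "send_receipt",
    "order_id": 8842,
    "idempotency_key": "receipt:8842",
}

# Deliver the same message three times, as an at-least-once queue might.
results = [handle_send_receipt(msg) for _ in range(3)]
print(results, emails_sent)  # prints "[True, False, False] [8842]"
```

Three deliveries produce one email. A production handler also has to decide what happens when the side effect fails after the key is claimed; this sketch leaves that out.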

Retries with backoff

When a job fails (the email provider is down, the network blipped), you do not drop it; you retry. But retry *with exponential backoff and jitter*: wait 1s, then 2s, then 4s, then 8s, with a little randomness so a thousand failed jobs don't all retry in lockstep and hammer the recovering dependency. Crucially, retries must be bounded. An infinite retry loop on a permanently bad message (a "poison message") will spin forever, burn CPU, and block the queue.
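
A minimal sketch of that schedule follows. The function name, the 60-second cap, and the 25% jitter fraction are assumptions made for this example; many brokers and client libraries ship their own retry policies, so treat this as an illustration of the arithmetic rather than a recommended implementation.

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0, jitter=0.25, rng=random):
    """Seconds to wait before retry number `attempt` (1-based).

    The exponential step doubles each attempt (1s, 2s, 4s, 8s, ...) up to
    `cap`; up to `jitter` * step of random extra delay is added on top so
    that many failed jobs do not retry at the same instant.
    """
    step = min(cap, base * 2 ** (attempt - 1))
    return step + rng.uniform(0, jitter * step)

def backoff_steps(max_attempts, base=1.0, cap=60.0):
    """The schedule without jitter, useful for eyeballing a policy."""
    return [min(cap, base * 2 ** (n - 1)) for n in range(1, max_attempts + 1)]

print(backoff_steps(4))  # prints "[1.0, 2.0, 4.0, 8.0]"
```

Passing a seeded `random.Random` as `rng` makes the delays reproducible in tests.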

The dead-letter queue catches what can't succeed

After N failed attempts (5 is a common choice), the broker moves the message to a dead-letter queue (DLQ), a separate queue for messages that could not be processed. The DLQ is your safety net: nothing is silently lost, the bad message stops blocking the main queue, and a human (or an alert) can inspect it, fix the root cause, and replay it. A queue without a DLQ either loses messages or jams forever. Always wire one up, and always alert on DLQ depth greater than zero.
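
The routing rule can be sketched with two in-memory deques. In a real system the broker usually tracks the attempt count and does the routing itself; the `attempts` and `last_error` fields, the `drain` function, and the sample handler are all assumptions of this sketch, and retry delays are omitted to keep it short.

```python
from collections import deque

MAX_ATTEMPTS = 5  # the "N" from the text; chosen for this sketch

def drain(main_queue, dead_letters, handler):
    """Process every message; dead-letter the ones that keep failing."""
    while main_queue:
        msg = main_queue.popleft()
        try:
            handler(msg["body"])
        except Exception as exc:
            msg["attempts"] += 1
            msg["last_error"] = str(exc)
            if msg["attempts"] >= MAX_ATTEMPTS:
                dead_letters.append(msg)   # park it for a human
            else:
                main_queue.append(msg)     # retry later, behind other work
        # On success the message is simply not re-queued (the "ack").

def handler(body):
    if body.get("email") is None:
        raise ValueError("order has no email address")  # permanent failure

main_queue = deque([
    {"body": {"order_id": 1, "email": "a@example.com"}, "attempts": 0},
    {"body": {"order_id": 2, "email": None}, "attempts": 0},  # poison
    {"body": {"order_id": 3, "email": "c@example.com"}, "attempts": 0},
])
dead_letters = deque()
drain(main_queue, dead_letters, handler)
```

After `drain` returns, the main queue is empty, the two healthy orders were processed, and the poison message sits in `dead_letters` with its attempt count and last error attached for whoever investigates.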

Common mistakes that cost hours

  1. Non-idempotent handlers. Assuming each message arrives exactly once. Under at-least-once delivery, a duplicate means a double charge or a double email. Make every handler safe to re-run before you ship it.
  2. No dead-letter queue. Without a DLQ, a poison message either gets lost on the last failed attempt or retries forever and blocks everything behind it. Configure a DLQ and alert when it is non-empty.
  3. Infinite retries. Retrying a permanently-bad message forever wastes resources and starves healthy jobs. Bound retries, use exponential backoff with jitter, then dead-letter.
  4. Assuming message ordering. Most queues do not guarantee order across messages, and a worker pool processes them in parallel anyway. If "updated" can be processed before "created", your data corrupts. Don't depend on order, or use an ordered/FIFO queue with a partition key and accept the throughput cost.
  5. Acking before the work is done. Acknowledging on receipt instead of on success turns every worker crash into a silently dropped job.
  6. Fat messages. Putting a 5MB payload on the queue instead of an ID the worker re-fetches. Keep messages small; pass references, not data.
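
For mistake 4, one way to stop depending on order is to make the handler ignore stale events. The sketch below assumes each event carries a monotonically increasing `version` and the full state of the entity rather than a delta; both are assumptions of this example, not something every system provides, and `store` is an in-memory stand-in for a database table.

```python
store = {}  # entity id -> {"version": int, "data": dict}

def apply_event(event):
    """Apply an event unless an equal or newer version is already stored."""
    current = store.get(event["id"])
    if current is not None and current["version"] >= event["version"]:
        return False  # stale or duplicate: ignore it
    store[event["id"]] = {"version": event["version"], "data": event["data"]}
    return True

created = {"id": "order-8842", "version": 1, "data": {"status": "created"}}
updated = {"id": "order-8842", "version": 2, "data": {"status": "shipped"}}

# Deliver out of order: "updated" arrives before "created".
results = [apply_event(updated), apply_event(created)]
print(results, store["order-8842"]["data"])
# prints "[True, False] {'status': 'shipped'}"
```

The late "created" event is discarded instead of overwriting newer state, and the same check also makes duplicate deliveries harmless.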

Takeaways

The whole article in seven lines

  • Keep the request path to work the user is waiting on; acknowledge fast and push the rest onto a queue.
  • Producer enqueues and returns 202; a worker pool consumes and does the slow side effects later.
  • Queues deliver at-least-once, so every message can arrive twice. Idempotency is mandatory, not optional.
  • Make handlers idempotent by claiming a stable idempotency key under a unique constraint before the side effect.
  • Ack only after success; a crash before ack triggers redelivery, which is how the system self-heals.
  • Retry with bounded exponential backoff and jitter; route exhausted messages to a dead-letter queue and alert on it.
  • Never assume ordering across messages unless you explicitly use a FIFO queue with a partition key.

Where to go next

Async processing is one pillar of building backends that stay fast and reliable under load. Pair it with the broader scaling patterns and the data-consistency rules that decide what the producer must do *before* it enqueues.
