Idempotency and Exactly-Once Delivery: Making Retries Safe

On this page

The double-charge bug
An elevator button you can mash
What a safe retry looks like
HTTP already has rules about this
A charge endpoint that survives retries
Natural vs synthetic idempotency
The dedupe window and why exactly-once is a lie
Idempotent consumers and the outbox pattern
Common mistakes that cost hours
Takeaways
Where to go next

TL;DR

Exactly-once delivery is a myth, so make retries safe instead: idempotency keys, a dedupe window, and the outbox pattern keep a retried payment from charging the customer twice.

The double-charge bug

Who this is for

You built a POST /charge endpoint. A customer taps Pay, your server charges their card, then the response times out on the way back. The client retries. Now the card is charged twice, support is angry, and you are reading your access logs at midnight. This article is for any backend or SRE engineer who has to make retries safe.

Here is the uncomfortable truth: the network cannot tell you whether a request succeeded. When a call times out, the work may have completed, partially completed, or never started. You genuinely do not know. So the client retries, and it *should*, because giving up means dropping real requests on the floor.

Retries are not optional. They are how distributed systems survive packet loss, slow networks, and restarts. The job is not to avoid retries; it is to make the second, third, and tenth attempt harmless. That property is called idempotency.

An operation is idempotent if doing it once and doing it many times produce the same result.
The one-line definition worth memorizing

An elevator button you can mash

You walk up to an elevator and press the call button. It lights up. You press it again, impatiently, four more times. The elevator does not arrive five times; it arrives once. The button is idempotent: extra presses change nothing about the outcome.

Elevator	Your API
Pressing the call button once	The first POST /charge request
Mashing it five more times	Client retries after a timeout
The elevator still comes exactly once	The card is charged exactly once
The button remembers it is already lit	The server remembers the idempotency key
A different floor button = a different call	A different key = a different charge

Idempotency is the elevator-button property applied to your API.

The magic is the *memory*. The button does not re-trigger because it already knows the request is in flight. Your endpoint needs the same memory: a place to record "I have already seen this exact request" so the retry can be recognized and short-circuited.
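As a rough illustration of that memory, the sketch below keeps a map from key to stored result. The names `IdempotencyStore` and `runOnce` are hypothetical, and the `Map` stands in for the durable, shared store a real service would need.

```typescript
// Minimal in-memory sketch of the "memory" behind an idempotent operation.
// Hypothetical names; not durable and not safe across processes.
class IdempotencyStore<T> {
  private readonly results = new Map<string, T>();

  // Runs `work` only the first time a key is seen; later calls replay the result.
  runOnce(key: string, work: () => T): T {
    if (this.results.has(key)) {
      return this.results.get(key) as T; // duplicate: short-circuit, no side effect
    }
    const result = work();
    this.results.set(key, result);
    return result;
  }
}

// The "charge" side effect is a counter we can inspect.
let chargesMade = 0;
const charge = (): string => {
  chargesMade += 1;
  return `charge-${chargesMade}`;
};

const store = new IdempotencyStore<string>();
const first = store.runOnce("key-a", charge);
const retry = store.runOnce("key-a", charge); // same key: replayed
const other = store.runOnce("key-b", charge); // different key: new charge

console.log(first, retry, other, chargesMade); // charge-1 charge-1 charge-2 2
```

The retry returns the first result without running the side effect again, which is the elevator button remembering it is already lit.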

What a safe retry looks like

The client generates a unique idempotency key and sends it with every attempt of the *same logical operation*. All retries reuse that one key. The server keeps a small store keyed by it. The first request does the work and caches the result; every retry with that key replays the cached result instead of charging again.

A retried payment short-circuits at the idempotency store before it ever reaches the payment processor a second time.

1. Client mints a key. Generate a UUID once, before the first send. Reuse it for every retry of that same payment. A new key means a new charge.
2. Server tries to claim it. Insert a row with the key as a unique column. If the insert wins, this is the first attempt, so proceed. If it collides, a previous request already owns this key.
3. Do the work exactly once. Call the payment processor, then store the response body and status against the key.
4. Replay on duplicate. When a retry collides on the key, return the stored response. The processor is never called again.
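The client half of these steps might look like the sketch below. It is a minimal illustration, not a production client: `Send` is a hypothetical transport that resolves with an HTTP status, and the attempt count and delays are placeholder values.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical transport: sends one attempt and resolves with an HTTP status.
type Send = (idempotencyKey: string, body: unknown) => Promise<number>;

// Retries one logical operation, reusing a single key for every attempt.
async function chargeWithRetry(
  send: Send,
  body: unknown,
  maxAttempts = 4,
  baseDelayMs = 100,
): Promise<number> {
  const key = randomUUID(); // minted once, before the first send
  let lastError: unknown;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const status = await send(key, body);
      // Retry on 409 (still in progress) and 5xx; anything else is final.
      if (status !== 409 && status < 500) return status;
      lastError = new Error(`retryable status ${status}`);
    } catch (err) {
      lastError = err; // timeout or network error: outcome unknown, so retry
    }
    if (attempt < maxAttempts) {
      // Exponential backoff: baseDelayMs, then 2x, then 4x, ...
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}
```

The key is created outside the loop on purpose. Minting it inside the loop would give every attempt a fresh key and defeat the dedupe.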

HTTP already has rules about this

The HTTP spec classifies methods by whether repeating them is safe. This is not trivia: it tells you which endpoints get retries for free and which ones you must protect with a key.

Method	Idempotent?	Why
GET	Yes	Read-only: nothing changes, so retry freely
PUT	Yes	Replaces a resource at a known URL; same body, same end state
DELETE	Yes	Deleting twice leaves it deleted; the end state is identical, though a repeat may return 404
POST	No	Typically creates a new resource on each call, so a retry creates a duplicate
PATCH	Depends	Not idempotent by definition; safe to repeat only if the patch is absolute, not relative (set vs increment)

Idempotency by HTTP method. POST is the dangerous one.
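The PATCH row is easiest to see with a small example. The sketch below uses a hypothetical `Account` type and applies an absolute patch and a relative patch twice each.

```typescript
type Account = { balanceCents: number };

// Absolute patch: sets a value. Applying it twice equals applying it once.
const setBalance = (value: number) => (a: Account): Account => ({
  ...a,
  balanceCents: value,
});

// Relative patch: increments. Every repeat changes the state again.
const addToBalance = (delta: number) => (a: Account): Account => ({
  ...a,
  balanceCents: a.balanceCents + delta,
});

const start: Account = { balanceCents: 1000 };

const setOnce = setBalance(1500)(start);
const setTwice = setBalance(1500)(setOnce);
const addOnce = addToBalance(500)(start);
const addTwice = addToBalance(500)(addOnce);

console.log(setOnce.balanceCents, setTwice.balanceCents); // 1500 1500
console.log(addOnce.balanceCents, addTwice.balanceCents); // 1500 2000
```

A retried "set" is harmless; a retried "increment" is the double-charge bug in miniature.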

The PUT trick

If you can phrase a write as PUT /charges/{client-chosen-id} instead of POST /charges, you get idempotency from the URL itself. This is *natural* idempotency. When you can't (the processor mints the id), you add a *synthetic* idempotency key. More on that distinction below.

A charge endpoint that survives retries

Here is the core pattern in Express + TypeScript. The client sends an Idempotency-Key header. We store the key together with a hash of the request body and the cached response. A duplicate key replays the response; a duplicate key with a *different* body is rejected.

routes/charge.ts

typescript

import { Router, Request, Response } from "express";
import { createHash } from "crypto";
import { db } from "../db";
import { paymentProcessor } from "../psp";

const router = Router();

function hashBody(body: unknown): string {
  return createHash("sha256").update(JSON.stringify(body)).digest("hex");
}

router.post("/charge", async (req: Request, res: Response) => {
  const key = req.header("Idempotency-Key");
  if (!key) {
    return res.status(400).json({ error: "Idempotency-Key header required" });
  }

  const requestHash = hashBody(req.body);

  // 1. Try to CLAIM the key. The unique constraint makes this atomic:
  //    only the first concurrent request wins the insert.
  try {
    await db.query(
      `INSERT INTO idempotency_keys (key, request_hash, status)
       VALUES ($1, $2, 'in_progress')`,
      [key, requestHash],
    );
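  } catch (err) {
    // Only a unique violation (SQLSTATE 23505 in PostgreSQL) means the key is
    // already claimed. Assumes the driver exposes the SQLSTATE as `err.code`.
    if ((err as { code?: string }).code !== "23505") {
      return res.status(500).json({ error: "Could not claim idempotency key" });
    }

    // 2. Collision: someone already owns this key. Replay.
    const existing = await db.query(
      `SELECT request_hash, status, response_code, response_body
         FROM idempotency_keys WHERE key = $1`,
      [key],
    );
    const row = existing.rows[0];
    if (!row) {
      // The key was removed between the INSERT and the SELECT. Ask for a retry.
      return res.status(409).json({ error: "Key no longer present; retry" });
    }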

    // Same key, different body = client bug. Refuse it.
    if (row.request_hash !== requestHash) {
      return res.status(422).json({
        error: "Idempotency-Key reused with a different request body",
      });
    }

    // Still running? Tell the client to back off and retry later.
    if (row.status === "in_progress") {
      return res.status(409).json({ error: "Request already in progress" });
    }

    // Completed: return the EXACT cached response. No second charge.
    return res.status(row.response_code).json(row.response_body);
  }

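  // 3. We won the claim. Do the real work exactly once.
  let result;
  try {
    result = await paymentProcessor.charge({
      amountCents: req.body.amountCents,
      currency: req.body.currency,
      source: req.body.source,
      // Pass the key downstream too so the PSP dedupes as well.
      idempotencyKey: key,
    });
  } catch (err) {
    // Release the claim so retries are not stuck behind 'in_progress'.
    // The outcome is unknown here; releasing is safe only because the PSP
    // dedupes on the same key when the client retries.
    await db.query(`DELETE FROM idempotency_keys WHERE key = $1`, [key]);
    return res.status(502).json({ error: "Payment processor call failed; retry" });
  }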

  const responseBody = { chargeId: result.id, status: result.status };

  // 4. Cache the result against the key for future replays.
  await db.query(
    `UPDATE idempotency_keys
        SET status = 'completed', response_code = $2, response_body = $3
      WHERE key = $1`,
    [key, 201, responseBody],
  );

  return res.status(201).json(responseBody);
});

export default router;

The whole scheme rests on one database guarantee: the unique constraint on the key column. That is what makes the claim atomic even when two retries land at the same millisecond.

migrations/001_idempotency.sql

sql

CREATE TABLE idempotency_keys (
  key           TEXT PRIMARY KEY,        -- client-generated, unique
  request_hash  TEXT NOT NULL,           -- detects body mismatch on reuse
  status        TEXT NOT NULL,           -- in_progress | completed
  response_code INTEGER,                 -- cached HTTP status
  response_body JSONB,                   -- cached response payload
  created_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- Expire old keys so the table does not grow forever.
-- A daily job deletes rows past the dedupe window.
CREATE INDEX idx_idempotency_created_at ON idempotency_keys (created_at);

Wrap the work and the cache in one transaction

If your real work writes to the same database (e.g. an internal ledger), do the work and the UPDATE ... SET status = 'completed' in a single transaction. Otherwise a crash between them leaves a key stuck in_progress while the side effect already happened. See database transactions and consistency.
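One caveat on hashBody in the endpoint above: JSON.stringify follows key insertion order, so two bodies with the same fields in a different order produce different hashes, and a legitimate retry could be rejected with a 422. If your clients might serialize inconsistently, a key-sorted serialization avoids that. The sketch below is one possible approach; `canonicalize` and `stableHash` are hypothetical helper names, not part of the endpoint.

```typescript
import { createHash } from "node:crypto";

// Serializes JSON-like values with object keys sorted, so key order
// cannot change the output. Undefined object fields are skipped, as in JSON.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .filter((k) => obj[k] !== undefined)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${entries.join(",")}}`;
  }
  // Strings, numbers, booleans, null. Undefined becomes "null", as in JSON arrays.
  return JSON.stringify(value) ?? "null";
}

function stableHash(body: unknown): string {
  return createHash("sha256").update(canonicalize(body)).digest("hex");
}

const a = stableHash({ amountCents: 4200, currency: "usd" });
const b = stableHash({ currency: "usd", amountCents: 4200 }); // same fields, new order
const c = stableHash({ amountCents: 4300, currency: "usd" }); // different amount

console.log(a === b, a === c); // true false
```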

Natural vs synthetic idempotency

There are two ways an operation becomes idempotent, and knowing which one you have changes your design.

Natural idempotency: the operation is inherently repeatable because it targets a known identity. PUT /users/42 { plan: 'pro' } sets the plan to pro no matter how many times you send it. DELETE /sessions/abc ends in the same state every time. No extra machinery needed.
Synthetic idempotency: the operation creates something new, so it is *not* naturally repeatable. You bolt safety on with an idempotency key + a dedupe store, exactly as in the charge endpoint above. Payments, sending email, and 'create order' all need this.

Reach for natural idempotency first: it has no bookkeeping to expire or clean up. Only when the operation genuinely mints new state (a charge, an order, a notification) do you add the synthetic layer.
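The difference shows up even in a toy in-memory model. In the sketch below, `putUserPlan` and `createOrder` are hypothetical stand-ins for a PUT to a known id and a POST that mints a new id.

```typescript
type User = { id: string; plan: string };
type Order = { id: number; item: string };

const users = new Map<string, User>();
const orders: Order[] = [];

// PUT-style: the caller names the identity, so repeats overwrite the same record.
function putUserPlan(id: string, plan: string): User {
  const user: User = { id, plan };
  users.set(id, user);
  return user;
}

// POST-style: the server mints a new identity on every call.
function createOrder(item: string): Order {
  const order: Order = { id: orders.length + 1, item };
  orders.push(order);
  return order;
}

putUserPlan("42", "pro");
putUserPlan("42", "pro"); // retry: still one user, still "pro"
createOrder("book");
createOrder("book"); // retry: a second, duplicate order

console.log(users.size, orders.length); // 1 2
```

The first function needs no dedupe store at all; the second is the case that calls for a synthetic key.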

The dedupe window and why exactly-once is a lie

Your idempotency store cannot grow forever, so keys expire after a dedupe window, often 24 hours to a few days. Inside the window, a retry is recognized and replayed. Outside it, the same key is treated as brand new. Size the window longer than your maximum realistic retry horizon (client backoff, queue redelivery, manual replays).
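The window behaviour can be sketched with an in-memory store and an injected clock. `WindowedKeys` is a hypothetical name, and the 1000 ms window is only there to keep the example short; it mirrors the created_at index plus scheduled reaper described for the database table.

```typescript
// Hypothetical in-memory key store with a dedupe window.
// The clock is injected so the behaviour can be exercised without waiting.
class WindowedKeys {
  private readonly seenAt = new Map<string, number>();
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(windowMs: number, now: () => number = () => Date.now()) {
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true for a duplicate inside the window; otherwise records the key.
  isDuplicate(key: string): boolean {
    const t = this.now();
    const firstSeen = this.seenAt.get(key);
    if (firstSeen !== undefined && t - firstSeen < this.windowMs) return true;
    this.seenAt.set(key, t); // new or expired: treated as brand new
    return false;
  }

  // Reaper: drops keys older than the window. Run it on a schedule.
  reap(): number {
    const cutoff = this.now() - this.windowMs;
    let removed = 0;
    this.seenAt.forEach((t, key) => {
      if (t <= cutoff) {
        this.seenAt.delete(key);
        removed += 1;
      }
    });
    return removed;
  }
}

let clock = 0;
const keys = new WindowedKeys(1000, () => clock);

const firstSeen = keys.isDuplicate("k1"); // false: first time
clock = 500;
const insideWindow = keys.isDuplicate("k1"); // true: recognized as a retry
clock = 1500;
const afterWindow = keys.isDuplicate("k1"); // false: window passed, treated as new
clock = 2600;
const reaped = keys.reap(); // 1: the entry recorded at t=1500 is now stale

console.log(firstSeen, insideWindow, afterWindow, reaped); // false true false 1
```

The `afterWindow` case is the one to worry about: a retry that arrives after expiry is processed again, which is why the window should exceed the longest realistic retry horizon.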

Idempotency is not exactly-once delivery

People conflate these. Exactly-once delivery across an unreliable network is impossible: a sender that gets no acknowledgement cannot tell a lost message from a lost reply, so it must either risk a duplicate or risk a drop (the Two Generals' Problem is the classic statement of this). What you *can* build is at-least-once delivery + idempotent processing, which yields exactly-once effects. The message may arrive many times; the effect happens once. That combination is what production systems actually mean when they say 'exactly-once'.

You don't get exactly-once delivery. You get at-least-once delivery and you make the handler idempotent. The result looks exactly-once from the outside.
The mental model that fixes most messaging bugs
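A tiny simulation makes the distinction concrete. The message shape and the `handle` function below are hypothetical; the delivery list stands in for a queue that redelivers.

```typescript
type Message = { id: string; amountCents: number };

const processed = new Set<string>();
let ledgerTotalCents = 0;

// Idempotent handler: the effect is applied at most once per message id.
function handle(msg: Message): void {
  if (processed.has(msg.id)) return; // redelivery: no effect
  processed.add(msg.id);
  ledgerTotalCents += msg.amountCents;
}

// At-least-once delivery: every message arrives, some more than once.
const deliveries: Message[] = [
  { id: "m1", amountCents: 4200 },
  { id: "m1", amountCents: 4200 }, // duplicate after a lost ack
  { id: "m2", amountCents: 1000 },
  { id: "m1", amountCents: 4200 }, // late redelivery
];

deliveries.forEach((msg) => handle(msg));

console.log(deliveries.length, processed.size, ledgerTotalCents); // 4 2 5200
```

Four deliveries, two effects: the delivery was at-least-once, and the outcome is what an exactly-once system would have produced.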

Idempotent consumers and the outbox pattern

The same problem reappears in message queues. A queue with at-least-once semantics (SQS, Kafka, RabbitMQ) *will* redeliver a message if a consumer crashes before acknowledging. So consumers must be idempotent too: dedupe on a message id before applying any side effect.

consumers/paymentCaptured.ts

typescript

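import { db } from "../db";
// Placeholder paths: `ledger` and `QueueMessage` come from your own modules.
import { ledger } from "../ledger";
import type { QueueMessage } from "../queue";

async function handlePaymentCaptured(msg: QueueMessage): Promise<void> {
  // Dedupe on the message's stable id before doing anything.
  const claimed = await db.query(
    `INSERT INTO processed_messages (message_id) VALUES ($1)
     ON CONFLICT (message_id) DO NOTHING
     RETURNING message_id`,
    [msg.id],
  );

  if (claimed.rowCount === 0) {
    // Already handled on a prior delivery. Ack and move on.
    await msg.ack();
    return;
  }

  try {
    await ledger.recordCapture(msg.body); // the actual side effect
  } catch (err) {
    // Release the claim, or the redelivery would be skipped as a duplicate.
    // If the ledger lives in the same database, prefer one transaction for
    // the claim and the write, which also covers a crash at this point.
    await db.query(`DELETE FROM processed_messages WHERE message_id = $1`, [msg.id]);
    throw err; // no ack: the queue redelivers
  }
  await msg.ack();
}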

But there is a second, sneakier failure: the dual-write problem. When a request both updates your database *and* publishes an event, a crash between the two leaves them inconsistent: the charge is recorded but no event fires, or vice versa. A DB commit and a queue publish happen in two separate systems, and you cannot make them atomic without a distributed transaction.

The outbox pattern fixes this. Instead of publishing directly, you write the event into an outbox table *in the same transaction* as your business change. A separate relay polls the outbox and publishes to the queue, marking rows as sent. Because the write and the event share one transaction, they commit or roll back together. The relay guarantees at-least-once publication, which is exactly why your consumers must be idempotent.

migrations/002_outbox.sql

sql

BEGIN;
  -- business change + event written together, atomically
  INSERT INTO charges (id, amount_cents, status)
  VALUES ('ch_123', 4200, 'captured');

  INSERT INTO outbox (id, topic, payload, status)
  VALUES ('evt_987', 'payment.captured',
          '{"chargeId":"ch_123"}'::jsonb, 'pending');
COMMIT;

-- A relay process later does:
--   SELECT * FROM outbox WHERE status = 'pending';
--   publish to queue; then UPDATE outbox SET status = 'sent';
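The relay loop in those comments can be sketched in memory to show where the at-least-once behaviour comes from. Everything here is hypothetical scaffolding: `OutboxRow`, `relayOnce`, and the injected `publish` and `markSent` functions stand in for the table, the polling process, the queue client, and the UPDATE.

```typescript
type OutboxRow = {
  id: string;
  topic: string;
  payload: string;
  status: "pending" | "sent";
};

// One relay pass: publish every pending row, then mark it sent.
// If markSent throws (a crash before the UPDATE), the row stays pending.
function relayOnce(
  outbox: OutboxRow[],
  publish: (row: OutboxRow) => void,
  markSent: (row: OutboxRow) => void,
): void {
  for (const row of outbox.filter((r) => r.status === "pending")) {
    publish(row);
    markSent(row);
  }
}

const outbox: OutboxRow[] = [
  { id: "evt_987", topic: "payment.captured", payload: '{"chargeId":"ch_123"}', status: "pending" },
];

const published: string[] = [];
const publish = (row: OutboxRow): void => {
  published.push(row.id);
};

// Simulate a crash between publish and UPDATE on the first pass only.
let crashOnce = true;
const markSent = (row: OutboxRow): void => {
  if (crashOnce) {
    crashOnce = false;
    throw new Error("relay crashed before UPDATE");
  }
  row.status = "sent";
};

try {
  relayOnce(outbox, publish, markSent); // publishes, then "crashes"
} catch {
  // The relay restarts; the row is still pending.
}
relayOnce(outbox, publish, markSent); // publishes the same event again
relayOnce(outbox, publish, markSent); // nothing pending: no-op

console.log(published); // [ 'evt_987', 'evt_987' ]
```

The event is never lost, but it is published twice. Marking the row sent before publishing would flip the failure mode to a lost event, which is why the relay publishes first and leaves deduplication to the consumer.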

The full chain

Outbox (atomic DB + event) gives at-least-once publication. Idempotent consumers (dedupe on message id) give exactly-once effects. Together they are the standard recipe for reliable event-driven payments. See async processing, queues, and workers.

Common mistakes that cost hours

1. Storing the key but not the response. If you only record 'this key was used' and forget to cache the result, the retry can't replay anything, so you either error out or, worse, do the work again. Always store the response body and status with the key.
2. No expiry on keys. A key table with no dedupe window grows without bound and eventually melts your database. Set a window, index created_at, and reap old rows on a schedule.
3. Ignoring body mismatch. If a client reuses a key with a different amount, returning the *old* cached response silently drops the new charge; blindly processing it double-charges. Hash the body and reject reuse with a 422.
4. Generating the key server-side. The key must be minted by the client before the first send, so all retries share it. A server-generated key is different on every attempt, so it is useless for dedupe.
5. Not passing the key downstream. Your endpoint is idempotent, but if you call the payment processor without forwarding a key, *its* retries can still double-charge. Propagate the key to every external call.

Takeaways

The whole article in seven lines

The network cannot tell you whether a request succeeded, so retries are mandatory and delivery is at-least-once at best, never exactly-once.
Idempotency means a repeated operation has the same effect as doing it once.
GET/PUT/DELETE are idempotent; POST is not, so protect POST with an idempotency key.
Client mints the key, server stores it with a request hash and the cached response, and replays on duplicate.
A unique DB constraint on the key is what makes the claim atomic under concurrency.
Prefer natural idempotency (PUT to a known id); add synthetic keys only when you create new state.
Exactly-once delivery is impossible; at-least-once + idempotent consumers = exactly-once effects. Use the outbox for atomic DB-plus-event writes.

Where to go next

Idempotency sits at the intersection of API design, transactions, and messaging. Strengthen each side:

REST API design: how to choose methods and model resources so natural idempotency falls out for free.
Database transactions and consistency: the atomicity guarantees that the unique-key claim and the outbox pattern depend on.
Async processing, queues, and workers: at-least-once delivery, redelivery, and where idempotent consumers fit.

Build the charge endpoint above, point a load test at it that retries aggressively, and confirm the processor is called exactly once. Once you have felt a retry get safely replayed, you will never ship an unprotected POST again.

You need a write endpoint to survive retries safely. How should you make it idempotent?

Check your understanding

1. Why does the article say the client should keep retrying a timed-out charge request rather than giving up?

2. How does the idempotency-key scheme prevent a double charge on retry?

Frequently asked questions

Why does a timed-out request cause double charges?

The network cannot tell you whether a request succeeded, so when a call times out the work may have completed, partially completed, or never started. The client retries, which it should, and without protection the second attempt charges the card again.

How does an idempotency key make retries safe?

The client generates one unique key and sends it with every attempt of the same logical operation. The server keeps a small store keyed by it: the first request does the work and caches the result, and every retry with that key replays the cached result instead of charging again.

What is the difference between natural and synthetic idempotency?

Natural idempotency comes from the URL itself, for example PUT /charges/{client-chosen-id}, where repeating the same write targets the same resource. When you cannot choose the id because the processor mints it, you add a synthetic idempotency key to get the same protection.

Why is exactly-once delivery described as a lie?

Networks give you at-least-once, never true exactly-once, so you rely on a dedupe window plus idempotency to make repeated attempts harmless. What feels like exactly-once is really at-least-once delivery combined with deduplication, not a guarantee the network can provide.
