Idempotency and Exactly-Once Delivery: Making Retries Safe
Networks give you at-least-once, never exactly-once. Learn how idempotency keys, dedupe windows, and the outbox pattern stop a retried payment from charging your customer twice.
You built a `POST /charge` endpoint. A customer taps **Pay**, your server charges their card, then the response times out on the way back. The client retries. Now the card is charged twice, support is angry, and you are reading your access logs at midnight. This article is for any backend or SRE engineer who has to make retries safe.
Here is the uncomfortable truth: the network cannot tell you whether a request succeeded. When a call times out, the work may have completed, partially completed, or never started. You genuinely do not know. So the client retries, and it *should*, because giving up means dropping real requests on the floor.
Retries are not optional. They are how distributed systems survive packet loss, slow networks, and restarts. The job is not to avoid retries; it is to make the second, third, and tenth attempt harmless. That property is called idempotency.
An operation is idempotent if doing it once and doing it many times produce the same result.
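To make the definition concrete, here is a minimal sketch that compares an absolute write with a relative one. The `Account` shape and both function names are invented for this illustration and are not part of any real API.

```typescript
// Hypothetical account shape used only for this illustration.
interface Account {
  balanceCents: number;
}

// Idempotent: sets an absolute value, so repeating it changes nothing further.
function setBalance(account: Account, cents: number): Account {
  return { ...account, balanceCents: cents };
}

// Not idempotent: every call moves the balance again.
function addToBalance(account: Account, cents: number): Account {
  return { ...account, balanceCents: account.balanceCents + cents };
}

const start: Account = { balanceCents: 1000 };

const setOnce = setBalance(start, 5000);
const setTwice = setBalance(setOnce, 5000);

const addOnce = addToBalance(start, 5000);
const addTwice = addToBalance(addOnce, 5000);

console.log(setOnce.balanceCents === setTwice.balanceCents); // true
console.log(addOnce.balanceCents === addTwice.balanceCents); // false
```

A retried "set" lands in the same state as the first attempt. A retried "add" does not, and that is exactly the shape of a double charge.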
An elevator button you can mash
You walk up to an elevator and press the call button. It lights up. You press it again, impatiently, four more times. The elevator does not arrive five times; it arrives once. The button is idempotent: extra presses change nothing about the outcome.
| Elevator | Your API |
| --- | --- |
| Pressing the call button once | The first `POST /charge` request |
| Mashing it four more times | Client retries after a timeout |
| The elevator still comes exactly once | The card is charged exactly once |
| The button remembers it is already lit | The server remembers the idempotency key |
| A different floor button = a different call | A different key = a different charge |
Idempotency is the elevator-button property applied to your API.
The magic is the *memory*. The button does not re-trigger because it already knows the request is in flight. Your endpoint needs the same memory: a place to record "I have already seen this exact request" so the retry can be recognized and short-circuited.
What a safe retry looks like
The client generates a unique idempotency key and sends it with every attempt of the *same logical operation*. All retries reuse that one key. The server keeps a small store keyed by it. The first request does the work and caches the result; every retry with that key replays the cached result instead of charging again.
A retried payment short-circuits at the idempotency store before it ever reaches the payment processor a second time.
1. **Client mints a key.** Generate a UUID once, before the first send. Reuse it for every retry of that same payment. A new key means a new charge.
2. **Server tries to claim it.** Insert a row with the key as a unique column. If the insert wins, this is the first time: proceed. If it collides, a previous request already owns this key.
3. **Do the work exactly once.** Call the payment processor, then store the response body and status against the key.
4. **Replay on duplicate.** When a retry collides on the key, return the stored response. The processor is never called again.
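The client half of these steps can be sketched as a small retry helper. This is a sketch under assumptions: the `Send` signature, the `sendWithRetry` name, and the backoff numbers are all hypothetical. The part that matters is that the key is minted once, outside the loop.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical transport signature: sends one attempt and resolves with the
// response, or rejects on a timeout or network failure.
type Send<T> = (idempotencyKey: string, payload: unknown) => Promise<T>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function sendWithRetry<T>(
  send: Send<T>,
  payload: unknown,
  maxAttempts = 4,
  baseDelayMs = 200,
): Promise<T> {
  // Mint the key ONCE, before the first attempt. Every retry reuses it.
  const idempotencyKey = randomUUID();
  let lastError: unknown;

  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await send(idempotencyKey, payload);
    } catch (err) {
      lastError = err;
      // Exponential backoff between attempts: base, 2x base, 4x base, ...
      if (attempt < maxAttempts - 1) {
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

For brevity this retries on every error. A real client would usually retry only on timeouts, connection failures, and retryable status codes, and would give up immediately on a validation error.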
HTTP already has rules about this
The HTTP specification (RFC 9110) marks certain methods as idempotent, meaning that repeating them has no further effect. This is not trivia: it tells you which endpoints get retries for free and which ones you must protect with a key.
| Method | Idempotent? | Why |
| --- | --- | --- |
| GET | Yes | Reads only; nothing changes, so retry freely |
| PUT | Yes | Replaces a resource at a known URL; same body, same end state |
| DELETE | Yes | Deleting twice leaves it deleted; the end state is identical |
| POST | No | Creates a new resource each call, so a retry creates a duplicate |
| PATCH | Depends | Idempotent only if the patch is absolute, not relative (set vs increment) |
Idempotency by HTTP method. POST is the dangerous one.
The PUT trick
If you can phrase a write as **PUT /charges/{client-chosen-id}** instead of **POST /charges**, you get idempotency from the URL itself. This is *natural* idempotency. When you can't (the processor mints the id), you add a *synthetic* idempotency key. More on that distinction below.
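A small in-memory sketch shows the difference. The `charges` map stands in for a table, and `postCharge`, `putCharge`, and the id format are made up for this illustration.

```typescript
interface Charge {
  amountCents: number;
  currency: string;
}

// In-memory stand-in for a charges table; illustration only.
const charges = new Map<string, Charge>();
let nextServerId = 1;

// POST-style create: the server mints a fresh id on every call.
function postCharge(charge: Charge): string {
  const id = `ch_${nextServerId++}`;
  charges.set(id, charge);
  return id;
}

// PUT-style upsert: the client chooses the id, so a repeat overwrites
// the same entry instead of creating a second one.
function putCharge(id: string, charge: Charge): void {
  charges.set(id, charge);
}

const payment: Charge = { amountCents: 4200, currency: "usd" };

putCharge("order-981-charge", payment);
putCharge("order-981-charge", payment); // retry: still one entry
const sizeAfterPuts = charges.size;

postCharge(payment);
postCharge(payment); // retry: a duplicate charge
const sizeAfterPosts = charges.size;

console.log(sizeAfterPuts, sizeAfterPosts); // 1 3
```

The retried PUT leaves one record. The retried POST leaves two, which is why POST needs the extra key.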
A charge endpoint that survives retries
Here is the core pattern in Express + TypeScript. The client sends an Idempotency-Key header. We store the key together with a hash of the request body and the cached response. A duplicate key replays the response; a duplicate key with a *different* body is rejected.
routes/charge.ts
typescript
import { Router, Request, Response } from "express";
import { createHash } from "crypto";
import { db } from "../db";
import { paymentProcessor } from "../psp";

const router = Router();

// PostgreSQL SQLSTATE for unique_violation. Assumes `db` surfaces it as
// `err.code`, the way node-postgres does.
const UNIQUE_VIOLATION = "23505";

// JSON.stringify is key-order sensitive, so clients must serialize consistently.
function hashBody(body: unknown): string {
  return createHash("sha256").update(JSON.stringify(body)).digest("hex");
}

router.post("/charge", async (req: Request, res: Response) => {
  const key = req.header("Idempotency-Key");
  if (!key) {
    return res.status(400).json({ error: "Idempotency-Key header required" });
  }
  const requestHash = hashBody(req.body);

  // 1. Try to CLAIM the key. The unique constraint makes this atomic:
  //    only the first concurrent request wins the insert.
  try {
    await db.query(
      `INSERT INTO idempotency_keys (key, request_hash, status)
       VALUES ($1, $2, 'in_progress')`,
      [key, requestHash],
    );
  } catch (err) {
    // Only a unique violation means "key already claimed". Any other
    // database error must not be mistaken for a duplicate request.
    if ((err as { code?: string }).code !== UNIQUE_VIOLATION) {
      return res.status(500).json({ error: "Could not record idempotency key" });
    }

    // 2. Collision: someone already owns this key. Replay.
    const existing = await db.query(
      `SELECT request_hash, status, response_code, response_body
       FROM idempotency_keys WHERE key = $1`,
      [key],
    );
    const row = existing.rows[0];

    // Same key, different body = client bug. Refuse it.
    if (row.request_hash !== requestHash) {
      return res.status(422).json({
        error: "Idempotency-Key reused with a different request body",
      });
    }

    // Still running? Tell the client to back off and retry later.
    if (row.status === "in_progress") {
      return res.status(409).json({ error: "Request already in progress" });
    }

    // Completed: return the EXACT cached response. No second charge.
    return res.status(row.response_code).json(row.response_body);
  }

  // 3. We won the claim. Do the real work exactly once.
  const result = await paymentProcessor.charge({
    amountCents: req.body.amountCents,
    currency: req.body.currency,
    source: req.body.source,
    // Pass the key downstream too so the PSP dedupes as well.
    idempotencyKey: key,
  });
  const responseBody = { chargeId: result.id, status: result.status };

  // 4. Cache the result against the key for future replays.
  await db.query(
    `UPDATE idempotency_keys
     SET status = 'completed', response_code = $2, response_body = $3
     WHERE key = $1`,
    [key, 201, responseBody],
  );
  return res.status(201).json(responseBody);
});

export default router;
The whole scheme rests on one database guarantee: the unique constraint on the key column. That is what makes the claim atomic even when two retries land at the same millisecond.
migrations/001_idempotency.sql
sql
CREATE TABLE idempotency_keys (
key TEXT PRIMARY KEY, -- client-generated, unique
request_hash TEXT NOT NULL, -- detects body mismatch on reuse
status TEXT NOT NULL, -- in_progress | completed
response_code INTEGER, -- cached HTTP status
response_body JSONB, -- cached response payload
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
-- Expire old keys so the table does not grow forever.
-- A daily job deletes rows past the dedupe window.
CREATE INDEX idx_idempotency_created_at ON idempotency_keys (created_at);
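To see why the claim has to be a single atomic step, here is a small simulation with no real database involved. `KeyStore` is a hypothetical in-memory stand-in for the table, and each method yields once to imitate a round trip. The racy handler checks and then inserts; the safe one does both in one call, the way an `INSERT` on a unique column does.

```typescript
// In-memory stand-in for the idempotency_keys table. Each method yields to
// the event loop first, imitating a network round trip to the database.
class KeyStore {
  private keys = new Set<string>();

  async has(key: string): Promise<boolean> {
    await Promise.resolve();
    return this.keys.has(key);
  }

  async insert(key: string): Promise<void> {
    await Promise.resolve();
    this.keys.add(key);
  }

  // Mirrors INSERT on a unique column: the check and the write are one step.
  async insertIfAbsent(key: string): Promise<boolean> {
    await Promise.resolve();
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}

type Handler = (store: KeyStore, key: string, charge: () => void) => Promise<void>;

// Racy: another request can slip in between the check and the insert.
const naiveHandler: Handler = async (store, key, charge) => {
  if (!(await store.has(key))) {
    await store.insert(key);
    charge();
  }
};

// Safe: only the request that wins the atomic claim performs the charge.
const atomicHandler: Handler = async (store, key, charge) => {
  if (await store.insertIfAbsent(key)) {
    charge();
  }
};

// Runs two retries of the same request at the same moment and counts charges.
async function countCharges(handler: Handler): Promise<number> {
  const store = new KeyStore();
  let charges = 0;
  const charge = () => { charges++; };
  await Promise.all([
    handler(store, "key-1", charge),
    handler(store, "key-1", charge),
  ]);
  return charges;
}

countCharges(naiveHandler).then((n) => console.log("naive:", n));   // naive: 2
countCharges(atomicHandler).then((n) => console.log("atomic:", n)); // atomic: 1
```

In the racy version, both requests see "key not present" before either one writes, so both charge. The unique constraint closes that gap for you.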
Wrap the work and the cache in one transaction
If your real work writes to the same database (e.g. an internal ledger), do the work and the `UPDATE ... SET status = 'completed'` in a **single transaction**. Otherwise a crash between them leaves a key stuck `in_progress` while the side effect already happened. See [database transactions and consistency](/blog/database-transactions-and-consistency).
Natural vs synthetic idempotency
There are two ways an operation becomes idempotent, and knowing which one you have changes your design.
Natural idempotency: the operation is inherently repeatable because it targets a known identity. `PUT /users/42 { plan: 'pro' }` sets the plan to pro no matter how many times you send it. `DELETE /sessions/abc` ends in the same state every time. No extra machinery needed.
Synthetic idempotency: the operation creates something new, so it is *not* naturally repeatable. You bolt safety on with an idempotency key plus a dedupe store, exactly as in the charge endpoint above. Payments, sending email, and 'create order' all need this.
Reach for natural idempotency first, because it has no bookkeeping to expire or clean up. Only when the operation genuinely mints new state (a charge, an order, a notification) do you add the synthetic layer.
The dedupe window and why exactly-once is a lie
Your idempotency store cannot grow forever, so keys expire after a dedupe window, often 24 hours to a few days. Inside the window, a retry is recognized and replayed. Outside it, the same key is treated as brand new. Size the window longer than your maximum realistic retry horizon (client backoff, queue redelivery, manual replays).
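The window's behavior is easy to see in a tiny in-memory model. This is only a sketch: `DedupeWindow` is a hypothetical class, and a real store would be the database table plus a scheduled delete. Time is passed in explicitly so the example stays deterministic.

```typescript
// In-memory sketch of a dedupe window. `now` is a millisecond timestamp
// supplied by the caller.
class DedupeWindow {
  private firstSeenAt = new Map<string, number>();
  private windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Returns true if the key was already seen inside the window (a duplicate).
  // Otherwise records it as new and returns false.
  isDuplicate(key: string, now: number): boolean {
    const seenAt = this.firstSeenAt.get(key);
    if (seenAt !== undefined && now - seenAt < this.windowMs) {
      return true;
    }
    this.firstSeenAt.set(key, now);
    return false;
  }

  // The in-memory equivalent of the scheduled reaper job.
  reap(now: number): number {
    let removed = 0;
    this.firstSeenAt.forEach((seenAt, key) => {
      if (now - seenAt >= this.windowMs) {
        this.firstSeenAt.delete(key);
        removed++;
      }
    });
    return removed;
  }
}

const HOUR = 60 * 60 * 1000;
const dedupe = new DedupeWindow(24 * HOUR);

const first = dedupe.isDuplicate("key-1", 0);           // first sighting
const retry = dedupe.isDuplicate("key-1", 1 * HOUR);    // inside the window
const tooLate = dedupe.isDuplicate("key-1", 25 * HOUR); // outside the window
console.log(first, retry, tooLate); // false true false
```

The third call is the one to worry about: a retry that arrives after the window is processed as a brand-new request, which is why the window must outlast your slowest retry path.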
Idempotency is not exactly-once delivery
People conflate these. **Exactly-once delivery** across an unreliable network is impossible: when machines can crash and messages can be lost, a sender cannot know whether its message arrived, so it must either resend and risk a duplicate or stay silent and risk a loss. What you *can* build is **at-least-once delivery + idempotent processing**, which yields **exactly-once effects**. The message may arrive many times; the effect happens once. That combination is what production systems actually mean when they say 'exactly-once'.
You don't get exactly-once delivery. You get at-least-once delivery and you make the handler idempotent. The result looks exactly-once from the outside.
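Here is that combination in miniature. The queue below is a deliberately crude, hypothetical simulation that delivers every message twice; the handler dedupes on the message id, so the ledger total only moves once per message.

```typescript
interface Message {
  id: string;
  amountCents: number;
}

// Simulates an at-least-once queue: every message is delivered, and here
// (deterministically) every message is delivered twice.
function deliverAtLeastOnce(
  messages: Message[],
  handler: (m: Message) => void,
): number {
  let deliveries = 0;
  for (const m of messages) {
    handler(m);
    deliveries++;
    handler(m); // redelivery, for example after a lost ack
    deliveries++;
  }
  return deliveries;
}

const processedIds = new Set<string>();
let ledgerTotalCents = 0;

function idempotentHandler(m: Message): void {
  if (processedIds.has(m.id)) return; // duplicate delivery: no effect
  processedIds.add(m.id);
  ledgerTotalCents += m.amountCents; // the effect happens once per id
}

const deliveries = deliverAtLeastOnce(
  [
    { id: "m1", amountCents: 4200 },
    { id: "m2", amountCents: 1000 },
  ],
  idempotentHandler,
);

console.log(deliveries, ledgerTotalCents); // 4 5200
```

Four deliveries, two effects. From the ledger's point of view each message was processed exactly once.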
Idempotent consumers and the outbox pattern
The same problem reappears in message queues. A queue with at-least-once semantics (SQS standard queues, and Kafka or RabbitMQ in their usual configurations) *will* redeliver a message if a consumer crashes before acknowledging. So consumers must be idempotent too: dedupe on a message id before applying any side effect.
consumers/paymentCaptured.ts
typescript
async function handlePaymentCaptured(msg: QueueMessage): Promise<void> {
  // Dedupe on the message's stable id before doing anything.
  const claimed = await db.query(
    `INSERT INTO processed_messages (message_id) VALUES ($1)
     ON CONFLICT (message_id) DO NOTHING
     RETURNING message_id`,
    [msg.id],
  );
  if (claimed.rowCount === 0) {
    // Already handled on a prior delivery. Ack and move on.
    await msg.ack();
    return;
  }
  // The actual side effect. If the ledger lives in the same database, run the
  // claim above and this write in one transaction, so a crash between them
  // cannot leave the message marked as processed without its effect.
  await ledger.recordCapture(msg.body);
  await msg.ack();
}
But there is a second, sneakier failure: the dual-write problem. When a request both updates your database *and* publishes an event, a crash between the two leaves them inconsistent: the charge is recorded but no event fires, or vice versa. The database and the queue are separate systems, so you cannot make a DB commit and a queue publish atomic without a distributed transaction.
The outbox pattern fixes this. Instead of publishing directly, you write the event into an outbox table *in the same transaction* as your business change. A separate relay polls the outbox and publishes to the queue, marking rows as sent. Because the write and the event share one transaction, they commit or roll back together. The relay guarantees at-least-once publication, which is exactly why your consumers must be idempotent.
migrations/002_outbox.sql
sql
BEGIN;
-- business change + event written together, atomically
INSERT INTO charges (id, amount_cents, status)
VALUES ('ch_123', 4200, 'captured');
INSERT INTO outbox (id, topic, payload, status)
VALUES ('evt_987', 'payment.captured',
'{"chargeId":"ch_123"}'::jsonb, 'pending');
COMMIT;
-- A relay process later does:
-- SELECT * FROM outbox WHERE status = 'pending';
-- publish each row to the queue; then, per published row:
-- UPDATE outbox SET status = 'sent' WHERE id = 'evt_987';
The full chain
Outbox (atomic DB + event) gives at-least-once publication. Idempotent consumers (dedupe on message id) give exactly-once effects. Together they are the standard recipe for reliable event-driven payments. See [async processing, queues, and workers](/blog/async-processing-queues-and-workers).
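The relay side of the outbox can be sketched like this. It is an in-memory illustration under assumptions: `OutboxRow` mirrors the columns of the outbox table above, while `Publish` and `relayOnce` are hypothetical names. A real relay would select pending rows from the database and update each one by id.

```typescript
interface OutboxRow {
  id: string;
  topic: string;
  payload: unknown;
  status: "pending" | "sent";
}

// Hypothetical publisher signature; a real one would call your queue client.
type Publish = (topic: string, payload: unknown, messageId: string) => Promise<void>;

// One relay pass over an in-memory outbox. Returns how many rows were
// marked as sent during this pass.
async function relayOnce(outbox: OutboxRow[], publish: Publish): Promise<number> {
  let sent = 0;
  for (const row of outbox) {
    if (row.status !== "pending") continue;
    try {
      // Publish first, mark second. If anything fails between the two steps,
      // the row stays pending and is published again on the next pass.
      await publish(row.topic, row.payload, row.id);
      row.status = "sent";
      sent++;
    } catch {
      // Leave the row pending; the next pass retries it.
    }
  }
  return sent;
}
```

Publishing before marking is what makes the relay at-least-once rather than at-most-once. The row id travels with the message so consumers have a stable id to dedupe on.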
Common mistakes that cost hours
Storing the key but not the response. If you only record 'this key was used' and forget to cache the result, the retry can't replay anything: you either error out or, worse, do the work again. Always store the response body and status with the key.
No expiry on keys. A key table with no dedupe window grows without bound and eventually melts your database. Set a window, index created_at, and reap old rows on a schedule.
Ignoring body mismatch. If a client reuses a key with a different amount, returning the *old* cached response silently drops the new charge; blindly processing it double-charges. Hash the body and reject reuse with a 422.
Generating the key server-side. The key must be minted by the client before the first send, so all retries share it. A server-generated key is different on every attempt, which makes it useless for dedupe.
Not passing the key downstream. Your endpoint is idempotent, but if you call the payment processor without forwarding a key, *its* retries can still double-charge. Propagate the key to every external call.
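One more detail sits underneath the body-mismatch check: it is only as reliable as the hash. Hashing `JSON.stringify(body)` directly depends on property order, so two logically identical bodies can hash differently. A minimal sketch of an order-independent hash follows; `canonicalize` and `stableHash` are hypothetical helper names, not part of any library.

```typescript
import { createHash } from "node:crypto";

// Recursively sorts object keys so that logically equal bodies serialize
// to the same string regardless of property order.
function canonicalize(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const k of Object.keys(source).sort()) {
      sorted[k] = canonicalize(source[k]);
    }
    return sorted;
  }
  return value;
}

function stableHash(body: unknown): string {
  // JSON.stringify(undefined) is undefined, so fall back to "null".
  const serialized = JSON.stringify(canonicalize(body)) ?? "null";
  return createHash("sha256").update(serialized).digest("hex");
}

const a = stableHash({ amountCents: 4200, currency: "usd" });
const b = stableHash({ currency: "usd", amountCents: 4200 });
const c = stableHash({ amountCents: 9900, currency: "usd" });

console.log(a === b); // true
console.log(a === c); // false
```

Same fields in a different order produce the same hash, while a changed amount does not, so only a genuinely different request is rejected.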
Takeaways
The whole article in seven lines
The network gives you at-least-once, never exactly-once, so retries are mandatory.
Idempotency means a repeated operation has the same effect as doing it once.
GET, PUT, and DELETE are idempotent; POST is not, so protect POST with an idempotency key.
Client mints the key, server stores it with a request hash and the cached response, and replays on duplicate.
A unique DB constraint on the key is what makes the claim atomic under concurrency.
Prefer natural idempotency (PUT to a known id); add synthetic keys only when you create new state.
Exactly-once delivery is impossible; at-least-once + idempotent consumers = exactly-once effects. Use the outbox for atomic DB-plus-event writes.
Where to go next
Idempotency sits at the intersection of API design, transactions, and messaging. Strengthen each side:
REST API design: how to choose methods and model resources so natural idempotency falls out for free.
Build the charge endpoint above, point a load test at it that retries aggressively, and confirm the processor is called exactly once. Once you have felt a retry get safely replayed, you will never ship an unprotected POST again.