
The Outbox Pattern: Reliable Messaging Without Distributed Transactions

5 min read

Here's a bug that has shipped to production in more systems than anyone would admit. A service saves an order to its database, then publishes an OrderPlaced event to a message broker. The save succeeds. The publish fails — the broker hiccuped, the network blipped, the process crashed in between. Now the order exists but no downstream service ever hears about it. Inventory isn't decremented, no confirmation email goes out, and your data is silently inconsistent.

This is the dual-write problem: you need to update two systems — your database and your broker — atomically, but they don't share a transaction. The naive fixes all fail. Publish first, then save? Now you might announce an order that never persisted. Two-phase commit across the database and broker? Slow, brittle, and most brokers don't support it well anyway. The outbox pattern solves this with a small, elegant trick.

One Transaction, One System

The insight is to stop writing to two systems in the critical path. Instead, write only to your database — but write the event there too, into an outbox table, in the same local transaction as your business data. Because both writes hit one database, ordinary ACID guarantees make them atomic. Either both commit or neither does. The dual-write problem disappears because there's no longer a dual write.

-- Both rows commit together, or not at all. No distributed transaction.
BEGIN;
  INSERT INTO orders (id, customer_id, total_cents, status)
  VALUES ('ord_42', 'cust_9', 9900, 'placed');
 
  INSERT INTO outbox (id, aggregate_id, type, payload, created_at)
  VALUES (
    'evt_7f3a', 'ord_42', 'OrderPlaced',
    '{"orderId":"ord_42","customerId":"cust_9","totalCents":9900}',
    now()
  );
COMMIT;

After the commit, a separate process reads unpublished rows from the outbox and forwards them to the broker. If it crashes before publishing, the rows are still sitting in the table, so nothing is lost. The approach is documented as the Transactional Outbox pattern on microservices.io and is also covered in the Azure Architecture Center.

The outbox table and the business tables must live in the same database so a single transaction covers both. If your event log is in a separate datastore, you're back to the dual-write problem you were trying to escape.
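To make the all-or-nothing behavior concrete, here is a minimal in-memory sketch. `InMemoryDb`, its `transaction` helper, and the `failOutbox` flag are hypothetical stand-ins invented for illustration, not a real database client; they only mimic the commit-or-rollback semantics the SQL above relies on.

```typescript
// Hypothetical in-memory stand-in for a database with local transactions.
type Order = { id: string; customerId: string; totalCents: number };
type OutboxRow = { id: string; aggregateId: string; type: string; payload: string };
type Tx = { orders: Order[]; outbox: OutboxRow[] };

class InMemoryDb {
  orders: Order[] = [];
  outbox: OutboxRow[] = [];

  // Runs `work` against staged copies of both tables and swaps them in only
  // if `work` returns normally, mimicking BEGIN ... COMMIT / ROLLBACK.
  transaction(work: (tx: Tx) => void): void {
    const tx: Tx = { orders: [...this.orders], outbox: [...this.outbox] };
    work(tx); // a throw here leaves this.orders and this.outbox untouched
    this.orders = tx.orders;
    this.outbox = tx.outbox;
  }
}

// Writes the order and its OrderPlaced event in one transaction.
// `failOutbox` simulates an error between the two inserts.
function placeOrder(db: InMemoryDb, order: Order, eventId: string, failOutbox = false): void {
  db.transaction((tx) => {
    tx.orders.push(order);
    if (failOutbox) throw new Error("simulated failure before the outbox insert");
    tx.outbox.push({
      id: eventId,
      aggregateId: order.id,
      type: "OrderPlaced",
      payload: JSON.stringify({
        orderId: order.id,
        customerId: order.customerId,
        totalCents: order.totalCents,
      }),
    });
  });
}

const db = new InMemoryDb();
placeOrder(db, { id: "ord_42", customerId: "cust_9", totalCents: 9900 }, "evt_7f3a");

try {
  placeOrder(db, { id: "ord_43", customerId: "cust_9", totalCents: 500 }, "evt_8b1c", true);
} catch {
  // The failed transaction rolled back: ord_43 was never persisted.
}

console.log(db.orders.map((o) => o.id), db.outbox.map((e) => e.aggregateId));
```

After both calls, only `ord_42` appears in either table: the second order failed partway through, so neither its row nor its event survived.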

Getting Events Out of the Outbox

There are two ways to move events from the outbox to the broker, and the difference is mostly about latency and infrastructure.

Polling publisher. A background worker periodically queries for unpublished events, sends them, and marks them done. Simple, no extra infrastructure, easy to reason about.

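// Polling publisher: runs on an interval and drains the outbox.
// Assumes a pg-style client whose query() resolves to { rows, rowCount },
// and a single worker instance (concurrent workers would need row locking).
async function drainOutbox(db: Db, broker: Broker) {
  const { rows: events } = await db.query(
    `SELECT id, type, payload FROM outbox
     WHERE published_at IS NULL
     ORDER BY created_at
     LIMIT 100`
  );

  for (const e of events) {
    // Publish first, then mark: a crash in between causes a re-send (at-least-once).
    await broker.publish(e.type, e.payload);
    await db.query(
      `UPDATE outbox SET published_at = now() WHERE id = $1`,
      [e.id]
    );
  }
}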

Change Data Capture (CDC). Instead of polling, you tail the database's transaction log and stream new outbox rows to the broker as they're committed. Tools like Debezium do exactly this. CDC gives you lower latency and no polling load, at the cost of running and operating the CDC pipeline. For many teams, polling every second is more than good enough; reach for CDC when latency or database load makes polling painful.

The Detail Everyone Forgets: Duplicates

Look closely at the polling worker. It publishes the event, then marks it published. If the process crashes in between (after the broker accepted the message but before the UPDATE committed), the next run will publish that same event again. Reordering the steps does not help: marking the row first and crashing before the publish would lose the event instead. The outbox pattern therefore guarantees at-least-once delivery, never exactly-once.
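The crash window is easy to reproduce in a small simulation. The sketch below uses an in-memory array as the outbox and a `brokerLog` array as the broker; both, along with the `crashAfterPublish` flag, are hypothetical names chosen for illustration.

```typescript
type OutboxRow = { id: string; type: string; publishedAt: number | null };

const outbox: OutboxRow[] = [{ id: "evt_7f3a", type: "OrderPlaced", publishedAt: null }];
const brokerLog: string[] = []; // every message the simulated broker accepted

// Drains unpublished rows. `crashAfterPublish` simulates the process dying
// after the broker accepted the message but before the row was marked.
function drainOnce(crashAfterPublish: boolean): void {
  for (const row of outbox.filter((r) => r.publishedAt === null)) {
    brokerLog.push(row.id); // publish
    if (crashAfterPublish) throw new Error("simulated crash before the UPDATE");
    row.publishedAt = Date.now(); // mark published
  }
}

try {
  drainOnce(true); // first run: publishes, then crashes
} catch {
  // the process restarts here
}
drainOnce(false); // second run: the row still looks unpublished, so it is sent again

console.log(brokerLog);
```

The broker ends up with the same event ID twice, while the outbox row is marked published only once. Downstream consumers see the duplicate.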

That's not a flaw to fix; it's a property to design around. The fix lives on the consumer side: every consumer must be idempotent, able to process the same event twice with no extra effect. Typically the consumer records the event IDs it has already handled and skips repeats.

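// Consumer side: dedupe by event ID to survive at-least-once delivery.
// The dedupe insert and the business logic share one transaction so that a
// failure rolls back both and the redelivered event is processed again.
// db.transaction and the tx argument are assumed helpers, not a specific client API.
async function handle(event: Event, db: Db) {
  await db.transaction(async (tx) => {
    const seen = await tx.query(
      `INSERT INTO processed_events (event_id) VALUES ($1)
       ON CONFLICT (event_id) DO NOTHING
       RETURNING event_id`,
      [event.id]
    );
    if (seen.rowCount === 0) return; // already processed: skip
    await applyBusinessLogic(event, tx);
  });
}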

Trying to achieve exactly-once delivery end-to-end is a trap. The honest, reliable contract is "at-least-once delivery plus idempotent consumers," which together produce exactly-once effect. Chasing exactly-once delivery leads you straight back to distributed transactions.
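A toy example shows how the two halves of that contract combine. This is a sketch under stated assumptions: the `Set` stands in for a table of processed event IDs, and `stock` and `quantity` are hypothetical fields used only to make the side effect visible.

```typescript
type OrderEvent = { id: string; type: string; orderId: string; quantity: number };

const processedEventIds = new Set<string>(); // stands in for a table of handled event IDs
let stock = 10; // hypothetical inventory level

// Returns true if the event was applied, false if it was a duplicate.
// In a real system the ID record and the state change belong in one transaction.
function handle(event: OrderEvent): boolean {
  if (processedEventIds.has(event.id)) return false;
  processedEventIds.add(event.id);
  stock -= event.quantity;
  return true;
}

const event: OrderEvent = { id: "evt_7f3a", type: "OrderPlaced", orderId: "ord_42", quantity: 1 };
const results = [handle(event), handle(event)]; // the broker delivers the event twice

console.log(results, stock);
```

The event arrives twice, but the inventory is decremented once: delivery is at-least-once, the effect is exactly-once.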

Operational Notes

A production outbox needs a little housekeeping:

  • Prune published rows. The outbox grows forever unless you delete (or archive) rows once they're safely published and past any retention window.
  • Index the unpublished query. A partial index with the predicate WHERE published_at IS NULL keeps the polling query fast as the table grows.
  • Preserve order where it matters. Publish in created_at order, and give related events the same partition key (for example, the aggregate ID) if your broker only guarantees per-partition ordering.
  • Handle poison messages. An event that repeatedly fails to publish should be retried with backoff and eventually moved to a dead-letter path rather than blocking the whole outbox.
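For the poison-message item, one possible shape is a per-row attempt counter with exponential backoff and a dead-letter status. The sketch below is illustrative only: the `attempts`, `nextAttemptAt`, and `status` fields, the limit of 5 attempts, and the delay values are all assumptions, not part of the outbox schema shown earlier.

```typescript
type OutboxRow = {
  id: string;
  attempts: number;
  nextAttemptAt: number; // epoch ms before which the row is not retried
  status: "pending" | "published" | "dead";
};

const MAX_ATTEMPTS = 5; // assumed limit before dead-lettering
const BASE_DELAY_MS = 1_000; // assumed base delay

// Exponential backoff: 1s, 2s, 4s, ... capped at 60s.
function backoffMs(attempts: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** (attempts - 1), 60_000);
}

// Tries to publish one row; on failure, schedules a retry or dead-letters the row.
function attemptPublish(row: OutboxRow, publish: (id: string) => void, now: number): void {
  if (row.status !== "pending" || row.nextAttemptAt > now) return;
  try {
    publish(row.id);
    row.status = "published";
  } catch {
    row.attempts += 1;
    if (row.attempts >= MAX_ATTEMPTS) {
      row.status = "dead"; // stop retrying; leave the row for manual inspection
    } else {
      row.nextAttemptAt = now + backoffMs(row.attempts);
    }
  }
}

const poison: OutboxRow = { id: "evt_bad", attempts: 0, nextAttemptAt: 0, status: "pending" };
const healthy: OutboxRow = { id: "evt_ok", attempts: 0, nextAttemptAt: 0, status: "pending" };

// Simulated broker that always rejects one specific event.
const publish = (id: string): void => {
  if (id === "evt_bad") throw new Error("broker rejected message");
};

let now = 0;
for (let tick = 0; tick < 10; tick++) {
  for (const row of [poison, healthy]) attemptPublish(row, publish, now);
  now += 60_000; // advance the simulated clock past any backoff
}

console.log(poison.status, poison.attempts, healthy.status);
```

The failing row is retried a bounded number of times and then parked, while the healthy row behind it is published on the first pass instead of being blocked.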

The pattern pairs naturally with sagas and event-driven architecture: it's the reliable plumbing that makes "update state and emit an event" trustworthy. Once you've internalized it, the dual-write bug stops being a thing that happens to you.

Takeaways

  • The dual-write problem — updating a database and a broker atomically — has no safe naive solution.
  • The outbox pattern writes the event into an outbox table in the same local transaction as the business data, so ACID makes them atomic.
  • A separate publisher (polling or CDC) forwards outbox rows to the broker after commit; nothing is lost if it crashes.
  • Delivery is at-least-once by nature, so duplicates will happen sooner or later. Design for them; don't try to eliminate them.
  • Idempotent consumers turn at-least-once delivery into exactly-once effect, the only reliable end-to-end contract.
  • Operate it well: prune published rows, index the unpublished query, preserve ordering, and dead-letter poison messages.