Transactional outbox and Change Data Capture both solve the “how do I reliably publish events after writing to a database” problem. They do it differently, with different trade-offs. After shipping both in production, here’s my practical take on choosing between them.

The two patterns in one paragraph each

Transactional outbox — in the same DB transaction that writes your business data, also insert a row in an outbox table. A separate publisher periodically reads new outbox rows and sends them to Kafka (or any broker). Application code deliberately emits events.
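
To make the mechanics concrete, here is a minimal sketch of the pattern using Python's built-in sqlite3 so it runs anywhere. The table layout, column names, and the `send` callback (standing in for a Kafka producer) are my own assumptions for illustration, not a prescribed schema.

```python
import json
import sqlite3
import uuid


def init_schema(conn):
    # Hypothetical schema: one business table plus the outbox table.
    conn.executescript("""
        CREATE TABLE orders (
            id TEXT PRIMARY KEY,
            customer_id TEXT NOT NULL,
            total_cents INTEGER NOT NULL
        );
        CREATE TABLE outbox (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            event_id TEXT NOT NULL UNIQUE,
            event_type TEXT NOT NULL,
            payload TEXT NOT NULL,
            published_at TEXT
        );
    """)


def place_order(conn, customer_id, total_cents):
    order_id = str(uuid.uuid4())
    event = {"order_id": order_id, "customer_id": customer_id,
             "total_cents": total_cents}
    # "with conn" wraps both inserts in one transaction:
    # either both rows are committed or neither is.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)",
                     (order_id, customer_id, total_cents))
        conn.execute(
            "INSERT INTO outbox (event_id, event_type, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "OrderPlaced", json.dumps(event)),
        )
    return order_id


def publish_pending(conn, send, batch_size=100):
    """Publish unpublished outbox rows in insertion order.

    send(event_type, event_id, payload) stands in for a broker client.
    If the process dies after send() but before the UPDATE commits,
    the row is sent again on the next run: at-least-once delivery.
    """
    rows = conn.execute(
        "SELECT seq, event_id, event_type, payload FROM outbox "
        "WHERE published_at IS NULL ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    for seq, event_id, event_type, payload in rows:
        send(event_type, event_id, payload)
        with conn:
            conn.execute(
                "UPDATE outbox SET published_at = datetime('now') WHERE seq = ?",
                (seq,),
            )
    return len(rows)
```

In a real service `publish_pending` would run on a timer or in a loop, and you would eventually delete or archive published rows so the table stays small.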

Change Data Capture (CDC) — a tool (typically Debezium) tails the database’s write-ahead log. Every insert/update/delete produces a Kafka message automatically. No outbox table; the WAL is the event source.
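
For a feel of what arrives on the topic, here is a simplified sketch of a Debezium change event value, written as a Python dict. The envelope fields (`before`, `after`, `source`, `op`, `ts_ms`) follow Debezium's event format; the row contents are made up, the `schema` section is omitted, and `describe` is a hypothetical helper of mine.

```python
# Simplified Debezium-style change event (payload only, illustrative values).
change_event = {
    "before": {"id": 42, "status": "PENDING", "total_cents": 1999},
    "after": {"id": 42, "status": "PAID", "total_cents": 1999},
    "source": {"connector": "postgresql", "schema": "public", "table": "orders"},
    "op": "u",  # c = create, u = update, d = delete, r = snapshot read
    "ts_ms": 1700000000000,
}

OPS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot"}


def describe(event):
    """Render a change event as '<table> <operation> id=<id>'."""
    table = event["source"]["table"]
    op = OPS.get(event["op"], "unknown")
    # Deletes carry no "after" image, so fall back to "before".
    row = event["after"] if event["after"] is not None else event["before"]
    return f"{table} {op} id={row['id']}"
```

Note what the event says: a row in `orders` changed. Nothing in it says why.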

Both give at-least-once delivery: after a crash, events are resent rather than lost, so consumers have to tolerate duplicates. Both produce Kafka events that downstream services can consume.
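
At-least-once means the same event can show up twice, so consumers usually deduplicate on some event ID. A minimal sketch, assuming each event carries a unique `event_id` (my assumption) and using an in-memory set where a real service would use a durable store:

```python
class IdempotentConsumer:
    """Wraps a handler so that redelivered events are processed only once."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # illustration only: production needs durable storage

    def handle(self, event_id, payload):
        """Return True if the event was processed, False if it was a duplicate."""
        if event_id in self._seen:
            return False
        self._handler(payload)
        # Record the ID only after the handler succeeds, so a failed
        # attempt is retried on redelivery.
        self._seen.add(event_id)
        return True
```

Ideally the "seen" record and the handler's side effects are committed in the same transaction; otherwise a crash between the two reopens the duplicate window.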

When outbox wins

You control event design. Outbox events are business-level (“OrderPlaced”), not table-level (“orders.row inserted”). Downstream doesn’t need to know your schema.

Evolving the DB is easy. Rename a column, reorganize tables — your events stay the same because you compose them in code.
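
A rough sketch of what "compose them in code" looks like. The `OrderPlaced` fields, the `schema_version` default, and the column names (`cust_id` in particular) are hypothetical; the point is that the mapping function is the only place that knows the DB column names.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """The published contract. Consumers depend on this, not on the table."""
    order_id: str
    customer_id: str
    total_cents: int
    schema_version: int = 1


def to_order_placed(row):
    # If the cust_id column is renamed, only this line changes;
    # the event that consumers see stays the same.
    return OrderPlaced(
        order_id=row["id"],
        customer_id=row["cust_id"],
        total_cents=row["total_cents"],
    )
```

`asdict(to_order_placed(row))` gives the dict you would serialize into the outbox payload.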

No ops complexity added. Postgres, Kafka, and application code — you already have these. No Debezium, no Kafka Connect, no schema registry dependency.

Tighter control over what’s published. You decide which operations emit events. CDC emits every change to every captured table unless you filter it, which is often too much.

Easier local dev. No Kafka Connect required for developers to run things.

Well-understood failure modes. “Outbox has 50k unpublished rows” is easy to diagnose.
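
The diagnosis is usually one query. A minimal sketch, again with sqlite3 and an assumed outbox layout (`created_at` as epoch seconds, `published_at` NULL until sent); both names are mine:

```python
import sqlite3

# Hypothetical outbox layout for this sketch.
SCHEMA = """
    CREATE TABLE outbox (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        created_at INTEGER NOT NULL,   -- epoch seconds
        published_at INTEGER           -- NULL while unpublished
    )
"""


def outbox_backlog(conn, now_epoch):
    """Return the unpublished row count and the age of the oldest such row."""
    count, oldest = conn.execute(
        "SELECT COUNT(*), MIN(created_at) FROM outbox WHERE published_at IS NULL"
    ).fetchone()
    age = 0 if oldest is None else now_epoch - oldest
    return {"unpublished": count, "oldest_age_seconds": age}
```

Alerting on the age of the oldest row tends to be more useful than alerting on the count, since a steady backlog of fresh rows is normal under load.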

When CDC wins

High write volume. Outbox polling at 100k writes/sec is expensive: the polling query itself becomes load. CDC reads the WAL instead of querying tables, so it adds no load to the application and no extra queries against your tables.

Heterogeneous producers. Legacy apps that can’t be modified still produce WAL entries. CDC catches them; outbox requires app changes.

Low-latency delivery matters. Outbox adds polling latency (around 500ms is typical). CDC is near-realtime (10-50ms is typical).

Full data replication is the goal. CDC naturally produces a complete stream of changes — great for replicating to data warehouses, search indexes, etc.

You already run Kafka Connect. The operational investment is paid; adding CDC is low marginal cost.

Where CDC is tricky

Events are table-shaped, not business-shaped. “orders UPDATE” is not “OrderPaid” — it’s “these fields changed”. Consumers have to infer business meaning from column diffs. Painful.
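
Here is roughly what that inference looks like on the consumer side. This is a sketch under assumptions: a hypothetical `status` column with values like `PAID` and `CANCELLED`, and a `before` image that carries the full previous row (in Postgres that depends on the table's REPLICA IDENTITY setting).

```python
def infer_business_event(change):
    """Guess the business meaning of a Debezium-style change on the orders table.

    Returns an event name, or None when the change has no business meaning
    that this consumer knows about.
    """
    if change["source"]["table"] != "orders":
        return None
    before, after = change.get("before"), change.get("after")
    if change["op"] == "c":
        return "OrderPlaced"
    if change["op"] == "u" and before and after:
        # Only a status transition counts; other column updates are noise here.
        if before["status"] != "PAID" and after["status"] == "PAID":
            return "OrderPaid"
        if before["status"] != "CANCELLED" and after["status"] == "CANCELLED":
            return "OrderCancelled"
    return None
```

Every consumer that cares about payments ends up carrying a copy of this logic, and each copy breaks when the status values change.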

Schema changes ripple. Add a column; all consumers see the new field. Rename a column; consumers break. CDC couples consumers to your DB schema, which is the coupling we’re usually trying to avoid with services.

Operational overhead. Debezium requires tuning, replication slot management (a slot that stops being consumed makes Postgres retain WAL until the disk fills), and careful handling around database failovers. A whole stack to run.

Doesn’t easily map to domain events. One business action often touches several tables, so it reaches consumers as several row-level events that they have to stitch back together. For domain-driven systems, this feels wrong.

Secrets can leak. By default the change stream carries every column of every captured table. If your DB has PII or tokens, they end up in Kafka unless you explicitly exclude or mask those columns.
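
One way to keep that config honest is a small check in CI. The sketch below assumes the Debezium Postgres connector's `column.exclude.list` property holds comma-separated regular expressions matched against the full `schema.table.column` name; verify the exact matching rules against the docs for your Debezium version. The table and column names are invented, and `unexcluded` is a hypothetical helper.

```python
import re

# Illustrative connector config; only the keys relevant to this check are shown.
connector_config = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "table.include.list": "public.orders,public.customers",
    "column.exclude.list": r"public\.customers\.(email|api_token)",
}


def unexcluded(config, sensitive_columns):
    """Return the sensitive columns that no exclude pattern fully matches.

    Simplification: patterns are split on commas, so a regex that itself
    contains a comma would be split incorrectly.
    """
    raw = config.get("column.exclude.list", "")
    patterns = [p.strip() for p in raw.split(",") if p.strip()]
    return [
        col for col in sensitive_columns
        if not any(re.fullmatch(p, col) for p in patterns)
    ]
```

Run it against a hand-maintained list of sensitive columns and fail the build if it returns anything.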

The hybrid pattern

Some teams use both:

  • Outbox for domain events — hand-crafted, business-shaped, versioned schemas
  • CDC for analytics — raw table changes streamed to data warehouse / search index for BI

This separation aligns each tool with what it’s good at. Domain services publish clean events; analytics infrastructure consumes everything via CDC.

A concrete comparison

Imagine an orders service. When an order is placed, it needs to notify shipping, billing, search, and the data warehouse.

Outbox approach:

  • In the order placement transaction: save order row + save OrderPlacedEvent to outbox
  • Publisher emits a business-shaped event to the orders.placed topic
  • Shipping, billing subscribe; they get clean OrderPlaced events
  • Search index maintained by a dedicated projection service
  • Data warehouse loaded from orders.placed events

CDC approach:

  • Debezium streams orders table changes to db.orders topic
  • Shipping service consumes, builds its own state
  • Billing service consumes, builds its own state
  • Search index maintained by direct CDC → Elasticsearch sink
  • Data warehouse loaded from CDC stream

Both work. The CDC version couples all consumers to the orders table schema; the outbox version gives you a stable OrderPlaced contract.

Operational cost comparison

Outbox:

  • New table and index
  • Publisher cron or scheduled task
  • Monitoring for unpublished count
  • No new infrastructure

CDC:

  • Debezium + Kafka Connect cluster
  • Schema registry (if you serialize with Avro or Protobuf)
  • Connector configuration
  • Replication slot management
  • Failover and replication interactions
  • Separate monitoring for connector health

For small and medium systems, outbox wins on operational simplicity by a mile. For massive, multi-team systems where the investment is already paid, CDC offers better scalability.

My decision rule

  • System throughput < 10k writes/sec and you want domain-shaped events: outbox
  • System throughput > 50k writes/sec: seriously consider CDC
  • Need both domain events and analytics streaming: outbox for domain, CDC for analytics
  • New system, small team: outbox, always
  • Legacy system you can’t modify: CDC, reluctantly
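
The rule above can be written down as a function, mostly as a way to make the precedence explicit. The order in which the checks run and the "judgment-call" result for the 10k-50k range (which the bullets don't cover) are my assumptions, as are the parameter names.

```python
def choose_pattern(writes_per_sec, can_modify_app=True, needs_domain_events=True,
                   needs_analytics_stream=False, small_new_team=False):
    """Encode the decision rule; the precedence of the checks is an assumption."""
    if not can_modify_app:
        return "cdc"            # legacy system: the only option, reluctantly
    if small_new_team:
        return "outbox"         # new system, small team: outbox, always
    if needs_domain_events and needs_analytics_stream:
        return "outbox+cdc"     # hybrid: outbox for domain, CDC for analytics
    if writes_per_sec > 50_000:
        return "consider-cdc"
    if writes_per_sec < 10_000 and needs_domain_events:
        return "outbox"
    return "judgment-call"      # not covered by the rule as stated
```

For example, `choose_pattern(5_000)` returns "outbox" and `choose_pattern(80_000)` returns "consider-cdc".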

Closing note

The outbox vs CDC choice is less about “which is better” and more about “which fits your constraints”. Outbox demands app-code changes but keeps operations simple and event design clean. CDC avoids app changes but adds infrastructure and ties consumers to your schema. Most successful systems I’ve seen start with outbox and add CDC only if and when analytics volume demands it. The instinct to reach for CDC because it sounds more sophisticated usually doesn’t pay back.