Transactional outbox and Change Data Capture both solve the “how do I reliably publish events after writing to a database” problem. They do it differently, with different trade-offs. After shipping both in production, here’s my practical take on choosing.
The two patterns in one paragraph each
Transactional outbox — in the same DB transaction that writes your business data, also insert a row in an outbox table. A separate publisher periodically reads new outbox rows and sends them to Kafka (or any broker). Application code deliberately emits events.
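To make the mechanics concrete, here is a minimal sketch of the pattern using SQLite from the Python standard library. The table layout, the `place_order` and `publish_pending` names, and the `send` callback (a stand-in for a Kafka producer) are all assumptions for illustration, not a prescribed design.

```python
import json
import sqlite3
import uuid

# Hypothetical schema for illustration; the post does not prescribe one.
SCHEMA = """
CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT, total_cents INTEGER);
CREATE TABLE outbox (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,  -- publish order
    event_id TEXT NOT NULL UNIQUE,          -- lets consumers deduplicate
    topic TEXT NOT NULL,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
"""

def place_order(conn, customer, total_cents):
    """Write the business row and the outbox row in one transaction."""
    order_id = str(uuid.uuid4())
    event = {"type": "OrderPlaced", "order_id": order_id, "total_cents": total_cents}
    with conn:  # commits both inserts together, or rolls both back
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (order_id, customer, total_cents)
        )
        conn.execute(
            "INSERT INTO outbox (event_id, topic, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "orders.placed", json.dumps(event)),
        )
    return order_id

def publish_pending(conn, send, batch_size=100):
    """One polling pass: send unpublished rows in order, then mark them published."""
    rows = conn.execute(
        "SELECT seq, event_id, topic, payload FROM outbox "
        "WHERE published = 0 ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    for seq, event_id, topic, payload in rows:
        send(topic, event_id, payload)  # a crash after this line means a resend later
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
sent = []  # stand-in for a Kafka producer
place_order(conn, "alice", 4200)
publish_pending(conn, lambda topic, event_id, payload: sent.append((topic, payload)))
```

Marking a row as published only after the send succeeds is what makes this at-least-once: a crash between the two steps leaves the row unpublished, so the next pass sends it again.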
Change Data Capture (CDC) — a tool (typically Debezium) tails the database’s write-ahead log. Every insert/update/delete produces a Kafka message automatically. No outbox table; the WAL is the event source.
Both give at-least-once delivery, both recover from crashes without losing committed events, and both produce Kafka events that downstream services can consume.
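At-least-once delivery means a consumer can see the same event twice, so either pattern needs consumers that tolerate duplicates. One common way to do that, sketched below, is to deduplicate on an event id. The `IdempotentConsumer` class and the `evt-…` ids are hypothetical, and the in-memory set stands in for whatever durable store a real consumer would use.

```python
class IdempotentConsumer:
    """Applies each event at most once, keyed by a producer-assigned event id."""

    def __init__(self, handler):
        self.handler = handler
        self.seen = set()  # assumption: a real system would persist this

    def handle(self, event_id, payload):
        if event_id in self.seen:
            return False  # duplicate delivery: skip it
        self.handler(payload)
        self.seen.add(event_id)
        return True

applied = []
consumer = IdempotentConsumer(applied.append)
consumer.handle("evt-1", {"type": "OrderPlaced"})
consumer.handle("evt-1", {"type": "OrderPlaced"})  # redelivered after a retry
consumer.handle("evt-2", {"type": "OrderPlaced"})
```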
When outbox wins
You control event design. Outbox events are business-level (“OrderPlaced”), not table-level (“orders.row inserted”). Downstream doesn’t need to know your schema.
Evolving the DB is easy. Rename a column, reorganize tables — your events stay the same because you compose them in code.
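As a rough illustration of why, consider a mapping function that is the only place column names appear. The column and field names here are made up: suppose a `total_cents` column was renamed to `amount_cents`. The published contract does not move.

```python
def order_placed_event(row):
    """Map a DB row (as a dict) to the published event contract.

    Only this function knows the column names, so a rename is a one-line change.
    """
    return {
        "type": "OrderPlaced",
        "version": 1,
        "order_id": row["id"],
        # The column was renamed to amount_cents; the event field keeps its name.
        "total_cents": row["amount_cents"],
    }

row_after_rename = {"id": "o-1", "amount_cents": 4200}
event = order_placed_event(row_after_rename)
```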
No ops complexity added. Postgres, Kafka, and application code — you already have these. No Debezium, no Kafka Connect, no schema registry dependency.
Tighter control over what’s published. You decide which operations emit events. CDC emits every change to the tables it captures, which is often too much.
Easier local dev. No Kafka Connect required for developers to run things.
Well-understood failure modes. “Outbox has 50k unpublished rows” is easy to diagnose.
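The check behind that diagnosis can be a single query. A sketch, assuming a hypothetical outbox table with a `published` flag and a `created_at` timestamp stored as epoch seconds:

```python
import sqlite3

def outbox_backlog(conn, now):
    """Return (unpublished row count, age in seconds of the oldest unpublished row)."""
    count, oldest = conn.execute(
        "SELECT COUNT(*), MIN(created_at) FROM outbox WHERE published = 0"
    ).fetchone()
    age = 0 if oldest is None else now - oldest
    return count, age

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox ("
    "seq INTEGER PRIMARY KEY, created_at INTEGER, published INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO outbox (created_at, published) VALUES (?, ?)",
    [(100, 1), (130, 0), (160, 0)],
)
backlog, oldest_age = outbox_backlog(conn, now=190)
```

Alerting on the age of the oldest unpublished row, not just the count, tends to separate "the publisher is stuck" from "there was a burst of writes".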
When CDC wins
High write volume. Outbox polling at 100k writes/sec is expensive because the polling query itself becomes load. CDC reads the WAL and adds no query load on the application's tables.
Heterogeneous producers. Legacy apps that can’t be modified still produce WAL entries. CDC catches them; outbox requires app changes.
Low-latency delivery matters. Outbox adds polling latency (500ms typical). CDC is near-realtime (10-50ms typical).
Full data replication is the goal. CDC naturally produces a complete stream of changes — great for replicating to data warehouses, search indexes, etc.
You already run Kafka Connect. The operational investment is paid; adding CDC is low marginal cost.
Where CDC is tricky
Events are table-shaped, not business-shaped. “orders UPDATE” is not “OrderPaid” — it’s “these fields changed”. Consumers have to infer business meaning from column diffs. Painful.
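Here is roughly what that inference looks like in a consumer. The sketch assumes a Debezium-style change envelope with `op`, `before`, and `after` fields, and a hypothetical `status` column on the orders table. It also assumes `before` carries the full old row, which on Postgres depends on the table's REPLICA IDENTITY setting.

```python
def infer_business_event(change):
    """Guess a business event from a Debezium-style change envelope.

    Assumes the envelope carries "op", "before" and "after", and that the
    hypothetical orders table has a "status" column.
    """
    op = change.get("op")
    before = change.get("before")
    after = change.get("after")
    if op == "c":  # row created
        return "OrderPlaced"
    if op == "u" and before and after:  # row updated: compare old and new status
        if before.get("status") != "paid" and after.get("status") == "paid":
            return "OrderPaid"
        if before.get("status") != "cancelled" and after.get("status") == "cancelled":
            return "OrderCancelled"
    return None  # some other change; no business meaning this consumer knows of

change = {
    "op": "u",
    "before": {"id": "o-1", "status": "pending", "total_cents": 4200},
    "after": {"id": "o-1", "status": "paid", "total_cents": 4200},
}
inferred = infer_business_event(change)
```

Every consumer that cares about "paid" ends up carrying some version of this logic, and each copy has to change whenever the status column does.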
Schema changes ripple. Add a column; all consumers see the new field. Rename a column; consumers break. CDC couples consumers to your DB schema, which is the coupling we’re usually trying to avoid with services.
Operational overhead. Debezium requires tuning, WAL slot management, and careful handling around database failovers. A whole stack to run.
Doesn’t easily map to domain events. The events are database events, not business events. For domain-driven systems, this feels wrong.
Secrets can leak. WAL includes every column. If your DB has PII or tokens, they end up in Kafka — careful config required.
The hybrid pattern
Some teams use both:
- Outbox for domain events — hand-crafted, business-shaped, versioned schemas
- CDC for analytics — raw table changes streamed to data warehouse / search index for BI
This separation aligns each tool with what it’s good at. Domain services publish clean events; analytics infrastructure consumes everything via CDC.
A concrete comparison
Imagine an orders service. When an order is placed, it needs to notify shipping, billing, search, and the data warehouse.
Outbox approach:
- In the order placement transaction: save order row + save OrderPlacedEvent to outbox
- Publisher emits orders.placed topic with business-shaped event
- Shipping, billing subscribe; they get clean OrderPlaced events
- Search index maintained by a dedicated projection service
- Data warehouse loaded from orders.placed events
CDC approach:
- Debezium streams orders table changes to db.orders topic
- Shipping service consumes, builds its own state
- Billing service consumes, builds its own state
- Search index maintained by direct CDC → Elasticsearch sink
- Data warehouse loaded from CDC stream
Both work. The CDC version couples all consumers to the orders table schema; the outbox version gives you a stable OrderPlaced contract.
Operational cost comparison
Outbox:
- New table and index
- Publisher cron or scheduled task
- Monitoring for unpublished count
- No new infrastructure
CDC:
- Debezium + Kafka Connect cluster
- Schema registry
- Connector configuration
- WAL slot management
- Failover and replication interactions
- Separate monitoring for connector health
For small and medium systems, outbox wins on operational simplicity by a mile. For massive, multi-team systems where the investment is already paid, CDC offers better scalability.
My decision rule
- System throughput < 10k writes/sec and domain-event-shaped: outbox
- System throughput > 50k writes/sec: seriously consider CDC
- Need both domain events and analytics streaming: outbox for domain, CDC for analytics
- New system, small team: outbox, always
- Legacy system you can’t modify: CDC, reluctantly
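For illustration only, the rule can be written as a small function. The order in which the checks are applied is my reading, not something the list itself fixes, and the 10k to 50k range is left undecided because the rule says nothing about it. The function and parameter names are made up.

```python
def choose_pattern(
    writes_per_sec,
    can_modify_app=True,
    small_new_team=False,
    needs_analytics_stream=False,
):
    """Encode the decision rule above; check order is an assumed priority."""
    if not can_modify_app:
        return "cdc"  # legacy system: CDC, reluctantly
    if small_new_team:
        return "outbox"  # new system, small team: outbox, always
    if needs_analytics_stream:
        return "outbox+cdc"  # outbox for domain events, CDC for analytics
    if writes_per_sec > 50_000:
        return "consider cdc"
    if writes_per_sec < 10_000:
        return "outbox"  # assumes the events are domain-event-shaped
    return "undecided"  # 10k to 50k writes/sec: the rule gives no answer

examples = [
    choose_pattern(5_000),
    choose_pattern(80_000),
    choose_pattern(5_000, can_modify_app=False),
]
```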
Closing note
The outbox vs CDC choice is less about “which is better” and more about “which fits your constraints”. Outbox demands app-code changes but keeps operations simple and event design clean. CDC avoids app changes but adds infrastructure and ties consumers to your schema. Most successful systems I’ve seen start with outbox and add CDC only if and when analytics volume demands it. The instinct to reach for CDC because it sounds more sophisticated usually doesn’t pay back.