The basics of event sourcing look tidy on a whiteboard. Running it in production teaches a different set of lessons. This article collects the ones worth knowing before the first production incident.
Lesson 1 — Pick aggregate boundaries carefully
An aggregate is the unit you replay. If you make aggregates too big, rehydration is slow and concurrent writes contend. Too small, and you lose the invariants a single aggregate enforces.
Good heuristic: the aggregate boundary is the smallest unit that enforces an invariant. An Account with balance rules is one aggregate. A Portfolio of accounts is not — it’s a query across aggregates, better handled as a projection.
Bad heuristic: one aggregate per domain entity. You’ll end up with User aggregates containing all the user’s lifetime data, thousands of events per aggregate, and painful replay times.
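To make the heuristic concrete, here is a minimal sketch in Python (the Account, its events, and the invariant check are illustrative, not taken from any framework):

```python
from dataclasses import dataclass

@dataclass
class DepositMade:
    amount: int  # cents

@dataclass
class WithdrawalMade:
    amount: int  # cents

class Account:
    """One aggregate: the smallest unit that enforces the no-overdraft invariant."""

    def __init__(self):
        self.balance = 0
        self.pending_events = []

    def deposit(self, amount: int):
        self._record(DepositMade(amount))

    def withdraw(self, amount: int):
        # The invariant is checked here, against this aggregate's own state only.
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self._record(WithdrawalMade(amount))

    def apply(self, event):
        # Replay path: state is derived purely from past events.
        if isinstance(event, DepositMade):
            self.balance += event.amount
        elif isinstance(event, WithdrawalMade):
            self.balance -= event.amount

    def _record(self, event):
        self.apply(event)
        self.pending_events.append(event)
```

A Portfolio total, by contrast, never has to veto a command, so it belongs in a projection that sums balances across many Account streams rather than in one oversized aggregate.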
Lesson 2 — Snapshots are not optional
Rehydrating 50,000 events for every read kills performance. Plan a snapshot strategy from day one:
- Snapshot every N events (500 is a common starting point)
- Store snapshots in a separate table or key-value store
- On load: get latest snapshot + events after its version
What value of N to pick? Measure. Snapshots have overhead (serialization, storage), so snapshotting too often wastes resources, while snapshotting too rarely slows reads. Start at 500 and tune based on production traces.
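A sketch of that load path, reusing the Account aggregate from Lesson 1 and in-memory stand-ins for the snapshot and event stores:

```python
SNAPSHOT_EVERY = 500   # the N above; tune it from production traces

# In-memory stand-ins for the snapshot store and the event store.
snapshots = {}       # aggregate_id -> {"balance": ..., "version": ...}
event_streams = {}   # aggregate_id -> [event, event, ...]

def load_account(account_id):
    """Rehydrate: latest snapshot, then only the events recorded after its version."""
    account = Account()   # the aggregate sketched in Lesson 1
    version = 0

    snapshot = snapshots.get(account_id)
    if snapshot is not None:
        account.balance = snapshot["balance"]
        version = snapshot["version"]

    for event in event_streams.get(account_id, [])[version:]:
        account.apply(event)
        version += 1

    return account, version

def maybe_snapshot(account_id, account, version):
    # Snapshot on version boundaries; a missing or stale snapshot only costs replay time.
    if version and version % SNAPSHOT_EVERY == 0:
        snapshots[account_id] = {"balance": account.balance, "version": version}
```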
Lesson 3 — Schema evolution will bite you
Rule one: never modify a stored event’s payload. Rule two: plan how to evolve the schema from day one. You will:
- Rename fields
- Add required fields
- Split or merge event types
- Change enum values
Two main approaches:
Weak schema (JSON). Events are JSON blobs. Code tolerates unknown fields, applies defaults for missing ones. Easy to evolve; easy to silently break.
Versioned events. OrderPlacedV1, OrderPlacedV2. Old consumers handle V1. New consumers handle both. Eventually migrate all old events through an “upcaster”.
I’ve used both in production; the versioned approach has fewer surprises but more ceremony. Pick deliberately.
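For the versioned route, a minimal upcaster sketch, assuming a hypothetical OrderPlaced event that gained a required currency field in V2:

```python
def upcast_order_placed_v1(payload: dict) -> dict:
    """OrderPlacedV1 -> OrderPlacedV2: V1 predates multi-currency, so default the new field."""
    return {**payload, "currency": "USD", "schema_version": 2}

UPCASTERS = {
    ("OrderPlaced", 1): upcast_order_placed_v1,
    # ("OrderPlaced", 2): ...  chain the next step here when V3 arrives
}

def upcast(event_type: str, payload: dict) -> dict:
    """Lift a stored payload through upcasters until it reaches the current shape."""
    version = payload.get("schema_version", 1)
    while (event_type, version) in UPCASTERS:
        payload = UPCASTERS[(event_type, version)](payload)
        version = payload["schema_version"]
    return payload
```

Whether you run this at read time or during a one-off copy-and-transform migration, handlers only ever see the latest shape and the stored payloads are never edited in place.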
Lesson 4 — Replay from scratch happens more often than you think
You’ll rebuild projections. Maybe because of a bug, maybe a new reporting need, maybe schema changes. Design for it:
- Projections should be idempotent (replaying the same events produces the same state)
- Keep event store queries efficient — indexing, partitioning by time or aggregate
- Track projection version; on mismatch, rebuild
- Full rebuild should be feasible — if it takes a week, that’s a problem
For projections that can’t be fully rebuilt (e.g., ones that send email), make that explicit. Treat them as a separate class — “side-effect projections” — and handle them with care.
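To make the idempotency and version-tracking bullets concrete, a sketch with in-memory stand-ins for the read model (table and field names are illustrative):

```python
PROJECTION_VERSION = 3   # bump whenever the projection's logic or shape changes

# In-memory stand-ins for the read model and its bookkeeping metadata.
read_model = {"order_totals": {}}
meta = {"projection_version": None, "last_position": 0}

def apply_to_read_model(event):
    # Idempotent: an upsert keyed by order_id yields the same state however often it replays.
    if event["type"] == "OrderPlaced":
        read_model["order_totals"][event["payload"]["order_id"]] = event["payload"]["total"]

def run_projection(all_events):
    """Rebuild-aware projection loop over a position-ordered list of events."""
    if meta["projection_version"] != PROJECTION_VERSION:
        read_model["order_totals"].clear()        # throw the read model away
        meta["projection_version"] = PROJECTION_VERSION
        meta["last_position"] = 0                 # and replay from the beginning

    for position, event in enumerate(all_events, start=1):
        if position <= meta["last_position"]:
            continue                              # already processed and checkpointed
        apply_to_read_model(event)
        meta["last_position"] = position          # checkpoint after each event
```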
Lesson 5 — Event streams are eternal
You can’t delete events for legal or logical reasons (GDPR complications aside — deal with those via pseudonymization or crypto-shredding). This means:
- Storage grows forever — archive cold data, don’t expect to compact
- Old event types live forever — your code must keep parsing V1 events from 3 years ago
- Breaking changes are expensive — there’s no “reset the DB”
Plan for 5-10 years of event history in any aggregate you build.
Lesson 6 — Don’t make consumers too smart
Tempting: put business logic in event handlers. “When OrderPlaced happens, also check inventory, also notify user, also update stats, and trigger fraud check if amount > $1000.”
Problem: the event has no context. It was emitted in a prior business operation; the consumer sees only the payload. If requirements change (“only notify users who opted in”), you change the consumer; if the event’s meaning drifted, you have a stealth breaking change.
Keep consumers dumb: project state, produce read models. Business decisions belong on the command side, driven by the current state of an aggregate.
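A sketch of where the opted-in decision lands under that rule (the names are illustrative, not from any framework):

```python
def handle_place_order(customer: dict, order_payload: dict) -> list:
    """Command side: the opt-in decision is made here, against current state."""
    events = [{"type": "OrderPlaced", "payload": order_payload}]
    if customer["notifications_opted_in"]:        # the business rule lives with the command
        events.append({"type": "OrderConfirmationRequested",
                       "payload": {"customer_id": customer["id"]}})
    return events                                 # the caller appends these to the store

def on_order_confirmation_requested(event: dict, send_email) -> None:
    """Consumer side: no judgment calls, just carry out what was already decided."""
    send_email(event["payload"]["customer_id"])
```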
Lesson 7 — GDPR and event sourcing
“Right to be forgotten” meets “events are immutable”. Not fun.
Approaches:
- Pseudonymization — store user identifiers separately; on deletion, break the link
- Crypto-shredding — encrypt PII fields with a per-user key; on deletion, delete the key, PII becomes unreadable
- Scrubbing — rewrite specific events to redact PII (breaks immutability in spirit but sometimes unavoidable)
Pick your strategy before you have regulator attention. Crypto-shredding is my preferred approach — it preserves the event stream structure while making deleted data unrecoverable.
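A crypto-shredding sketch using the Python cryptography package; the per-user key table is a stand-in you would keep outside the event store:

```python
from cryptography.fernet import Fernet   # pip install cryptography

# Stand-in for a per-user key table, stored separately from the events.
user_keys = {}

def encrypt_pii(user_id: str, value: str) -> bytes:
    key = user_keys.setdefault(user_id, Fernet.generate_key())
    return Fernet(key).encrypt(value.encode())

def decrypt_pii(user_id: str, token: bytes) -> str:
    key = user_keys.get(user_id)
    if key is None:
        return "[erased]"    # key shredded: the ciphertext is permanently unreadable
    return Fernet(key).decrypt(token).decode()

def forget_user(user_id: str) -> None:
    # "Deletion": the events stay exactly as written, only the key disappears.
    user_keys.pop(user_id, None)
```

The event payload stores only the ciphertext, so replays still work; projections simply render “[erased]” for shredded users.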
Lesson 8 — Know when not to event-source
Event sourcing is most valuable for domains with rich transitions and audit needs. It’s overkill for:
- Read-heavy catalog services
- Simple reference data
- Pass-through APIs that don’t own data
- Any service where “what is the current state” is 99% of the access pattern
Within a larger system, it’s fine to event-source the parts that benefit (orders, payments, positions) and use normal CRUD for the rest (catalog, profiles, static data).
Tooling notes
- Axon, EventStoreDB — full-stack event-sourcing platforms. Heavy but capable.
- Kafka as event store — works but lacks proper aggregate querying; usually paired with a DB-backed event log
- Roll your own on Postgres — an events table with (aggregate_id, version, event_type, payload, at) is 90% of what you need
- Outbox pattern — essential for coupling event publication with DB writes
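If you do roll your own, here is roughly what that table and the outbox look like in a psycopg sketch (the schema mirrors the bullet above; connection setup is omitted and the names are illustrative):

```python
import psycopg                          # pip install "psycopg[binary]"
from psycopg.types.json import Jsonb

EVENTS_DDL = """
CREATE TABLE IF NOT EXISTS events (
    aggregate_id uuid        NOT NULL,
    version      integer     NOT NULL,
    event_type   text        NOT NULL,
    payload      jsonb       NOT NULL,
    at           timestamptz NOT NULL DEFAULT now(),
    PRIMARY KEY (aggregate_id, version)  -- doubles as the optimistic-concurrency guard
)
"""

OUTBOX_DDL = """
CREATE TABLE IF NOT EXISTS outbox (
    id         bigserial PRIMARY KEY,
    event_type text    NOT NULL,
    payload    jsonb   NOT NULL,
    published  boolean NOT NULL DEFAULT false
)
"""

def append_event(conn, aggregate_id, expected_version, event_type, payload):
    """Append the event and its outbox row in one transaction; a concurrent writer
    at the same version violates the primary key and the whole write rolls back."""
    with conn.transaction():
        conn.execute(
            "INSERT INTO events (aggregate_id, version, event_type, payload) "
            "VALUES (%s, %s, %s, %s)",
            (aggregate_id, expected_version + 1, event_type, Jsonb(payload)),
        )
        conn.execute(
            "INSERT INTO outbox (event_type, payload) VALUES (%s, %s)",
            (event_type, Jsonb(payload)),
        )
```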
When to walk away
If after six months you find yourself:
- Constantly fighting snapshot/replay performance
- Rebuilding projections every two weeks due to bugs
- Team members dreading touching the event-sourced modules
- Schema evolution consuming most of the roadmap
The pattern may not fit your domain. Migrating away is painful but recoverable — much harder than never having adopted it, easier than living with perpetual friction.
Closing note
Event sourcing, done right, is one of the most durable architectural choices — decades-old systems in banking and healthcare still thrive on it. Done badly, it’s a tar pit. The difference is in the ground-level hygiene: aggregate design, schema discipline, snapshot strategy, replay-aware projections. Get those right and it rewards you for years.