Martin Fowler named the pattern after the strangler fig — a plant that wraps around an existing tree and gradually replaces it. Applied to software: incrementally replace parts of a legacy system with new implementations, running both side by side, until the old system can be removed.
It’s the most reliable migration strategy I’ve seen. Big-bang rewrites fail far more often. This article is the practical version.
Why not big-bang
Big-bang rewrites promise a clean new system in 6 months. They deliver a broken new system in 18, usually with:
- Lost edge cases the old system handled quietly
- Missing features that existed in the legacy system but were never written into the requirements
- Performance regressions only visible in production
- Users stuck with the old system because the new one isn’t ready
Strangler fig avoids all of this because the old system never goes away until it’s genuinely not needed.
The basic pattern
        ┌────────────────────────┐
        │     Traffic Router     │
        │   (gateway / facade)   │
        └────────────────────────┘
           │                 │
           ▼                 ▼
    ┌──────────────┐  ┌──────────────┐
    │    Legacy    │  │     New      │
    │    System    │  │    System    │
    └──────────────┘  └──────────────┘

Route specific paths, features, or users to the new system. Everything else stays with the legacy system. Over time, the router shifts more traffic to the new system and less to the legacy one. When legacy traffic hits zero, remove it.
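Here is what the router box can look like in code: a minimal sketch in Go using the standard library's reverse proxy. The upstream addresses and the /api/v2/ prefix are invented placeholders, and in practice this role is often played by nginx, Envoy, or an existing API gateway rather than hand-rolled code.

```go
// Minimal strangler facade: path-prefix routing between legacy and new.
// Sketch only; upstream addresses and the /api/v2/ prefix are invented.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

func main() {
	legacy, _ := url.Parse("http://legacy.internal:8080") // hypothetical upstream
	modern, _ := url.Parse("http://new.internal:8081")    // hypothetical upstream

	legacyProxy := httputil.NewSingleHostReverseProxy(legacy)
	modernProxy := httputil.NewSingleHostReverseProxy(modern)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Migrated slices go to the new system; everything else
		// keeps hitting legacy, which stays completely untouched.
		if strings.HasPrefix(r.URL.Path, "/api/v2/") {
			modernProxy.ServeHTTP(w, r)
			return
		}
		legacyProxy.ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":8000", nil))
}
```

As slices migrate, the prefix check grows into a routing table; when it matches everything, the legacy upstream can be deleted.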
The three decision points per slice
1. What to extract first?
Pick a slice with:
- Clear bounded context
- Low coupling to the rest of the legacy system
- Real pain today (slow, buggy, blocking new features)
- Moderate complexity — not trivial (you learn nothing), not massive (the first slice has to actually finish)
Classic good first slices: an authentication system, a notification service, a reporting dashboard.
2. How to route traffic?
Options by complexity:
- URL routing. New system handles /api/v2/orders; legacy handles the rest. Simplest, least flexible.
- Feature flag. Per-user or percentage-based routing (a sketch follows this list). Maximum control; requires flag infrastructure.
- Header-based. Canary clients send a special header. Useful for A/B testing or beta access.
- Data-based. Users migrated to the new system by ID range or migration timestamp.
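For the percentage-based option, the routing decision should be deterministic so a given user always lands on the same side within a rollout stage. A minimal sketch; the X-User-ID header is a stand-in for whatever stable identifier you actually have.

```go
// Deterministic percentage routing: hash a stable user ID into a
// 0-99 bucket and route to the new system if it falls under the
// rollout percentage. The X-User-ID header is a hypothetical stand-in.
package router

import (
	"hash/fnv"
	"net/http"
)

var rolloutPercent uint32 = 10 // dial up as the gates go green

func routeToNew(r *http.Request) bool {
	userID := r.Header.Get("X-User-ID")
	if userID == "" {
		return false // unidentified traffic stays on legacy
	}
	h := fnv.New32a()
	h.Write([]byte(userID))
	return h.Sum32()%100 < rolloutPercent
}
```

Hashing rather than random sampling means users don't flip between systems from request to request, which keeps sessions coherent and bug reports reproducible.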
3. How to compare behavior?
Don’t trust that the new implementation matches legacy. Run both:
- Dark traffic. Call both systems; serve legacy's response; log any divergence from the new one (sketch below)
- Dual write, dual read. Write to both systems, compare on read
- Canary. Small % of users on the new system; monitor error rates and support tickets
Catch behavior drift before cutting over.
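A minimal sketch of the dark-traffic option, assuming the slice is read-only: the user always gets the legacy response, the new system is called in the background, and mismatches get logged. Shadowing writes needs far more care (you don't want the new system double-sending emails or double-charging cards).

```go
// Dark traffic sketch: replay a request against the new system in the
// background and log any divergence from the legacy response the user
// already received. Upstream URL is a placeholder; only idempotent
// reads should ever be shadowed this way.
package router

import (
	"bytes"
	"io"
	"log"
	"net/http"
)

func shadowToNew(path string, legacyBody []byte) {
	go func() {
		resp, err := http.Get("http://new.internal:8081" + path)
		if err != nil {
			log.Printf("dark traffic: new system failed on %s: %v", path, err)
			return
		}
		defer resp.Body.Close()
		newBody, err := io.ReadAll(resp.Body)
		if err != nil || !bytes.Equal(newBody, legacyBody) {
			log.Printf("dark traffic: divergence on %s", path)
		}
	}()
}
```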
The strangler gates
Don’t increase traffic to the new system until each gate is green:
- Functional parity — all critical paths work
- Performance parity — p99 within tolerance
- Operational parity — monitoring, alerts, on-call runbooks
- Security parity — auth, audit, compliance
- Observability — logs, metrics, traces comparable to legacy
Skipping a gate is how teams end up with new systems worse than the ones they replaced.
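The gates work best as an explicit, mechanical checklist rather than a judgment call made under rollout pressure. A toy encoding of the idea (the field names are mine, not a standard):

```go
// Promotion gates as an explicit checklist: traffic to the new system
// only increases when every gate is green. Illustrative names only.
package migration

type Gates struct {
	FunctionalParity  bool // all critical paths verified
	PerformanceParity bool // p99 within agreed tolerance
	OperationalParity bool // monitoring, alerts, on-call runbooks
	SecurityParity    bool // auth, audit, compliance sign-off
	Observability     bool // logs, metrics, traces comparable to legacy
}

func (g Gates) AllGreen() bool {
	return g.FunctionalParity && g.PerformanceParity &&
		g.OperationalParity && g.SecurityParity && g.Observability
}
```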
A realistic timeline
For a non-trivial slice:
- Week 1-2: new service scaffolded, CI/CD ready
- Week 3-6: feature parity for the slice, dark traffic enabled
- Week 7-8: divergence fixed, first 1% of traffic
- Week 9-12: progressive rollout 1% → 10% → 50% → 100%
- Week 13-14: sanity window with both running
- Week 15-16: legacy code for this slice removed
~4 months per slice is realistic. Plan accordingly.
Common failure modes
The never-ending migration. New slices keep getting started, but legacy code never gets removed. Five years later, 80% of traffic still goes to legacy. Usually caused by missing sunset dates and political reluctance to delete old code.
Parallel features. Product adds features to legacy during migration. New system never catches up. Lock down legacy to bug fixes only during the migration.
No measurement. Team “thinks” the new slice handles 20% of traffic. Actually 5%. Always instrument the router to report per-path, per-version usage.
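The instrumentation doesn't have to be elaborate. A sketch using only Go's standard library expvar package, which exposes the counters at /debug/vars on the default mux; the backend labels are made up for illustration:

```go
// Count requests per backend and per path so "we think the new system
// handles 20%" becomes a measured number. The expvar package publishes
// these counters automatically at /debug/vars.
package router

import (
	"expvar"
	"fmt"
	"net/http"
)

var routed = expvar.NewMap("router_requests")

func instrument(backend string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Key by backend version and path, e.g. "new /api/v2/orders".
		routed.Add(fmt.Sprintf("%s %s", backend, r.URL.Path), 1)
		next.ServeHTTP(w, r)
	})
}
```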
Skipping dark traffic. First real-world request reveals 15 bugs. Dark traffic for 1-2 weeks before user-facing traffic flips catches most of them.
The shared database stays shared. Intentional data decoupling gets postponed indefinitely. Two systems writing to the same tables = a distributed monolith.
When strangler doesn’t fit
- Systems with fundamentally different data models where translation is too costly
- Regulatory environments where cutover must be atomic (rare but real)
- Small systems where the migration itself is cheaper than the parallel infrastructure
Most systems bigger than “a few services” benefit from strangler over big-bang. The ones that don’t are usually small enough that rewrite-in-place is actually viable.
Closing note
Strangler fig feels slow compared to the imaginary quick rewrite. It ships. Rewrites often don’t. The best migration I’ve seen used strangler for 18 months, had zero production incidents caused by the migration, and delivered a clean new system that the team trusted. The worst migration I’ve seen chose big-bang, took 26 months, and the new system was abandoned 6 months after launch. The slow path is the fast path.