Or: how a routine scale-up exposed an architectural assumption I’d been making for years.
It was a Tuesday afternoon when our load test results came in, and I almost choked on my coffee.
We’d just scaled our real-time delivery service from one pod to three. A textbook horizontal scale-out. Standard Kubernetes deployment. Same code, three replicas. The kind of thing you do without thinking twice when concurrent connections start climbing.
Except now, in the load test report, 40% of synthetic clients weren’t receiving real-time events.
Forty. Percent.
The platform — a live betting system that pushes thousands of state updates per second to connected clients — had been running fine on a single pod for months. Adding two more pods was supposed to make it better, not break it. Yet here we were, looking at a graph where 6 out of 10 test users got perfectly timed updates, and 4 out of 10 got nothing.
I want to tell you what I learned in the next two days. Not because it’s a brilliant solution — honestly, the fix is well-known if you know to look for it. But because the path to that fix taught me something about a category of distributed systems bug I’d been blind to: the in-memory broker that quietly stops working when you scale out.
If you’ve ever pushed real-time data over WebSocket, this might save you a Tuesday afternoon.
What our system looked like
Here’s the simplified architecture before things went wrong:
```mermaid
flowchart TB
Client[React Client]
subgraph Pod ["Real-Time Service · 1 pod"]
Broker[Spring SimpleBroker<br/>in-memory routing]
KL[Kafka Listener]
end
K[(Kafka)]
Client <-->|WebSocket + STOMP| Broker
K --> KL
KL --> Broker
```

Events flow from upstream services into Kafka. Our service consumes them and pushes notifications via WebSocket to connected clients using Spring’s STOMP support and SimpMessagingTemplate.
Two flavors of notification:
- Global broadcasts — game state changes everyone subscribed to a topic should see.
- User-specific events — balance updates, personal notifications, anything tied to one user.
For months, this just worked. One pod, in-memory broker, every client connected to the same JVM. Spring’s simple broker (SimpleBrokerMessageHandler) matched subscriptions and dispatched messages locally. Latency was excellent. Code was clean. No complaints.
Then we hit the scaling ceiling. More users meant more concurrent WebSocket sessions, and a single pod’s heap and event loop couldn’t keep up. Time to scale out.
I’ll be honest: I expected this to take 30 minutes. Bump replicas to 3 in our Helm chart, deploy, watch the dashboards, ship. I’d done this for stateless services dozens of times.
I forgot one thing: WebSocket sessions aren’t stateless.
The first deploy
We rolled out to staging with 3 pods behind our standard Layer 4 load balancer. Connected the load test rig: 1000 simulated users, each subscribing to a “round update” channel and a personal channel. Started a synthetic event generator that pushed Kafka events at production-realistic rates.
The first metric I checked was the obvious one: are messages being delivered?
Roughly. Sort of. About 40% of clients were missing them.
My first thought, embarrassingly, was “Kafka must be flaky in staging.” I checked the consumer-group lag. Zero. Events were being consumed promptly by one of the three pods (whichever held the relevant partition). I checked the WebSocket connection counts: 1000 connected, evenly split across pods, no disconnects. I checked Spring’s metrics for messages sent: pod-1 had pushed 333 frames; pod-2 had pushed 0; pod-3 had pushed 0.
That’s when it clicked, in the worst way clicks happen — not as a moment of insight, but as a slow dawning of oh no, of course.
The realization
When a Kafka event lands, exactly one pod’s consumer pulls it. That pod calls simpMessagingTemplate.convertAndSend("/topic/round/123", payload). Spring’s SimpleBroker looks up subscriptions for /topic/round/123…
In its local, in-memory map.
The other two pods? They have their own in-memory maps. Their own subscribers. Their own sessions. They have no idea this event happened, because the entire messaging fabric — SimpMessagingTemplate, SimpleBroker, the routing logic — lives entirely inside one JVM.
It’s not that messages were getting lost in a queue or dropped on the network. They were being delivered with 100% success, but only to the third of the audience connected to the pod that consumed the event.
Here’s what was actually happening:
```mermaid
flowchart LR
K[(Kafka topic)]
K -->|partition routing| B2
subgraph P1 ["Pod 1"]
B1[local broker]
S1["sessions: A, B"]
end
subgraph P2 ["Pod 2 — receives the event"]
B2[local broker]
S2["sessions: C, D"]
end
subgraph P3 ["Pod 3"]
B3[local broker]
S3["sessions: E, F"]
end
B2 -->|delivers ✓| S2
B1 -.->|never sees the event ✗| S1
B3 -.->|never sees the event ✗| S3
```

Two-thirds of users sat staring at stale UIs while one-third got correct updates. That is actually worse than the 40%-ish missing rate in the report. A one-in-three delivery rate means roughly two-thirds of clients miss any given event, so the numbers don’t reconcile exactly. The mechanism behind the missing deliveries, though, was no longer in doubt.
I had built, accidentally, a system where horizontal scaling actively destroys correctness. The more pods I added, the worse the delivery rate would get. With 10 pods, only 10% of clients would receive each event.
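To make that arithmetic concrete, here is a toy model of the failure in plain Java. It is not Spring’s broker. `LocalOnlyCluster`, its `Pod` class, and the round-robin spread of sessions across pods are hypothetical stand-ins. Each pod keeps its own destination-to-sessions map, and exactly one pod, the one that consumed the event, dispatches it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model of the failure (not Spring's actual broker): every pod owns a
// purely local routing table, and a Kafka-style consumer hands each event to
// exactly one pod.
final class LocalOnlyCluster {

    // One pod's in-memory routing table: destination -> session IDs.
    static final class Pod {
        private final Map<String, List<String>> subscriptions = new HashMap<>();

        void subscribe(String destination, String sessionId) {
            subscriptions.computeIfAbsent(destination, d -> new ArrayList<>()).add(sessionId);
        }

        // Sessions on THIS pod that receive a frame for the destination.
        List<String> dispatch(String destination) {
            return subscriptions.getOrDefault(destination, List.of());
        }
    }

    // Spreads `sessions` round-robin over `podCount` pods (an assumption), all
    // subscribed to one round topic, then lets pod `consumingPod` handle the
    // event. Returns the fraction of sessions that received it.
    static double deliveredFraction(int podCount, int sessions, int consumingPod) {
        List<Pod> pods = new ArrayList<>();
        for (int i = 0; i < podCount; i++) {
            pods.add(new Pod());
        }
        for (int s = 0; s < sessions; s++) {
            pods.get(s % podCount).subscribe("/topic/round/123", "session-" + s);
        }
        // Only the consuming pod ever sees the event; the others never hear of it.
        int delivered = pods.get(consumingPod).dispatch("/topic/round/123").size();
        return (double) delivered / sessions;
    }

    public static void main(String[] args) {
        for (int pods : new int[] {1, 3, 10}) {
            System.out.printf(Locale.ROOT, "%d pod(s): %.1f%% delivered%n",
                    pods, 100 * deliveredFraction(pods, 1000, 0));
        }
    }
}
```

Running `main` prints 100.0%, 33.4%, and 10.0% for 1, 3, and 10 pods. With 1000 sessions split round-robin across three pods, the consuming pod holds 334 of them.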
What I tried first (and why it didn’t work)
I want to talk about the wrong turns I took, because honestly, they’re more instructive than the right answer.
Wrong turn #1: “Just use sticky sessions”
My first instinct was load-balancer-level pinning. Cookie-based affinity, IP hash — “glue each user to one pod, problem solved.” I even started drafting the LB config.
Then I sat with it for ten minutes and realized: sticky sessions don’t fix the actual problem.
If user-A is sticky to pod-1 and the round-update event lands on pod-2, pod-2 still doesn’t know about user-A. Sticky sessions only solve the connection routing — they don’t solve event routing. The event arrives from Kafka, not from the user, and nothing about it says which pod the user is on.
To make sticky sessions work, I’d need a registry that maps userId → pod, queried by every event producer, kept in sync as pods scale. I’d basically be building a distributed coordination system from scratch — and a half-working one, at that, because rolling deploys would dump everyone’s affinity at once.
I dropped the idea. It would have been a week of work and would have left me with a worse architecture.
Wrong turn #2: “Each pod gets its own Kafka consumer group”
My next thought: what if every pod consumes every event? Give each pod a unique consumer group ID. Now all three pods receive every Kafka message. Each pod tries to dispatch to its local subscribers. Whoever has the user delivers; whoever doesn’t, no-ops.
This is technically correct. It would have worked. I started prototyping.
Within an hour, I’d talked myself out of it. The math was ugly:
- Every event processed N times (once per pod)
- Database queries inside the listener: multiplied by N
- CPU cost: multiplied by N
- Wallet service calls (some listeners called downstream services): multiplied by N
Worse, this approach gets more expensive the more pods I add, the exact opposite of what I wanted from horizontal scaling. Three pods? 3× the work. Ten pods? 10× the work. We’d hit cost ceilings long before connection ceilings.
I needed a model where one pod processes the event once, and the result reaches the other pods cheaply.
The realization that re-framed the problem
I’d been thinking about this as a Kafka problem — how do I get the event to all the pods that need to deliver it?
It wasn’t a Kafka problem. It was a fan-out problem.
Kafka is great at work distribution. One event, one consumer, one piece of work done. That’s exactly what I don’t want here. I want one event, broadcast to all interested pods, with each pod handling its own local subscribers.
What I needed was a publish-subscribe primitive between pods — a layer that says “any pod can publish; all pods receive.” Specifically a fast, fire-and-forget pub-sub. I didn’t need durability (I’ll explain why in a moment). I didn’t need ordering across pods. I just needed cheap broadcast.
That’s exactly what Redis Pub/Sub is.
And we already had Redis in the stack.
The architecture I landed on
Here’s what I built:
```mermaid
flowchart TB
P1[Pod 1]
P2[Pod 2]
P3[Pod 3]
subgraph R ["Redis Pub/Sub"]
G(("channel: global"))
UA(("channel: user:A"))
UB(("channel: user:B"))
UN(("channel: user:N"))
end
P1 <--> G
P2 <--> G
P3 <--> G
P1 <--> UA
P2 <--> UB
P3 <--> UN
```

Two kinds of channels:
global — every pod subscribes at startup, forever. Used for broadcast events that any of our connected users might care about. Pod-2 publishes a round update; all three pods receive it; each pod’s local SimpleBroker dispatches to its own subscribers. The local-broker logic doesn’t change at all — it’s still doing what it always did. We just added an upstream layer that makes sure every pod gets every broadcast.
user:{id} — per-user channels, lazily subscribed. When user-A connects to pod-1, pod-1 subscribes to user:A. When user-A disconnects, pod-1 unsubscribes. If a backend service has a user-specific event for user-A, it publishes on user:A, and Redis routes it directly to pod-1 (and only pod-1). One pod gets the event, exactly the pod that has user-A’s WebSocket session.
The mental model became: Redis is the bus, local SimpleBroker is the last-mile delivery. Each layer does what it’s good at. Redis fans out cheaply across pods. SimpleBroker handles the in-process routing to actual WebSocket sessions, the way it always did.
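Here is the same toy cluster with a bus added. It sketches the split between Redis as the bus and the local broker as the last mile. `GlobalFanOutModel.Bus` is an in-process stand-in for Redis Pub/Sub: fire-and-forget, nothing stored, and every current subscriber gets every publish. The `destination|payload` string format is a hypothetical simplification, not the wire format described later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

final class GlobalFanOutModel {

    // Stand-in for Redis Pub/Sub: fire-and-forget, nothing stored, and every
    // current subscriber gets every publish, including the publisher itself.
    static final class Bus {
        private final Map<String, List<Consumer<String>>> channels = new HashMap<>();

        void subscribe(String channel, Consumer<String> listener) {
            channels.computeIfAbsent(channel, c -> new ArrayList<>()).add(listener);
        }

        // Like Redis PUBLISH, returns the receiver count; with no receivers the message is gone.
        int publish(String channel, String message) {
            List<Consumer<String>> subs = channels.getOrDefault(channel, List.of());
            subs.forEach(listener -> listener.accept(message));
            return subs.size();
        }
    }

    static final class Pod {
        private final Map<String, List<String>> localSubs = new HashMap<>(); // destination -> sessions
        final List<String> delivered = new ArrayList<>();

        Pod(Bus bus) {
            // Every pod subscribes to "global" once, at startup.
            bus.subscribe("global", msg -> {
                String destination = msg.substring(0, msg.indexOf('|'));
                // Last-mile delivery: only this pod's own sessions, exactly as before.
                delivered.addAll(localSubs.getOrDefault(destination, List.of()));
            });
        }

        void subscribe(String destination, String sessionId) {
            localSubs.computeIfAbsent(destination, d -> new ArrayList<>()).add(sessionId);
        }
    }

    // Builds `podCount` pods and spreads `sessions` round-robin across them.
    static List<Pod> cluster(Bus bus, int podCount, int sessions) {
        List<Pod> pods = new ArrayList<>();
        for (int i = 0; i < podCount; i++) {
            pods.add(new Pod(bus));
        }
        for (int s = 0; s < sessions; s++) {
            pods.get(s % podCount).subscribe("/topic/round/123", "session-" + s);
        }
        return pods;
    }

    public static void main(String[] args) {
        Bus bus = new Bus();
        List<Pod> pods = cluster(bus, 3, 1000);
        // The pod that consumed the Kafka event publishes it once; it is also a subscriber.
        int receivers = bus.publish("global", "/topic/round/123|{\"round\":123}");
        int total = pods.stream().mapToInt(p -> p.delivered.size()).sum();
        System.out.println(receivers + " pods received, " + total + " sessions delivered");
    }
}
```

The publish reaches all three pods, the publisher included, and all 1000 sessions get the frame regardless of pod count. Publishing to a channel with no subscribers returns 0, and the message is simply gone.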
The code (and the version I had to throw away)
My first implementation was uglier than the final one. I want to show both because the journey reveals why the final shape exists.
Version 1 (what I tried first)
I started by sprinkling Redis publishes throughout the existing codebase, replacing every simpMessagingTemplate.convertAndSend(...) with a Redis publish:
```java
public void pushRoundUpdate(RoundEvent event) {
    String json = serialize(event);
    redisson.getTopic("global").publish(json);
}
```

Every Kafka listener now published to Redis instead of calling Spring’s template directly. On the receiving side, each pod had a generic listener:
```java
redisson.getTopic("global").addListener(String.class, (channel, payload) -> {
    RoundEvent event = deserialize(payload);
    simpMessagingTemplate.convertAndSend("/topic/round/" + event.gameId(), event);
});
```

This worked. Sort of. But within a day I realized I had a problem: the receiving side needed to know what STOMP destination to send to. I’d hardcoded /topic/round/... in the Redis listener. That meant for every event type, I had a separate Redis listener mapping payload → STOMP destination. Five event types? Five listeners. Ten? Ten. The boilerplate exploded fast.
Worse, user-specific routing didn’t fit this model at all. convertAndSendToUser needs both a user ID and a destination, but my generic Redis payload was just the business object.
I needed to package the routing intent alongside the payload itself.
Version 2 (what shipped)
I introduced a wrapper type:
```java
public sealed interface Destination permits GlobalDestination, UserDirectDestination {}
public record GlobalDestination(String topic) implements Destination {}
public record UserDirectDestination(String user, String topic) implements Destination {}
public record WSMessage(Destination destination, Object payload) {}
```

The publish side became uniform, with the same code path for both event types:
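```java
import org.redisson.api.RedissonClient;

public abstract class AbstractRedisPubSubMessageTemplate {
    // Channel names as described above: "global" and "user:{id}".
    protected static final String GLOBAL_CHANNEL = "global";
    protected static final String USER_PREFIX = "user:";
    protected final RedissonClient redisson;

    protected AbstractRedisPubSubMessageTemplate(RedissonClient redisson) {
        this.redisson = redisson;
    }
    public void convertAndSend(String topic, Object payload) {
        var msg = new WSMessage(new GlobalDestination(topic), payload);
        redisson.getTopic(GLOBAL_CHANNEL).publishAsync(msg);
    }
    public void convertAndSendToUser(String user, String topic, Object payload) {
        var msg = new WSMessage(new UserDirectDestination(user, topic), payload);
        redisson.getTopic(USER_PREFIX + user).publishAsync(msg);
    }
}
```

The receive side became a single switch on destination type: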
```java
// Pattern matching for switch (Java 21+); exhaustive because Destination is sealed.
public void onMessage(CharSequence channel, WSMessage event) {
    switch (event.destination()) {
        case GlobalDestination g ->
            simpMessagingTemplate.convertAndSend(g.topic(), event.payload());
        case UserDirectDestination u ->
            simpMessagingTemplate.convertAndSendToUser(u.user(), u.topic(), event.payload());
    }
}
```

One listener. One routing decision. The producer code calling this template looks identical to plain Spring code: convertAndSend and convertAndSendToUser have the same signatures as their counterparts on Spring’s SimpMessagingTemplate. We could swap implementations without touching business logic.
That last point ended up mattering more than I expected, but I’ll come back to that.
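Because the publish methods mirror Spring’s, business code can depend on a narrow interface and stay unaware of the transport. The sketch below makes several assumptions. `WsSender` is a hypothetical interface, not part of Spring, with the same two method shapes. A recording class stands in for the local `SimpMessagingTemplate`, and a lambda stands in for the Redis publish. The sketch redeclares the envelope types so it compiles on its own with Java 21.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

final class SwappableSenderDemo {

    // Hypothetical interface with the two method shapes business code uses.
    interface WsSender {
        void convertAndSend(String topic, Object payload);
        void convertAndSendToUser(String user, String topic, Object payload);
    }

    sealed interface Destination permits GlobalDestination, UserDirectDestination {}
    record GlobalDestination(String topic) implements Destination {}
    record UserDirectDestination(String user, String topic) implements Destination {}
    record WSMessage(Destination destination, Object payload) {}

    // Stand-in for the local SimpMessagingTemplate: records what it would send.
    static final class RecordingLocalSender implements WsSender {
        final List<String> sent = new ArrayList<>();
        public void convertAndSend(String topic, Object payload) {
            sent.add(topic + " <- " + payload);
        }
        public void convertAndSendToUser(String user, String topic, Object payload) {
            sent.add(user + ":" + topic + " <- " + payload);
        }
    }

    // Bus-backed sender: packages routing intent with the payload and hands it
    // to a publish function (a stand-in for publishing on a Redis topic).
    static final class BusSender implements WsSender {
        private final BiConsumer<String, WSMessage> publish;
        BusSender(BiConsumer<String, WSMessage> publish) {
            this.publish = publish;
        }
        public void convertAndSend(String topic, Object payload) {
            publish.accept("global", new WSMessage(new GlobalDestination(topic), payload));
        }
        public void convertAndSendToUser(String user, String topic, Object payload) {
            publish.accept("user:" + user,
                    new WSMessage(new UserDirectDestination(user, topic), payload));
        }
    }

    // Receive side: one exhaustive switch over the sealed destination type.
    static void route(WSMessage msg, WsSender local) {
        switch (msg.destination()) {
            case GlobalDestination g -> local.convertAndSend(g.topic(), msg.payload());
            case UserDirectDestination u ->
                    local.convertAndSendToUser(u.user(), u.topic(), msg.payload());
        }
    }

    // Business logic only ever sees WsSender.
    static void announceRound(WsSender sender) {
        sender.convertAndSend("/topic/round/123", "round-open");
        sender.convertAndSendToUser("alice", "/queue/balance", 42);
    }

    public static void main(String[] args) {
        RecordingLocalSender direct = new RecordingLocalSender();
        announceRound(direct);

        RecordingLocalSender viaBus = new RecordingLocalSender();
        // In this toy, "publishing" routes straight back into the same process.
        announceRound(new BusSender((channel, msg) -> route(msg, viaBus)));

        System.out.println(direct.sent.equals(viaBus.sent));
    }
}
```

The same `announceRound` call produces identical local sends whether it goes direct or through the envelope and the routing switch. That is the property that lets the transport change without touching business logic.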
A subtle pattern: lazy per-user subscriptions
Here’s a detail that’s easy to miss but affects scale dramatically.
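Naively, you might think every pod should subscribe to every user:{id} channel — after all, any pod might need to send a user-specific message. But that’s wrong. Each WebSocket session lives on exactly one pod, and only the pod holding a user’s session needs events for that user. A user with several tabs or devices can have sessions on more than one pod. In that case each of those pods subscribes, and Redis delivers to all of them.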
So we made user subscriptions lazy:
```java
@EventListener
public void onWebSocketConnect(SessionConnectEvent event) {
    String userId = extractUserId(event);
    String channel = USER_PREFIX + userId;
    redisson.getTopic(channel).addListener(WSMessage.class, this::onMessage);
}

@EventListener
public void onWebSocketDisconnect(SessionDisconnectEvent event) {
    String userId = extractUserId(event);
    String channel = USER_PREFIX + userId;
    redisson.getTopic(channel).removeListener(/* ... */);
}
```

The implication: if you have 50K connected users, you have 50K Redis pub-sub channels, each usually with exactly one subscriber (the pod holding that user’s session). Inactive users have no channels. When a pod publishes to user:A, Redis sees one subscriber and routes the message to one pod: efficient targeted delivery.
This matters at scale. Without lazy subscription, every pod would receive every user-specific event, then drop most of them (“not my user”). Pure waste.
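The connect and disconnect handlers above leave two details open. First, a user can hold several sessions on the same pod (tabs, devices), so the first disconnect must not tear down a subscription the other sessions still need. Second, removal needs a stable handle to the registered listener. Redisson’s `RTopic.addListener` returns an integer listener ID that `removeListener` accepts. A fresh `this::onMessage` method reference is not guaranteed to be the same object as the one that was registered. Below is a minimal reference-counting sketch under those assumptions. `UserSubscriptionRegistry` and its `BusClient` interface are hypothetical, with `subscribe`/`unsubscribe` standing in for the real client calls.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-pod registry: one bus subscription per user, however many
// sessions that user has on this pod.
final class UserSubscriptionRegistry {

    // Stand-in for the real pub-sub client (for example, adding a topic
    // listener and later removing it by the ID it returned).
    interface BusClient {
        int subscribe(String channel);
        void unsubscribe(String channel, int listenerId);
    }

    private static final String USER_PREFIX = "user:";

    // Listener ID from the bus, plus how many local sessions this user has.
    private record Entry(int listenerId, int sessions) {}

    private final BusClient bus;
    private final Map<String, Entry> byUser = new HashMap<>();

    UserSubscriptionRegistry(BusClient bus) {
        this.bus = bus;
    }

    // On session connect: subscribe only for the user's first session on this pod.
    synchronized void onConnect(String userId) {
        Entry e = byUser.get(userId);
        if (e == null) {
            byUser.put(userId, new Entry(bus.subscribe(USER_PREFIX + userId), 1));
        } else {
            byUser.put(userId, new Entry(e.listenerId(), e.sessions() + 1));
        }
    }

    // On session disconnect: unsubscribe only when the last session is gone.
    synchronized void onDisconnect(String userId) {
        Entry e = byUser.get(userId);
        if (e == null) {
            return; // unknown user, or already cleaned up
        }
        if (e.sessions() > 1) {
            byUser.put(userId, new Entry(e.listenerId(), e.sessions() - 1));
        } else {
            byUser.remove(userId);
            bus.unsubscribe(USER_PREFIX + userId, e.listenerId());
        }
    }

    // Number of user:{id} channels this pod is currently subscribed to.
    synchronized int channelCount() {
        return byUser.size();
    }
}
```

Connect and disconnect events can arrive concurrently on different threads, which is why the methods are `synchronized`.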
What about durability?
Redis Pub/Sub is fire-and-forget. If a subscriber is briefly disconnected when a publish happens, the message is gone forever. For a financial transaction system, this would be a deal-breaker. For real-time UI updates, it’s actually fine — and that distinction is worth thinking about carefully.
Our model was: WebSocket is for incremental updates. REST is for ground truth.
When a client connects (or reconnects after a network blip), it makes a REST call to fetch the current state — current round, current balance, active bets. That snapshot is durable. WebSocket then streams deltas on top of it. If a delta is lost, the next snapshot rebuilds reality.
This means: if Redis fails over and we lose 5 seconds of pub-sub delivery, the user’s UI shows slightly stale data for 5 seconds. Then the next event arrives or the next REST refresh happens, and the world is consistent again. We don’t need durability in the messaging layer because we have durability somewhere else in the system.
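Here is a minimal sketch of the snapshot-plus-deltas contract. It is written in Java for consistency with the rest of the post, even though the real client is React. The per-delta `version` number is an assumption not described above. With it, a client can notice a lost delta and re-fetch the snapshot immediately instead of waiting for the next reconnect or user action. `SnapshotPlusDeltas`, `Snapshot`, and `Delta` are hypothetical names, and the `Supplier` stands in for the REST call.

```java
import java.util.function.Supplier;

// Hypothetical client-side holder for "REST snapshot + push deltas".
// Assumption: each delta carries the version it produces, so a gap in
// versions reveals a message lost by fire-and-forget delivery.
final class SnapshotPlusDeltas {

    record Snapshot(long version, long balance) {}
    record Delta(long version, long balanceChange) {}

    private final Supplier<Snapshot> fetchSnapshot; // stand-in for the REST call
    private long version;
    private long balance;

    SnapshotPlusDeltas(Supplier<Snapshot> fetchSnapshot) {
        this.fetchSnapshot = fetchSnapshot;
        resync(); // on connect or reconnect: ground truth first
    }

    // Replace local state with the durable snapshot.
    void resync() {
        Snapshot s = fetchSnapshot.get();
        version = s.version();
        balance = s.balance();
    }

    void onDelta(Delta d) {
        if (d.version() > version + 1) {
            resync(); // gap: at least one delta was lost, so rebuild from the snapshot
        }
        if (d.version() != version + 1) {
            return; // stale, duplicate, or already covered by the fresh snapshot
        }
        version = d.version();
        balance += d.balanceChange();
    }

    long balance() {
        return balance;
    }

    long version() {
        return version;
    }
}
```

A lost delta costs one extra snapshot fetch rather than a permanently wrong UI. That is the same trade the post makes at the system level: durability lives in the REST path, not in the push channel.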
I want to flag this because it’s the part where the architecture decision is made or unmade. If your real-time channel is the source of truth — if losing a message means losing money — Redis Pub/Sub is wrong. You need Redis Streams or Kafka with transactional semantics or RabbitMQ with persistence. We considered all of those. They were all overkill for a UI signaling channel.
The honest framing is: we chose to accept ~5 seconds of degraded UX in failover scenarios in exchange for sub-millisecond fan-out latency in normal operation. That’s the trade. Stating it explicitly made the team comfortable with it.
What surprised me
A few things I didn’t expect going in:
The publishing pod receives its own message. When pod-2 publishes to global, Redis sends the message to all subscribers — including pod-2 itself. At first this felt like a bug; my instinct was to filter it out (“don’t send to yourself”). But then I realized: pod-2 has its own local subscribers who need this event. Pod-2 needs to receive its own publish so its local SimpleBroker can dispatch to those clients. Self-receiving is a feature, not a bug. It made the logic uniform — every pod handles every message identically, no special case for “the pod that originated this.”
The change made rolling deploys safer, not harder. I’d worried that adding Redis as a dependency would create new failure modes during deploys. Instead, the opposite happened. Before this change, rolling deploys broke session affinity and dropped every pinned client at once. (We’d actually been running with a fragile sticky-session setup that I haven’t told you about, because honestly I was embarrassed by it.) Now, deploys just rotate pods. Clients reconnect to whichever pod the LB picks, that pod subscribes to their user:{id} channel, and they’re back. No state migration. No drained connections. Pleasant.
Latency went down, not up. I expected adding a network hop through Redis to slow things down. End-to-end p99 actually dropped to under 100 ms because events were no longer queuing behind a single overloaded pod. Adding pods and a Redis hop turned out to be cheaper than serializing through one heap.
What I’d do differently
If I were starting over, I’d extract the WebSocket-via-Redis abstraction into a shared library on day one. We ended up doing this six months later, after the second service ran into the same scaling problem and re-implemented half of it. The interface — convertAndSend, convertAndSendToUser, with destination types as the wire contract — was reusable. We just hadn’t seen it as a library candidate when it was new.
I’d also write a runbook for “what does the user see when Redis fails over” before shipping. We had one customer-support ticket where someone reported a 6-second UI freeze that exactly correlated with a Redis maintenance window. The fix was already in place (REST snapshot on the next user action), but no one on the on-call team knew the failure mode by heart. A 200-word runbook would have saved an evening.
The take-home
If you’re running a WebSocket service on a single pod and thinking about scaling out, ask yourself one question before you do anything else:
Where is the routing state, and how does it get to the new pods?
If your messaging fabric lives in JVM heap — Spring’s SimpleBroker, an in-process EventBus, anything in-memory — scaling out will silently break correctness. Not crash. Not throw exceptions. Silently break. You’ll notice when 40% of users send angry tickets, not when a process dies.
The fix is conceptually simple: an external pub-sub layer that lets any pod publish, all pods (or just the right pod) receive. Redis Pub/Sub is a good default if you have Redis around. RabbitMQ fanout exchange works. Even a per-pod Kafka consumer group works if you really want to use what you have, though I’d think hard before going there.
But the more important lesson, for me, was about assumptions baked into single-pod architectures. The in-memory broker isn’t a Spring quirk — it’s a whole category of state. WebSocket sessions are state. Local caches are state. Anything that lives in heap is state. The day you go to two replicas, every piece of in-memory state is a question: does this need to be shared, partitioned, or duplicated?
In our case, the answer was “shared, via a fast fire-and-forget bus, with local last-mile delivery.” Three weeks of design, two weeks of implementation, ~250 lines of new code in the abstraction layer. The team scaled to N pods after that, no further architectural changes needed.
But I still think about that Tuesday afternoon, the load test report, and the slow dawning realization that the pleasant single-pod world I’d been living in had been an illusion all along. Distributed systems are full of these — the moments where an assumption you didn’t know you were making crashes into reality. The good news is they’re educational. The bad news is the education is usually billed to your weekend.
If you want broader background on push delivery patterns, I covered the architectural options earlier in Real-Time Notifications — that piece is the calm-weather version; this one is the storm.