Kafka is three things layered on top of each other: a log, a pub/sub system, and a distributed storage engine. Partitions are where all three meet. Understanding partitions clearly saves more production headaches than any other Kafka concept.
Partitions are independent logs
A topic has one or more partitions. Each partition is an append-only, immutable, ordered log. Messages in one partition are strictly ordered. Across partitions — no ordering guarantee.
Topic: orders
├── partition 0: [m1][m4][m7][m10]...
├── partition 1: [m2][m5][m8]...
└── partition 2: [m3][m6][m9]...

All of Kafka's performance and scaling properties come from this split: multiple partitions mean multiple writers in parallel and multiple consumers in parallel.
How messages get to partitions
Producer picks a partition per message. Three strategies:
Round-robin. No key → messages are spread across partitions in rotation (since Kafka 2.4, producers default to sticky partitioning for keyless messages, filling one batch per partition at a time). Load-balanced, no ordering guarantees whatsoever.
Hash-based (keyed). Producer sets a key; hash(key) % partitionCount picks the partition. All messages with the same key go to the same partition, so they’re ordered relative to each other.
Custom partitioner. You write code to decide. Rare; usually round-robin or keyed is fine.
The key choice is critical. Pick it so messages that need to be ordered together share a key.
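The keyed strategy can be sketched in a few lines of Python. This is a simplified stand-in for a real producer's partitioner: actual Kafka clients hash the key bytes with murmur2, so `crc32` here is only a deterministic substitute with the same "same key → same partition" property.

```python
import zlib

def pick_partition(key: str, partition_count: int) -> int:
    """Hash-based partitioning: the same key always lands on the same
    partition. Real Kafka producers use murmur2 on the key bytes;
    crc32 is a deterministic stand-in for illustration."""
    return zlib.crc32(key.encode("utf-8")) % partition_count

# All events for one order share a key, so they share a partition
# and are therefore ordered relative to each other:
p1 = pick_partition("order-123", 3)
p2 = pick_partition("order-123", 3)
assert p1 == p2
```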
Picking partition keys
Good keys:
- orderId — all events for an order stay in order
- customerId — all events for a customer stay in order
- accountId — all transactions for an account stay in order
Bad keys:
- Timestamp — all messages at the same second go to one partition; hot spots
- Random — no ordering guarantees (same as no key)
- Low-cardinality enums (e.g., country code) — hashing 20 keys into 10 partitions spreads load unevenly, and some partitions may get nothing at all
The general rule: partition key is whatever entity’s history you need in order.
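A quick way to sanity-check a candidate key is to simulate its distribution before creating the topic. A hypothetical sketch (again using `crc32` as a stand-in for Kafka's murmur2):

```python
import zlib
from collections import Counter

def partition_for(key: str, partitions: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % partitions

def distribution(keys, partitions: int) -> Counter:
    """Count how many keys hash to each partition."""
    return Counter(partition_for(k, partitions) for k in keys)

# High-cardinality key (orderId): load spreads across all 10 partitions.
order_ids = [f"order-{i}" for i in range(10_000)]
print(distribution(order_ids, 10))

# Low-cardinality key (20 country codes): only 20 distinct hash values
# exist, so the 10 partitions are loaded unevenly.
countries = [f"country-{i % 20}" for i in range(10_000)]
print(distribution(countries, 10))
```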
Consumers and partitions
A consumer group has one or more consumers. Partitions are distributed across them:
- 3 partitions, 1 consumer: that consumer handles all 3
- 3 partitions, 3 consumers: one each
- 3 partitions, 5 consumers: 3 get work, 2 are idle
This is how you scale consumption: add more partitions, add more consumers, up to consumers == partitions.
Critical: adding more consumers than partitions doesn’t help; the extras sit idle, wasting capacity. Size the partition count for peak throughput, not current load.
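The partition-to-consumer assignment can be sketched as simple round-robin. This is a simplification (real Kafka assignors such as range or cooperative-sticky differ in detail), but the idle-consumer effect it demonstrates is the same:

```python
def assign(partitions: int, consumers: list) -> dict:
    """Spread partition ids over consumers round-robin.
    Consumers beyond the partition count receive nothing."""
    assignment = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

print(assign(3, ["c1"]))                          # c1 owns all 3 partitions
print(assign(3, ["c1", "c2", "c3"]))              # one partition each
print(assign(3, ["c1", "c2", "c3", "c4", "c5"]))  # c4 and c5 sit idle
```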
Ordering guarantees — the nuance
In one partition: strict order.
Across partitions of a topic: no order. If order-1.placed goes to partition 0 and order-1.cancelled goes to partition 1 (different keys), consumers might process cancel before place.
Solution: use the same key for all messages about one entity. orderId means every event about an order goes to the same partition; consumers see them in emitted order.
Ordering within a partition is preserved when consumers process one message at a time per partition. Spawning threads inside a consumer to parallelize across a partition breaks ordering. Don’t.
Changing partition count
Dangerous. Partition count changes the hash distribution:
Before: hash("order-123") % 4 = 1
After (8 partitions): hash("order-123") % 8 = 3

Now messages for order-123 are split across two partitions. Any ordering guarantee across the boundary is broken.
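The shift can be demonstrated directly. A sketch with `crc32` standing in for the producer's hash (the exact partition numbers depend on the hash function, but the before/after mismatch is the point):

```python
import zlib

def partition_for(key: str, partitions: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % partitions

# Count how many of 1,000 keys map to a different partition when the
# partition count goes from 4 to 8 (roughly half, since a key keeps
# its partition only when hash % 8 happens to be below 4).
keys = [f"order-{i}" for i in range(1000)]
moved = sum(1 for k in keys if partition_for(k, 4) != partition_for(k, 8))
print(f"{moved} of {len(keys)} keys map to a different partition")
```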
Options:
- Plan partition count at creation, live with it
- Create a new topic with the desired partitions, migrate consumers, switch producers
- Accept the split for a migration window, handle ordering explicitly (idempotent consumers help)
Always err high on partition count at creation. Too many is slightly wasteful; too few is a migration project.
Throughput math
Rough rules for sizing:
- Each partition supports ~10-100 MB/s (depends on hardware, replication, compression)
- Each consumer can sustain ~1-10k messages/sec for typical processing
- Target peak throughput ÷ per-consumer throughput ≤ partition count
Example: topic with 50k msg/sec peak, each consumer handles 5k msg/sec → need at least 10 partitions (and 10 consumers for full parallelism).
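The sizing rule reduces to one line of arithmetic; a tiny helper, using the (hypothetical) numbers from the example above:

```python
import math

def partitions_needed(peak_msgs_per_sec: float,
                      per_consumer_msgs_per_sec: float) -> int:
    """Minimum partition count so that one consumer per partition
    can keep up with peak throughput."""
    return math.ceil(peak_msgs_per_sec / per_consumer_msgs_per_sec)

print(partitions_needed(50_000, 5_000))  # 10, matching the example above
```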
Common issues
Hot partitions. One key attracts most traffic (“vip-customer-42”). That partition’s consumer is overloaded; others idle. Fix: finer-grained keys, or a separate topic for outliers.
Rebalancing storms. Consumer group changes (add/remove consumer, or consumer crash) trigger rebalancing. All partitions pause while groups re-agree on ownership. Minimize churn; use cooperative rebalancing (Kafka 2.4+).
Consumer lag per partition. Some partitions fall behind because a specific consumer is slow. Monitor lag per partition, not just aggregate.
Too few partitions. Topic can’t scale consumers. Now you need a migration.
Too many partitions. Broker overhead balloons. Each partition has leader election, replication, file handles. Thousands of partitions per broker is too many.
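The per-partition lag check mentioned above is just end offset minus committed offset for each partition. A monitoring sketch with hypothetical offsets (a real check would pull both numbers from Kafka's admin/consumer-group APIs):

```python
def lag_by_partition(end_offsets: dict, committed: dict) -> dict:
    """Lag per partition: messages produced but not yet consumed."""
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}

# Hypothetical offsets: partitions 0 and 1 are nearly caught up,
# partition 2 has a slow consumer.
end = {0: 10_500, 1: 10_400, 2: 45_000}
done = {0: 10_450, 1: 10_390, 2: 12_000}

lags = lag_by_partition(end, done)
print(lags)

# Aggregate lag alone would hide which partition (and consumer) is slow.
worst = max(lags, key=lags.get)
print(f"worst partition: {worst}, lag {lags[worst]}")
```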
Replication
Each partition has N replicas (typically 3). One is leader, others followers. Writes go to leader, replicate to followers. Reads default to leader.
With acks=all on the producer, min.insync.replicas sets how many replicas must have the write before the broker acknowledges it. Typical: 2 of 3. That gives durability (data survives one broker loss) without impossibly high write latency.
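A typical durable setup along these lines, sketched as commands. The property and flag names are standard Kafka configs; the broker address, topic name, and partition count are placeholder assumptions:

```shell
# Create the topic with 3 replicas; require 2 in sync for acknowledged writes.
kafka-topics.sh --bootstrap-server broker:9092 \
  --create --topic orders \
  --partitions 12 --replication-factor 3 \
  --config min.insync.replicas=2

# On the producer side, wait for all in-sync replicas:
#   acks=all
```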
Closing note
Partitions are Kafka’s central concept. Most Kafka problems in production — ordering bugs, scaling limits, hot consumers, stuck rebalances — trace back to partition decisions made once, often without deep thought. Pick partition count and partition key carefully at topic creation. Use keyed messages whenever ordering matters. Never parallelize within a partition. Get those three right and Kafka tends to behave.