krivaltsevich.com · Kafka Internals 4.4

III · 06 · Comparative Architecture & the Design Space

Source: Apache Kafka 4.4.0-SNAPSHOT (git 04bfe7d, 2026-06-15), KRaft mode. Architectural analysis grounded in the source-verified Part I and cited comparative sources.

Every architecture is a set of choices, and every choice forecloses others. The way to understand Kafka's design is to hold it against the systems that made the opposite bets and ask what each one bought and what it paid. Kafka's defining decision is that brokers co-own compute and storage: a broker is the node that accepts a write, holds the partitioned, replicated log on local disk, serves it back through the OS page cache with zero-copy, and runs the replication state machine, all in one process. That single decision is simultaneously the root of Kafka's throughput advantage and the root of its three most-cited weaknesses: cross-AZ replication cost, a real (if tunable) latency floor, and a partition ceiling.

The competitors in this chapter attack that decision from five directions. Pulsar separates compute from storage. Redpanda keeps the co-located model but rewrites the runtime in C++ to lift the mechanical-sympathy ceiling. Kinesis and Pub/Sub hide the whole thing behind a managed meter. RabbitMQ rejects the log entirely in favour of per-message broker bookkeeping. WarpStream, and now upstream Kafka via KIP-1150, push the storage all the way out to object storage and make the broker stateless.

This chapter maps that design space honestly: for each system, the model, the storage substrate, the ordering guarantee, the one tradeoff that matters versus Kafka, and the reusable lesson. It closes with a synthesis of where Kafka sits and why: the precise shape of the niche its bets carved out. The grounding rule is unchanged from the rest of Part III: every Kafka mechanism I assert is real and cross-linked to Part I; every comparative number is from the cited reference and treated as directional, because nearly all of them are vendor benchmarks.

Read every benchmark in this chapter as advocacy until proven otherwise

StreamNative, Redpanda, WarpStream, and AutoMQ all publish numbers that favour their own product, and the methodology traps are systematic, not accidental. Jack Vanlightly's independent re-run of Redpanda's own OpenMessaging benchmark on identical hardware reversed the headline: at NVMe saturation Kafka hit ~1,900 MB/s vs Redpanda's ~1,400 MB/s; under TLS with 50 producers Redpanda's end-to-end latency blew out to 24 seconds while Kafka stayed sub-second; after 12 hours Redpanda's p99 reached 3.5 s while Kafka's improved (Vanlightly, empirical). The traps he documents recur everywhere: Kafka misconfigured with log.flush.interval.messages=1 (an fsync per batch, which is not Kafka's default and not how it is normally run in production), Java 11 instead of 17 (which disproportionately hurts Kafka, especially with TLS), and inconsistent offset-commit cadence. Before you trust any cross-system claim below, check durability parity (fsync / journal / acks), JVM version, commit cadence, Coordinated-Omission correction, and test duration. The directional shape of each tradeoff is sound; the exact multipliers are not.
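The parity checklist above can be turned into a mechanical pre-read of any published benchmark. The following is a minimal sketch, not a tool from the cited sources: `BenchmarkSetup`, `audit_kafka_setup`, and the one-hour duration threshold are hypothetical names and values chosen for illustration; only the `log.flush.interval.messages` key is a real Kafka broker setting.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkSetup:
    """Hypothetical description of how Kafka was run in a published benchmark."""
    broker_config: dict                   # server.properties as key -> value
    java_major_version: int               # JVM major version the broker ran on
    test_duration_hours: float            # how long load was sustained
    coordinated_omission_corrected: bool  # did the load generator correct for CO?


def audit_kafka_setup(setup: BenchmarkSetup) -> list[str]:
    """Return one warning per methodology trap named in the text."""
    warnings = []
    # log.flush.interval.messages=1 forces a flush to disk on every batch.
    if str(setup.broker_config.get("log.flush.interval.messages", "")) == "1":
        warnings.append("per-batch fsync forced (log.flush.interval.messages=1)")
    if setup.java_major_version < 17:
        warnings.append("JVM older than 17")
    # The one-hour threshold is an illustrative assumption, not a sourced number.
    if setup.test_duration_hours < 1:
        warnings.append("short test duration")
    if not setup.coordinated_omission_corrected:
        warnings.append("no Coordinated-Omission correction")
    return warnings


suspect = BenchmarkSetup({"log.flush.interval.messages": "1"}, 11, 0.5, False)
for warning in audit_kafka_setup(suspect):
    print("WARN:", warning)
```

A clean result does not make a benchmark trustworthy; it only means the specific traps listed here are absent.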

The axes that organise the whole space

Before the system-by-system tour, fix the coordinate system. Five nearly-orthogonal axes separate every system in this chapter, and almost every comparison reduces to a position on one or two of them. Naming them up front turns a pile of product names into a map.

| Axis | One end | Other end | What it controls |
| --- | --- | --- | --- |
| Compute / storage coupling | Co-located (Kafka, Redpanda) | Separated (Pulsar, WarpStream, diskless) | How fast you can scale, rebalance, and recover; whether storage cost decouples from broker cost. |
| Storage substrate | Local disk / page cache (Kafka, Redpanda) | Object storage (WarpStream, diskless) | The latency floor and the per-GB cost; whether cross-AZ replication exists at all. |
| Delivery model | Replayable log (Kafka, Pulsar, Kinesis) | Per-message queue (RabbitMQ) | Replay, fan-out, and retention vs per-message TTL, priority, and routing. |
| Operational ownership | Self-managed (Kafka, Redpanda) | Fully managed (Kinesis, Pub/Sub) | Control and throughput-per-unit vs zero-ops elasticity and lock-in. |
| Push vs pull | Pull / poll (Kafka, Kinesis) | Push (RabbitMQ, Pub/Sub) | Low-load latency vs throughput batching; backpressure semantics. |

Kafka's position is now nameable as a coordinate: co-located, local-disk, replayable-log, self-managed, pull. Every weakness in Inherent Limits falls out of that coordinate, and every competitor is a deliberate move along one axis, trading the advantage that axis-end conferred for the advantage of the other end. The decision tree below is the practical inverse: given your binding constraint, which axis-move does it push you toward?

Figure: the design space as a decision tree.

Start: You need durable, ordered, decoupled event streaming. (If you don't, because you need point lookups, ad-hoc SQL, or synchronous request/reply, see When To Use a Log: the answer is "not a log at all".)
Decision: Is per-message routing / priority / TTL the requirement?
    Leaf: Queue, not log. RabbitMQ / classic MQ; the broker tracks each message; you lose cheap replay & high throughput.
Decision: Do you want to run the system yourself?
Decision: Spiky / low-volume, or steady high-throughput?
    Leaf: Kinesis / Pub/Sub. Ops-free elasticity; hard caps & lock-in.
    Leaf: Managed gets expensive. 1 MB/s/shard cap; on-demand 2–3× pricier at scale; reconsider self-managed.
Decision: Is cross-AZ cost or elastic scale the pain?
    Leaf: Separate storage. Diskless/object-store (WarpStream, KIP-1150) or Pulsar; pay ~+400 ms latency or a 2nd stateful system.
    Leaf: Runtime ceiling. Redpanda (C++/TPC), or just tune Kafka's OS layer first.
Default: Apache Kafka. The broad-spectrum choice: highest throughput-per-partition, replay, the largest ecosystem; accept cross-AZ cost & a tunable latency floor.

Caption: The design space as a decision tree. Each leaf is a deliberate move off one of the five axes; the dashed path is the "no single constraint dominates" default, which is overwhelmingly where Kafka wins. The leaves are explored system-by-system below.
Legend: requirement / decision · co-located log (Kafka/Redpanda) · separated storage · managed / queue · constraint hit / reconsider · path · default fall-through
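The decision tree can also be read as a short function. This is a sketch of one plausible reading of the figure, not the author's code: the `Requirements` fields, the `recommend` function, and the exact order in which the branches are tested are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical summary of a workload's binding constraints."""
    needs_per_message_routing: bool = False        # routing / priority / TTL per message
    self_managed: bool = True                      # willing to run the system yourself
    spiky_or_low_volume: bool = False              # only consulted for managed services
    cross_az_cost_or_elasticity_pain: bool = False
    measured_runtime_ceiling: bool = False         # tail latency that survives OS tuning


def recommend(req: Requirements) -> str:
    """Walk the tree top-down; the final return is the default fall-through."""
    if req.needs_per_message_routing:
        return "queue (RabbitMQ / classic MQ)"
    if not req.self_managed:
        if req.spiky_or_low_volume:
            return "managed (Kinesis / Pub/Sub)"
        return "managed gets expensive: reconsider self-managed"
    if req.cross_az_cost_or_elasticity_pain:
        return "separated storage (diskless / Pulsar)"
    if req.measured_runtime_ceiling:
        return "Redpanda, or tune Kafka's OS layer first"
    return "Apache Kafka"


print(recommend(Requirements()))
print(recommend(Requirements(self_managed=False, spiky_or_low_volume=True)))
```

The useful property of writing it this way is that every non-default answer requires setting a flag explicitly, which mirrors the chapter's rule that a specialist needs a named, dominating constraint.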

The master comparison

Here is the whole space in one table: model, storage, ordering, and the single tradeoff that matters versus Kafka. Read the rest of the chapter as the annotated expansion of these rows. Every performance number is sourced and directional.

| System | Architecture model | Storage substrate | Ordering / consistency | Key tradeoff vs Kafka · what it teaches |
| --- | --- | --- | --- | --- |
| Apache Kafka | Brokers co-own compute + storage; partitioned, replicated log; OS page cache + zero-copy sendfile | Local disk (EBS/NVMe); tiered storage (KIP-405) offloads cold segments to S3 | Per-partition total order only (never global); acks=all + ISR high-watermark commit | (The baseline.) Highest throughput-per-partition + replay/storage; pays cross-AZ replication, a tunable latency floor, and a partition ceiling. |
| Apache Pulsar | Stateless brokers; compute/storage separated | Apache BookKeeper (distributed WAL across "bookies") + metadata store; BookKeeper uses its own caches, not the OS page cache | Per-partition (+ ordering keys) | Independent scaling & near-instant topic rebalance, at the cost of operating a second stateful system. Teaches: separation buys elasticity but doubles the operational surface. |
| Redpanda | Single C++ binary, thread-per-core (Seastar), no JVM, Raft, own IO scheduling, Kafka-protocol compatible | Local disk (own IO, not OS page-cache-reliant); tiering to object store available | Per-partition | Lower tail latency & fewer nodes in some regimes, at the cost of the JVM ecosystem. Vanlightly showed the 10×/3× claim reverses on equal hardware. Teaches: the mechanical-sympathy ceiling. |
| AWS Kinesis | Fully managed; metered shard model | Managed (24h–365d retention) | Per-shard (by partition key) | Zero-ops elasticity, at the cost of a hard 1 MB/s-per-shard cap, a per-Region shard quota (1,000–20,000 by Region, raisable), hot-shard skew, and lock-in. Teaches: managed simplicity trades control + throughput-per-unit. |
| Google Pub/Sub | Fully managed, global, auto-scaling; no partitions/shards to manage; push or pull | Managed | No ordering by default (opt-in ordering keys); at-least-once | Hands-off global elasticity, at the cost of ordering-by-default, partition-level control, and lock-in. Teaches: dropping the partition abstraction removes a ceiling and a guarantee. |
| RabbitMQ | "Smart broker / dumb consumer"; exchange-based routing; push | In-memory + disk queues; per-message broker state | Per-queue FIFO; per-message priority breaks strict order | Rich routing + per-message TTL/priority, but not a replayable high-throughput log; latency degrades past ~30 MB/s. Teaches: the queue-vs-log axis. |
| WarpStream / diskless (KIP-1150) | Diskless / object-store-native; stateless agents, no local disk, no inter-broker replication, leaderless | Object storage (S3 / S3 Express One Zone) only | Per-partition (via a control plane / metadata layer) | ~5–10× lower storage/cross-AZ cost, at the cost of ~400 ms produce p99 / ~1 s e2e. Teaches: the emerging object-storage frontier; latency-for-cost is a dial, not a verdict. |

All cross-system performance claims are vendor benchmarks unless attributed to an independent source (Vanlightly). Treat as directional and check durability parity, JVM version, and test duration (reference Appendix D, empirical).

Apache Pulsar: separate the storage, double the system

Pulsar makes the single move that most directly contradicts Kafka's defining decision: it splits compute from storage. A Pulsar broker is stateless: it owns no log data on local disk. Durability lives in Apache BookKeeper, a separate cluster of "bookies" that store the log as a distributed write-ahead log; topic and ownership metadata live in a third tier (historically ZooKeeper). The contrast with Kafka's architecture is total: in Kafka, the broker that accepts your write is the broker that holds the data and the broker that serves it back (Part I Replication & the ISR, The Fetch Path); in Pulsar, those are three different processes on three different node pools.

Figure: Pulsar's three tiers.

PULSAR compute tier (stateless brokers): Own no data. A topic's ownership can move to any broker instantly because there is nothing to copy; the new owner just opens the BookKeeper ledger. This is the fast-rebalance / fast-recovery payoff.
APACHE BOOKKEEPER storage tier (bookies): Distributed WAL: each segment ("ledger") is striped across bookies; durability is BookKeeper's, independent of which broker is serving. Uses its own caches, not the OS page cache, so it cannot lean on Linux's zero-copy/sendfile path the way Kafka does.
METADATA STORE (ZooKeeper / etcd): Topic, ledger, and ownership metadata. The third system to operate, and the second stateful one.

Caption: Pulsar's three-tier separation, contrasted with Kafka's single co-located broker. The separation is the source of both its advantage (independent scaling, instant rebalance) and its cost (three systems to run).
Legend: compute / stateless broker · storage / BookKeeper · metadata · arrow = a network hop between independently-scaled tiers

What separation actually buys

Two concrete wins fall directly out of stateless brokers. First, independent scaling: if you are storage-bound you add bookies; if you are throughput-bound you add brokers; you never over-provision one to get the other. Kafka couples them, so adding storage capacity means adding whole brokers (mitigated, not removed, by tiered storage, Part I Tiered Storage). Second, near-instant topic rebalance and recovery: when a Pulsar broker dies, its topics are reassigned to survivors immediately because the new owner copies nothing; it just opens the existing ledger. In Kafka, a dead broker's partitions are already replicated on followers (so leadership failover is fast), but rebuilding the lost replica means re-replicating its data, and rebalancing data across brokers physically moves bytes. Pulsar sidesteps the byte movement entirely. This is the reusable lesson: if the data isn't on the compute node, moving the compute is free.
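The over-provisioning argument is simple arithmetic, sketched below under loud assumptions: the function names are hypothetical, every capacity figure is a placeholder rather than a measured value, and the model ignores replication factor, headroom, and the compute that storage nodes themselves need.

```python
import math


def coupled_nodes(storage_tb, throughput_mb_s, node_storage_tb, node_throughput_mb_s):
    """Co-located model: each node adds storage and throughput together,
    so the count is set by whichever dimension needs more nodes."""
    by_storage = math.ceil(storage_tb / node_storage_tb)
    by_throughput = math.ceil(throughput_mb_s / node_throughput_mb_s)
    return max(by_storage, by_throughput)


def separated_nodes(storage_tb, throughput_mb_s, bookie_storage_tb, broker_throughput_mb_s):
    """Separated model: storage nodes and compute nodes are sized independently."""
    bookies = math.ceil(storage_tb / bookie_storage_tb)
    brokers = math.ceil(throughput_mb_s / broker_throughput_mb_s)
    return bookies, brokers


# Placeholder workload: storage-heavy, throughput-light.
print(coupled_nodes(100, 200, node_storage_tb=10, node_throughput_mb_s=100))
print(separated_nodes(100, 200, bookie_storage_tb=10, broker_throughput_mb_s=100))
```

In this placeholder case the coupled cluster is sized by storage, so it carries five times the broker throughput the workload asks for, while the separated layout sizes each tier to its own demand. The separated layout is not automatically cheaper: it runs more distinct node types, which is the operational cost discussed next.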

The cost is a second stateful system, and a benchmark caveat

The separation that buys elasticity also doubles the operational surface: you now run brokers and BookKeeper and a metadata store, each with its own failure modes, tuning, and on-call burden. Kafka's KRaft migration (Part I KRaft Consensus, The KRaft Controller) spent years removing exactly one such auxiliary stateful system (ZooKeeper); Pulsar's design adds one back as a load-bearing tier. On raw numbers, StreamNative's own benchmark reports Pulsar beating Kafka (single-partition 700 MB/s journaled vs Kafka 280; 100-partition 1,600 vs 1,087), but the headline figures use a "no-journal" Pulsar configuration that weakens durability relative to Kafka's acks=all, so the comparison is not apples-to-apples on safety (StreamNative, advocacy). And BookKeeper's own-cache model means Pulsar forgoes the OS page-cache + zero-copy fetch path that makes Kafka's catch-up reads so cheap. Pick Pulsar when independent scaling and instant rebalance are worth a second cluster to babysit; otherwise the separation is overhead you pay every day to buy elasticity you may use rarely.

Redpanda: the mechanical-sympathy ceiling

Redpanda accepts Kafka's co-located model (the broker owns compute and storage, with data on local disk) and attacks a different axis: the runtime. It is a single C++ binary built on the Seastar thread-per-core framework, with no JVM, no garbage collector, and its own asynchronous IO scheduling that deliberately does not rely on the OS page cache. It speaks the Kafka wire protocol (Part I The Wire Protocol), so existing clients connect unchanged. The thesis is pure mechanical sympathy: pin one thread to each core, shard all state by core so there is no cross-core locking, manage memory and IO explicitly, and you eliminate the two things that produce Kafka's tail-latency jitter: GC pauses and page-cache unpredictability.

The lesson is the ceiling, not the product

Redpanda is the cleanest demonstration in this space of a mechanical-sympathy ceiling: the maximum performance the hardware allows once you remove every software-layer tax. Kafka's JVM heap must stay small (~6 GB) precisely so that the rest of RAM is page cache and GC pauses stay short; a 32 GB heap can incur 100–200 ms G1GC pauses, and a multi-second Full GC drops replicas from the ISR (Part I Replication & the ISR; Failure Modes, empirical). A thread-per-core C++ runtime has no GC and no shared page cache to thrash, so in principle it can run closer to the NIC/NVMe ceiling with a tighter tail. The transferable tactic, usable in any latency-critical system, is to identify where your runtime taxes you (GC, lock contention, cache non-determinism, syscall overhead) and ask what the hardware ceiling would be without it. That framing is valuable even if you never adopt Redpanda.

But the headline does not survive an independent re-run

Redpanda markets "10× faster tail latencies with up to 3× fewer nodes." Vanlightly re-ran Redpanda's own OpenMessaging benchmark on identical 3× i3en.6xlarge hardware and found the claims "greatly exaggerated" and "not generalizable": with 50 producers targeting 500 MB/s Redpanda topped out at 330 MB/s while Kafka hit the target; at NVMe saturation Kafka reached ~1,900 MB/s vs Redpanda's ~1,400 MB/s; under TLS with 50 producers Redpanda's e2e latency hit 24 seconds against sub-second Kafka; and over a 12-hour run Redpanda's p99 climbed to 3.5 s (p99.99 to 26 s) while Kafka improved (Vanlightly, empirical). The original Redpanda benchmark had Kafka misconfigured with log.flush.interval.messages=1 (a per-batch fsync, which is not Kafka's default and not how it is normally run in production) on Java 11. Crucially, Redpanda's framing that "Kafka doesn't fsync, so it's unsafe" is false: Kafka deliberately relies on replication plus log recovery rather than mandatory per-write fsync (Part I Replication & the ISR, Durability Engineering), a considered latency/durability tradeoff, not an oversight. The real Redpanda value proposition is operational (one binary, no JVM, no separate metadata store) more than it is a universal latency win. Much of Kafka's tail is filesystem/OS-level and recoverable without changing runtimes: Allegro cut producer-latency outliers over a 65 ms SLO by 82% simply by moving brokers to XFS (Allegro/InfoQ, empirical).

Kinesis & Pub/Sub: trading control for a managed meter

The managed services move off the operational-ownership axis: AWS or Google runs the brokers, the storage, the scaling, and the patching, and you interact with a metered API. The two exemplars make opposite sub-choices about the partition abstraction, which is the most instructive thing about them.

Kinesis: the shard meter

Kinesis keeps a partition-like primitive, the shard, but turns it into a billing unit with a hard capacity cap: each shard accepts at most 1 MB/s (or 1,000 records/s) of writes and serves 2 MB/s of reads, and the shard count is itself capped by a per-account, per-Region quota (1,000–20,000 depending on Region, raisable) (AWS Kinesis quotas). Compare a single Kafka partition on NVMe, which absorbs 10–100+ MB/s (Part I Storage & the Log Engine; Partitioning). The architectural difference is stark: Kafka's partition throughput is bounded by hardware and tuning; Kinesis's is bounded by a price list. The flip side is genuine: you provision throughput by adding shards through an API call and never think about brokers, disks, page cache, or rebalances. Where this number sits in the limits taxonomy (a hard service quota versus Kafka's emergent per-partition rate) is detailed in II · 02 · Limits & Boundaries.
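The shard cap makes capacity planning a ceiling division. The sketch below uses only the per-shard write limits quoted above (1 MB/s and 1,000 records/s) and the low end of the 10–100+ MB/s Kafka partition range; the function names are hypothetical, and the Kafka figure is a planning assumption, not a guarantee.

```python
import math

SHARD_WRITE_MB_PER_S = 1.0         # per-shard write cap quoted in the text
SHARD_WRITE_RECORDS_PER_S = 1000   # per-shard record-rate cap quoted in the text


def kinesis_shards_needed(write_mb_per_s: float, records_per_s: float) -> int:
    """Shards are needed for whichever cap binds first: bytes or record count."""
    by_bytes = math.ceil(write_mb_per_s / SHARD_WRITE_MB_PER_S)
    by_records = math.ceil(records_per_s / SHARD_WRITE_RECORDS_PER_S)
    return max(by_bytes, by_records, 1)


def kafka_partitions_needed(write_mb_per_s: float, partition_mb_per_s: float = 10.0) -> int:
    """Assumes a conservative 10 MB/s per partition (low end of the cited range)."""
    return max(math.ceil(write_mb_per_s / partition_mb_per_s), 1)


# 200 MB/s of 4 KB records is roughly 50,000 records/s: bytes bind first.
print(kinesis_shards_needed(200, 50_000), kafka_partitions_needed(200))
# 5 MB/s of tiny records at 20,000 records/s: the record cap binds instead.
print(kinesis_shards_needed(5, 20_000), kafka_partitions_needed(5))
```

The second call is the case that surprises people: a low-bandwidth stream of small records can still need many shards because the record-rate cap binds before the byte cap does.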

Pub/Sub: drop the partition entirely

Pub/Sub goes further and removes the partition abstraction altogether: there are no shards to provision, the service auto-scales globally, and it is push-or-pull. This eliminates the partition ceiling and the hot-partition problem in one move, but it also eliminates ordering by default. Where Kafka gives strict per-partition total order as a baseline guarantee (Part I The Fetch Path), Pub/Sub is at-least-once with unordered delivery unless you opt into ordering keys. This is the cleanest illustration of a deep tradeoff: the partition is simultaneously Kafka's scaling ceiling and its ordering guarantee. Drop the partition and you lose the ceiling and the order together; they are the same abstraction viewed from two sides.
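To see why the ceiling and the ordering guarantee are the same abstraction, consider a key-to-partition mapping. This is an illustrative sketch only: `partition_for` uses MD5 as a stand-in stable hash, which is an assumption for the example and is not the hash Kafka's default partitioner actually uses (murmur2).

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Stand-in partitioner: a stable hash of the key modulo the partition count.
    Stability is what gives per-key ordering: a key always lands on one partition."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def load_per_partition(keys, num_partitions):
    """Count how many records each partition receives for a stream of keys."""
    return Counter(partition_for(k, num_partitions) for k in keys)


# 90 records for one hot key plus 10 records spread over distinct keys.
keys = ["hot-customer"] * 90 + [f"customer-{i}" for i in range(10)]
load = load_per_partition(keys, num_partitions=8)
hot = partition_for("hot-customer", 8)
print(f"partition {hot} carries {load[hot]} of {sum(load.values())} records")
```

The same property that keeps every record for `hot-customer` in order (one key, one partition) is the property that concentrates its load on a single partition. Removing the mapping removes both effects at once.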

Figure: two managed models (sequence diagram). Participants: Producer, Managed service, Consumer.

Producer → Managed service: PutRecord (Kinesis), capped at 1 MB/s × shard
Managed service → Producer: ProvisionedThroughputExceeded if over cap
Producer → Managed service: Publish (Pub/Sub), no shard, auto-scales, unordered by default
Managed service → Consumer: push (Pub/Sub) or pull (both)

Caption: Two managed models. Kinesis exposes a capped shard meter (back-pressure is a thrown error, not a queue); Pub/Sub hides the partition entirely and pushes, trading the ceiling for loss of default ordering.
Legend: your client · managed (AWS/Google-operated) · data · error / control · cap = a hard, price-list-defined limit

The economics flip with scale, and lock-in is the silent cost

Managed services are genuinely cheaper for spiky or low-volume workloads: you pay per shard-hour or per message and run zero infrastructure. But the curve inverts at sustained high throughput. Kinesis on-demand runs roughly 2–3× more expensive than self-managed Kafka at steady scale, the 1 MB/s shard cap plus the per-Region shard quota become real ceilings, and hot-shard skew (the managed analogue of Kafka's hot-partition problem, Failure Modes) still bites because the partition-key hash still concentrates a hot key (reference, empirical). The deeper, less-visible cost is lock-in: Kinesis and Pub/Sub are proprietary APIs, so unlike the Kafka protocol, which Redpanda, WarpStream, and others reimplement, there is no second source. You trade the entire operational burden for a metered ceiling and a one-vendor dependency. The honest rule: managed wins decisively below the throughput where the meter overtakes a 3-broker cluster, and loses above it; know which side of that line your workload is on before you commit.
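The "which side of the line" question reduces to a break-even between a pure per-unit meter and a fixed cluster cost plus a smaller marginal cost. The sketch below is a toy linear model: the function names are hypothetical and every price is a placeholder input you would replace with your own quotes, not a figure from any vendor's price list.

```python
def monthly_cost_managed(mb_per_s: float, meter_price: float) -> float:
    """Managed: no fixed cost, pay per sustained MB/s (placeholder unit price)."""
    return mb_per_s * meter_price


def monthly_cost_self_managed(mb_per_s: float, fixed_cluster_cost: float,
                              marginal_price: float) -> float:
    """Self-managed: a fixed floor (e.g. a minimal cluster) plus a marginal cost."""
    return fixed_cluster_cost + mb_per_s * marginal_price


def break_even_mb_per_s(meter_price: float, fixed_cluster_cost: float,
                        marginal_price: float):
    """Throughput at which the meter overtakes the cluster, or None if it never does."""
    if meter_price <= marginal_price:
        return None
    return fixed_cluster_cost / (meter_price - marginal_price)


# Placeholder inputs: 100 per MB/s-month metered, 3000 fixed, 40 marginal.
line = break_even_mb_per_s(100, 3000, 40)
print(line, monthly_cost_managed(line, 100), monthly_cost_self_managed(line, 3000, 40))
```

Real pricing is tiered and includes staff time, so the line is fuzzier than this model suggests, but the shape (managed wins below it, loses above it) is the claim the paragraph makes.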

RabbitMQ: the queue-versus-log axis

RabbitMQ is the one system here that is not a log at all, which makes it the most clarifying comparison. It is a "smart broker / dumb consumer": the broker tracks per-message delivery state, routes messages through exchanges (direct, topic, fanout, headers), and supports per-message priority, per-message and per-queue TTL, and rich content-based routing. Kafka is the mirror image, a "dumb broker / smart consumer": the broker appends bytes to a partition and tracks nothing per message; the consumer owns its offset and decides what it has processed (Part I Group Coordination, The Consumer Client). This is not a feature gap; it is the fundamental fork in the road, and each branch is right for opposite requirements.
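The fork can be shown with two toy data structures. This is a minimal in-memory sketch, not either system's implementation: `TinyLog` and `TinyQueue` are hypothetical names, and real brokers add persistence, acknowledgements, and redelivery on top of these shapes.

```python
from collections import deque


class TinyLog:
    """Log: the broker appends and keeps records; it tracks no consumer state."""

    def __init__(self):
        self._records = []

    def append(self, value) -> int:
        self._records.append(value)
        return len(self._records) - 1          # offset of the new record

    def read(self, offset: int, max_records: int = 10) -> list:
        return self._records[offset:offset + max_records]


class TinyQueue:
    """Queue: the broker hands each message out once and then forgets it."""

    def __init__(self):
        self._pending = deque()

    def publish(self, value) -> None:
        self._pending.append(value)

    def consume(self):
        return self._pending.popleft() if self._pending else None


log, queue = TinyLog(), TinyQueue()
for event in ("a", "b", "c"):
    log.append(event)
    queue.publish(event)

# Two independent log consumers each own an offset and both see everything.
print(log.read(0), log.read(0))
# Queue consumers compete: once drained, nothing is left to replay.
print([queue.consume() for _ in range(3)], queue.consume())
```

Note where the position lives: the log reader passes its own offset on every call, while the queue mutates broker-side state on every `consume`. That placement of state is the "smart consumer" versus "smart broker" distinction.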

| Dimension | RabbitMQ (queue) | Kafka (log) | Why the difference is structural |
| --- | --- | --- | --- |
| Per-message state | Broker tracks ack/redelivery per message | Broker tracks none; consumer owns the offset | Per-message bookkeeping is the cost that enables routing/priority and the reason throughput is lower. |
| Replay | Consumed message is gone (or dead-lettered) | Re-read from any offset; retention-bounded, not consumption-bounded | The log is an immutable record; the queue is a hand-off. Replay is free in one, impossible in the other. |
| Routing | Exchanges: direct / topic / fanout / headers; content-based | Key-hash to a partition only | Smart-broker routing vs dumb-broker append. Kafka pushes routing logic to consumers/Streams. |
| Per-message TTL / priority | Yes, both | No; topic-level/time-based retention only; no priority | Per-message semantics require per-message state, which Kafka deliberately refuses. |
| Ordering | Per-queue FIFO; priority breaks strict order | Strict per-partition total order, always | Priority and strict order are mutually exclusive; Kafka chose order, RabbitMQ chose priority. |
| Throughput / latency | ~1 ms at low load; degrades past ~30 MB/s | 605 MB/s peak; p99 ~5 ms at 200 MB/s (reference, empirical) | Per-message state & push delivery cap throughput; append + zero-copy + pull scales an order of magnitude higher. |

Tactic: choose the axis before the product

The queue-vs-log decision should be made on requirements, not familiarity. Reach for a queue (RabbitMQ, SQS, classic MQ) when you need per-message routing, priority, or TTL; when each message is a task consumed once and discarded; when fan-out is modest and replay is not a requirement. Reach for a log (Kafka) when you need durable ordered history, high throughput, broadcast fan-out to many independent consumer groups, or replay/reprocessing. The most expensive mistake in this whole chapter is forcing one onto the other's job. Kafka as a task queue collapses to head-of-line blocking, because per-partition order plus one-consumer-per-partition means "partition throughput collapses to the speed of the slowest message" and a poison pill stalls every later message (reference, anti-pattern). More partitions or the Confluent Parallel Consumer mitigate this but do not remove it; see When To Use a Log, and Share Groups, which add queue-like per-message acknowledgement to Kafka via KIP-932. RabbitMQ as a replayable event store fails just as hard, because consumed messages are gone. The product follows from the axis; pick the axis first.
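Head-of-line blocking is easy to see in a toy timing model. The sketch below is a simulation under simplifying assumptions, not a benchmark: the function names are hypothetical, all messages are assumed to arrive at time zero, and durations are in arbitrary milliseconds.

```python
import heapq


def partition_completion_times(durations):
    """One consumer, strict order: each message waits for every earlier one."""
    finished, clock = [], 0
    for d in durations:
        clock += d
        finished.append(clock)
    return finished


def queue_completion_times(durations, workers):
    """Competing consumers: each message goes to whichever worker frees up first.
    Messages can finish out of order, which is the price of the parallelism."""
    free_at = [0] * workers
    heapq.heapify(free_at)
    finished = []
    for d in durations:
        start = heapq.heappop(free_at)   # earliest-available worker
        end = start + d
        finished.append(end)
        heapq.heappush(free_at, end)
    return finished


# Four tasks; the second is a 500 ms slow message, the rest take 1 ms.
tasks = [1, 500, 1, 1]
print(partition_completion_times(tasks))         # later messages wait behind the slow one
print(queue_completion_times(tasks, workers=2))  # later messages bypass it
```

In the single-partition case the two fast messages behind the slow one finish after 502 and 503 ms; with two competing workers they finish at 2 and 3 ms, but they complete before the slow message does, so strict order is gone. That is the same order-versus-throughput trade the table's Ordering row describes.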

WarpStream & diskless: the object-storage frontier

The newest move pushes Kafka's storage all the way out: diskless architectures put the log directly in object storage (S3) and make the broker stateless, with no local disk, no leader, and no inter-broker replication. WarpStream pioneered this commercially; upstream Apache Kafka is bringing it in via KIP-1150 (Diskless Topics), accepted around March 2026. The economic argument is the sharpest in this chapter, and it attacks Kafka's single largest cloud cost head-on.

Figure: the diskless move.

CLASSIC KAFKA (co-located, and the cost it generates): Producer → leader (often cross-AZ) → replicate to RF−1 followers (cross-AZ, billed per GB each way) → store RF copies on local SSD. Cross-AZ replication is ~70–90% of the cloud bill at high throughput; triple-replicated local SSD is ~10–20× the per-GiB cost of S3 (reference, empirical).
↕ the diskless move
DISKLESS (stateless agents write straight to S3): Producer → any agent → batch written directly to object storage; no leader, no inter-broker replication, no local disk. S3 is internally multi-AZ-durable, so the RF−1 cross-AZ replication term disappears entirely and storage drops to ~$0.021/GiB. Claims ~5–10× TCO reduction (WarpStream, advocacy).
↕ the price you pay
THE LATENCY COST: Object-store round-trips replace local-disk + page-cache writes: ~400 ms produce p99, ~1 s producer-to-consumer e2e p99 for pure-S3. AutoMQ's WAL-assisted variant (a tiny ~10 GB EBS WAL in front of S3) recovers single-digit-ms latency while keeping the cost win.

Caption: The diskless move eliminates Kafka's largest cost driver, cross-AZ replication, by replacing co-located local-disk storage with object storage. The bill collapses; the latency floor rises by two-to-three orders of magnitude unless a small write-ahead buffer is reintroduced.
Legend: classic co-located Kafka · diskless / object-store · the latency cost incurred · RF−1 = the cross-AZ replication term that vanishes

Why this is the most important emerging tradeoff

Recall from Cost Engineering that Kafka's bill is dominated by cross-AZ network, not compute: a 3-AZ RF=3 cluster copies every ingested GB RF−1 times across zone boundaries that AWS meters at ~$0.02/GB effective, and Confluent's own teardown puts networking at 87% of a 100 MB/s cluster's cost (Confluent, empirical). Tiered storage (Part I Tiered Storage) cuts the storage term but, contrary to a common misconception, does nothing for cross-AZ networking. Fetch-from-follower (KIP-392) cuts the consumer-read term but leaves the produce + replication floor. Diskless is the only move that removes the replication term outright, because object storage is internally multi-AZ-durable, so there is no inter-broker copy to bill. That is why it is the frontier: it is the first architecture that attacks the cost driver every other Kafka optimisation leaves standing. The transferable insight: when your dominant cost is data movement between failure domains, the highest-leverage move is to delegate the failure-domain durability to a substrate that already spans them (S3) rather than re-implementing it yourself (RF replication).
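The replication term is worth computing once by hand. The sketch below uses only the quantities named in the paragraph (ingest rate, RF−1 cross-AZ copies, ~$0.02/GB effective); the function names, the 30-day month, and the assumption that every replica sits in a different AZ are illustrative simplifications, and producer and consumer cross-AZ traffic are deliberately left out.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600   # 30-day month: an assumption for round numbers


def replication_cross_az_gb(ingest_mb_per_s: float, replication_factor: int) -> float:
    """GB per month copied across zone boundaries by replication alone,
    assuming each of the RF-1 follower copies crosses an AZ boundary."""
    ingest_gb = ingest_mb_per_s * SECONDS_PER_MONTH / 1000   # decimal GB
    return ingest_gb * (replication_factor - 1)


def replication_cross_az_cost(ingest_mb_per_s: float, replication_factor: int,
                              price_per_gb: float = 0.02) -> float:
    """Monthly cost of that traffic at the ~$0.02/GB effective rate cited above."""
    return replication_cross_az_gb(ingest_mb_per_s, replication_factor) * price_per_gb


classic = replication_cross_az_cost(100, replication_factor=3)
# Diskless has no inter-broker copy, so the RF-1 term is zero; model it as RF=1.
diskless = replication_cross_az_cost(100, replication_factor=1)
print(f"classic RF=3: ${classic:,.0f}/month, diskless: ${diskless:,.0f}/month")
```

At 100 MB/s and RF=3 this model puts replication traffic alone at roughly ten thousand dollars a month. Tiered storage and fetch-from-follower do not appear anywhere in the formula, which is the paragraph's point: neither touches the RF−1 term.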

The honest caveats: latency, maturity, and "diskless ≠ free"

Three sobering qualifications. First, latency is the real tax: pure-S3 diskless swaps a low single-digit-ms floor for ~400 ms produce p99 and ~1 s e2e, which is fine for log aggregation, ETL, and batch-ish streaming but disqualifying for anything sub-100 ms. It is designed to coexist with classic low-latency topics, not replace them. (AutoMQ's small-EBS-WAL hybrid recovers single-digit-ms latency but reintroduces a sliver of local state, so it is not purely diskless.) Second, upstream maturity: WarpStream and AutoMQ are mature products, but in Apache Kafka itself diskless is in flight. KIP-1150 was accepted ~March 2026, and acceptance is not production-ready OSS; the design fragmented into multiple revisions plus a competing KIP-1176 (Vanlightly, empirical). Do not present upstream diskless as shipped or GA. Third, the vendor TCO percentages (WarpStream 80–85%, AutoMQ ~100% of cross-AZ eliminated) use favourable assumptions and are directional, not audited. They are also AWS/GCP-centric: Azure has historically not charged for inter-AZ traffic at all, which guts the entire cost argument there. Diskless is a genuine frontier with a genuine, large win on the right workload; it is not a universal upgrade.

Where Kafka sits, and why: the synthesis

Lay the seven systems on the axes and Kafka's position resolves into something precise: it is the broad-spectrum generalist of the durable-streaming space. It is not the champion on any single axis: Pulsar can scale storage independently, Redpanda can run a tighter tail in some regimes, Kinesis needs zero operators, RabbitMQ routes per message, diskless is far cheaper at rest. Kafka wins by being good enough on every axis at once while being best on the combination that matters most for event streaming: throughput-per-partition, replayable durable storage, strict per-partition ordering, and the largest ecosystem on Earth (Streams, Connect, the entire compatible-protocol cohort, Part I Kafka Streams, Kafka Connect). The reason a generalist wins the default is that most real systems have no single dominating constraint: they want decent latency and high throughput and replay and ordering and a hiring pool, and Kafka is the Pareto-efficient point that delivers all five without forcing a specialist's sacrifice.

Figure: evolution timeline (state diagram). Each line reads "forcing constraint → resulting state".

decouple from ZooKeeper → co-located log (the bet)
storage cost? → tiered storage (mitigate)
consumer cross-AZ? → fetch-from-follower (mitigate)
replication cross-AZ? → diskless / KIP-1150 (absorb the frontier)

Caption: Kafka is not standing still on its weakest axis. The evolution timeline shows the project absorbing its competitors' best ideas one cost driver at a time: KRaft removed the auxiliary system Pulsar still needs; tiered storage and fetch-from-follower chip at cost; diskless co-opts the object-storage frontier. Each step narrows a gap a specialist opened. See Evolution.
Legend: state = an architectural era; transition = the constraint that forced the next move. The arc is "co-located bet → mitigate its costs → absorb the frontier that attacked them."

This synthesis also explains the single most important strategic fact about the comparison: the alternatives' best ideas keep migrating back into Kafka. KRaft (KRaft Consensus) removed the very class of auxiliary stateful system that Pulsar's design still mandates. Tiered storage (Tiered Storage) borrowed the object-store-for-cold-data idea. The KIP-848 rebalance protocol (GA in Kafka 4.0, reported up to ~20× faster) closed the historical "stop-the-world rebalance" critique that was largely obsolete already. Share Groups (Share Groups, KIP-932) graft queue-like per-message acknowledgement onto the log, addressing the task-queue gap that sent users to RabbitMQ. And diskless (KIP-1150) is co-opting the object-storage frontier WarpStream opened. Because the Kafka protocol is an open standard with multiple implementations, the ecosystem behaves as a gravity well: a good idea that proves itself in a competitor tends to be re-implemented behind the same wire protocol, so the cost of betting on Kafka is hedged by the near-certainty that its worst current weakness is someone's active roadmap item.

The architect's takeaway: pick the specialist only when one constraint truly dominates

The decision rule that falls out of the whole chapter is disciplined and unsentimental. Default to Kafka unless a single constraint dominates so hard that a specialist's sacrifice is worth it, and verify the dominance with a number, not a vibe. Choose Pulsar only if independent compute/storage scaling and instant rebalance are worth running a second stateful system every day. Choose Redpanda only if you have measured a tail-latency or node-count problem that survives OS-level tuning (XFS, page-cache sizing, GC config), and remember Vanlightly showed the headline reverses on equal hardware. Choose Kinesis or Pub/Sub only if your volume is genuinely spiky or low and zero-ops outweighs a metered ceiling and lock-in. Choose RabbitMQ only if you are on the queue side of the axis: per-message routing/priority/TTL, task semantics, no replay. Choose diskless only if cross-AZ cost is your dominant pain and you can absorb ~400 ms+ produce latency (and you are on AWS/GCP, not Azure). In every other case, which is most cases, the generalist's even competence across all five axes, plus the largest ecosystem and the strongest "your weakness is on the roadmap" hedge, makes Kafka the rational default. The skill this chapter builds is not memorising seven products; it is the habit of naming your binding constraint, mapping it to an axis, and only then reaching for the system that moved off that axis. Carry the cheat-sheet forward to The Architect's Cheat-Sheet.

krivaltsevich.com · Part of Apache Kafka Internals · derived from Apache Kafka 4.4 source · GitHub · MIT-licensed.

Apache Kafka® is a registered trademark of the Apache Software Foundation. This is an independent, unofficial guide, not affiliated with or endorsed by the ASF.