III · 06 · Comparative Architecture & the Design Space
Source: Apache Kafka 4.4.0-SNAPSHOT (git 04bfe7d, 2026-06-15), KRaft mode. Architectural analysis grounded in the source-verified Part I and cited comparative sources.
Every architecture is a set of choices, and every choice forecloses others. The way to understand Kafka's design is to hold it against the systems that made the opposite bets and ask what each one bought and what it paid. Kafka's defining decision is that brokers co-own compute and storage: a broker is the node that accepts a write, holds the partitioned, replicated log on local disk, serves it back through the OS page cache with zero-copy, and runs the replication state machine, all in one process. That single decision is simultaneously the root of Kafka's throughput advantage and the root of its three most-cited weaknesses: cross-AZ replication cost, a real (if tunable) latency floor, and a partition ceiling. The competitors in this chapter attack that decision from five directions. Pulsar separates compute from storage. Redpanda keeps the co-located model but rewrites the runtime in C++ to lift the mechanical-sympathy ceiling. Kinesis and Pub/Sub hide the whole thing behind a managed meter. RabbitMQ rejects the log entirely for per-message broker bookkeeping. WarpStream, and now upstream Kafka via KIP-1150, push the storage all the way out to object storage and make the broker stateless.
This chapter maps that design space honestly: for each system, the model, the storage substrate, the ordering guarantee, the one tradeoff that matters versus Kafka, and the reusable lesson. It closes with a synthesis of where Kafka sits and why, that is, the precise shape of the niche its bets carved out. The grounding rule is unchanged from the rest of Part III: every Kafka mechanism I assert is real and cross-linked to Part I; every comparative number is from the cited reference and treated as directional, because nearly all of them are vendor benchmarks.
StreamNative, Redpanda, WarpStream, and AutoMQ all publish numbers that favour their own product, and the methodology traps are systematic, not accidental. Jack Vanlightly's independent re-run of Redpanda's own OpenMessaging benchmark on identical hardware reversed the headline: at NVMe saturation Kafka hit ~1,900 MB/s vs Redpanda's ~1,400; under TLS with 50 producers Redpanda's end-to-end latency blew out to 24 seconds while Kafka stayed sub-second; after 12 hours Redpanda's p99 reached 3.5 s while Kafka's improved (Vanlightly, empirical). The traps he documents recur everywhere: Kafka misconfigured with log.flush.interval.messages=1 (fsync per batch, which Kafka never does in production), Java 11 instead of 17 (which disproportionately hurts Kafka, especially with TLS), and inconsistent offset-commit cadence. Before you trust any cross-system claim below, check durability parity (fsync / journal / acks), JVM version, commit cadence, Coordinated-Omission correction, and test duration. The directional shape of each tradeoff is sound; the exact multipliers are not.
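The parity checks listed above can be written down as a small review checklist. The following is a minimal sketch, not a tool from any of the cited sources: the `BenchmarkSide` schema, its field names, and the 12-hour default duration threshold are all assumptions chosen for illustration (the threshold echoes the 12-hour run mentioned above). It only flags items for a human to review; it does not judge which side is right.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BenchmarkSide:
    """One system's setup in a published comparison (hypothetical schema)."""
    system: str
    durability_mode: str       # e.g. "replication-only", "fsync-per-batch", "no-journal"
    java_major: Optional[int]  # None for non-JVM systems
    commit_interval_s: float   # consumer offset-commit cadence
    co_corrected: bool         # was Coordinated-Omission correction applied?
    duration_h: float          # test duration in hours


def parity_problems(a: BenchmarkSide, b: BenchmarkSide,
                    min_duration_h: float = 12.0) -> List[str]:
    """Return review flags; an empty list means no obvious parity trap."""
    problems: List[str] = []
    if a.durability_mode != b.durability_mode:
        problems.append("durability modes differ")
    for side in (a, b):
        if side.java_major is not None and side.java_major < 17:
            problems.append(f"{side.system} runs on Java {side.java_major} (< 17)")
        if not side.co_corrected:
            problems.append(f"{side.system} lacks Coordinated-Omission correction")
        if side.duration_h < min_duration_h:
            problems.append(f"{side.system} ran for only {side.duration_h} h")
    if a.commit_interval_s != b.commit_interval_s:
        problems.append("offset-commit cadence differs")
    return problems


# Illustrative inputs only; these are not the settings of any real published run.
kafka = BenchmarkSide("Kafka", "fsync-per-batch", 11, 1.0, True, 1.0)
other = BenchmarkSide("Other", "fsync-per-batch", None, 1.0, True, 1.0)
flags = parity_problems(kafka, other)
```

Under these made-up inputs the checklist flags the Java version and the short duration on both sides, which is the kind of prompt a reader needs before trusting a headline multiplier.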
The axis that organises the whole space
Before the system-by-system tour, fix the coordinate system. Five nearly-orthogonal axes separate every system in this chapter, and almost every comparison reduces to a position on one or two of them. Naming them up front turns a pile of product names into a map.
| Axis | One end | Other end | What it controls |
|---|---|---|---|
| Compute / storage coupling | Co-located (Kafka, Redpanda) | Separated (Pulsar, WarpStream, diskless) | How fast you can scale, rebalance, and recover; whether storage cost decouples from broker cost. |
| Storage substrate | Local disk / page cache (Kafka, Redpanda) | Object storage (WarpStream, diskless) | The latency floor and the per-GB cost; whether cross-AZ replication exists at all. |
| Delivery model | Replayable log (Kafka, Pulsar, Kinesis) | Per-message queue (RabbitMQ) | Replay, fan-out, and retention vs per-message TTL, priority, and routing. |
| Operational ownership | Self-managed (Kafka, Redpanda) | Fully managed (Kinesis, Pub/Sub) | Control and throughput-per-unit vs zero-ops elasticity and lock-in. |
| Push vs pull | Pull / poll (Kafka, Kinesis) | Push (RabbitMQ, Pub/Sub) | Low-load latency vs throughput batching; backpressure semantics. |
Kafka's position is now nameable as a coordinate: co-located, local-disk, replayable-log, self-managed, pull. Every weakness in Inherent Limits falls out of that coordinate, and every competitor is a deliberate move along one axis, trading the advantage that axis-end conferred for the advantage of the other end. The decision tree below is the practical inverse: given your binding constraint, which axis-move does it push you toward?
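One way to make the "coordinate" idea concrete is to encode the axis table as data and ask, for any system, on which axes it is listed at the opposite end from Kafka. This is a sketch that transcribes only the memberships stated in the table above; the dictionary layout and the function name `moves_off_kafka` are illustrative assumptions.

```python
from typing import Dict, List, Set, Tuple

# axis -> (systems listed at Kafka's end, systems listed at the other end),
# copied from the axis table; systems the table does not place are omitted.
AXES: Dict[str, Tuple[Set[str], Set[str]]] = {
    "Compute / storage coupling": ({"Kafka", "Redpanda"},
                                   {"Pulsar", "WarpStream", "diskless"}),
    "Storage substrate": ({"Kafka", "Redpanda"}, {"WarpStream", "diskless"}),
    "Delivery model": ({"Kafka", "Pulsar", "Kinesis"}, {"RabbitMQ"}),
    "Operational ownership": ({"Kafka", "Redpanda"}, {"Kinesis", "Pub/Sub"}),
    "Push vs pull": ({"Kafka", "Kinesis"}, {"RabbitMQ", "Pub/Sub"}),
}


def moves_off_kafka(system: str) -> List[str]:
    """Axes on which `system` sits at the opposite end from Kafka."""
    return [axis for axis, (_kafka_end, other_end) in AXES.items()
            if system in other_end]


rabbit_moves = moves_off_kafka("RabbitMQ")   # two axes: delivery model, push vs pull
pulsar_moves = moves_off_kafka("Pulsar")     # one axis: coupling
redpanda_moves = moves_off_kafka("Redpanda") # none of the five tabulated axes
```

Redpanda returning an empty list is informative rather than a bug: its move is on the runtime, which is not one of the five tabulated axes.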
The master comparison
Here is the whole space in one table: model, storage, ordering, and the single tradeoff that matters versus Kafka. Read the rest of the chapter as the annotated expansion of these rows. Every performance number is sourced and directional.
| System | Architecture model | Storage substrate | Ordering / consistency | Key tradeoff vs Kafka · what it teaches |
|---|---|---|---|---|
| Apache Kafka | Brokers co-own compute + storage; partitioned, replicated log; OS page cache + zero-copy sendfile | Local disk (EBS/NVMe); tiered storage (KIP-405) offloads cold segments to S3 | Per-partition total order only (never global); acks=all + ISR high-watermark commit | (the baseline) Highest throughput-per-partition + replay/storage; pays cross-AZ replication, a tunable latency floor, and a partition ceiling. |
| Apache Pulsar | Stateless brokers; compute/storage separated | Apache BookKeeper (distributed WAL across "bookies") + metadata store; BookKeeper uses its own caches, not the OS page cache | Per-partition (+ ordering keys) | Independent scaling & near-instant topic rebalance, at the cost of operating a second stateful system. Teaches: separation buys elasticity but doubles the operational surface. |
| Redpanda | Single C++ binary, thread-per-core (Seastar), no JVM, Raft, own IO scheduling, Kafka-protocol compatible | Local disk (own IO, not OS page-cache-reliant); tiering to object store available | Per-partition | Lower tail latency & fewer nodes in some regimes, at the cost of the JVM ecosystem. Vanlightly showed the 10×/3× claim reverses on equal hardware. Teaches: the mechanical-sympathy ceiling. |
| AWS Kinesis | Fully managed; metered shard model | Managed (24h–365d retention) | Per-shard (by partition key) | Zero-ops elasticity, at the cost of a hard 1 MB/s-per-shard cap, a per-Region shard quota (1,000–20,000 by Region, raisable), hot-shard skew, and lock-in. Teaches: managed simplicity trades control + throughput-per-unit. |
| Google Pub/Sub | Fully managed, global, auto-scaling; no partitions/shards to manage; push or pull | Managed | No ordering by default (opt-in ordering keys); at-least-once | Hands-off global elasticity, at the cost of ordering-by-default, partition-level control, and lock-in. Teaches: dropping the partition abstraction removes a ceiling and a guarantee. |
| RabbitMQ | "Smart broker / dumb consumer"; exchange-based routing; push | In-memory + disk queues; per-message broker state | Per-queue FIFO; per-message priority breaks strict order | Rich routing + per-message TTL/priority, but not a replayable high-throughput log; latency degrades past ~30 MB/s. Teaches: the queue-vs-log axis. |
| WarpStream / diskless (KIP-1150) | Diskless / object-store-native; stateless agents, no local disk, no inter-broker replication, leaderless | Object storage (S3 / S3 Express One Zone) only | Per-partition (via a control plane / metadata layer) | ~5–10× lower storage/cross-AZ cost, at the cost of ~400 ms produce p99 / ~1 s e2e. Teaches: the emerging object-storage frontier; latency-for-cost is a dial, not a verdict. |
All cross-system performance claims are vendor benchmarks unless attributed to an independent source (Vanlightly). Treat as directional and check durability parity, JVM version, and test duration (reference Appendix D, empirical).
Apache Pulsar: separate the storage, double the system
Pulsar makes the single move that most directly contradicts Kafka's defining decision: it splits compute from storage. A Pulsar broker is stateless: it owns no log data on local disk. Durability lives in Apache BookKeeper, a separate cluster of "bookies" that store the log as a distributed write-ahead log; topic and ownership metadata live in a third tier (historically ZooKeeper). The contrast with Kafka's architecture is total: in Kafka, the broker that accepts your write is the broker that holds the data and the broker that serves it back (Part I Replication & the ISR, The Fetch Path); in Pulsar, those are three different processes on three different node pools.
Two concrete wins fall directly out of stateless brokers. First, independent scaling: if you are storage-bound you add bookies; if you are throughput-bound you add brokers; you never over-provision one to get the other. Kafka couples them, so adding storage capacity means adding whole brokers (mitigated, not removed, by tiered storage, Part I Tiered Storage). Second, near-instant topic rebalance and recovery: when a Pulsar broker dies, its topics are reassigned to survivors immediately because the new owner copies nothing; it just opens the existing ledger. In Kafka, a dead broker's partitions are already replicated on followers (so failover is fast for leadership), but rebuilding the lost replica means re-replicating its data, and rebalancing data across brokers physically moves bytes. Pulsar sidesteps the byte-movement entirely. This is the reusable lesson: if the data isn't on the compute node, moving the compute is free.
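The independent-scaling argument is simple arithmetic, sketched below. Every capacity figure is a made-up placeholder, and the model deliberately ignores replication factor, headroom, and the IO that storage nodes also perform; it only shows how coupling forces the node count to the larger of the two requirements.

```python
import math
from typing import Tuple


def coupled_nodes(storage_tb: float, throughput_mbps: float,
                  tb_per_node: float, mbps_per_node: float) -> int:
    """Co-located model: each node brings both disk and serving capacity,
    so the cluster is sized by whichever dimension needs more nodes."""
    return max(math.ceil(storage_tb / tb_per_node),
               math.ceil(throughput_mbps / mbps_per_node))


def separated_nodes(storage_tb: float, throughput_mbps: float,
                    tb_per_storage_node: float,
                    mbps_per_compute_node: float) -> Tuple[int, int]:
    """Separated model: (compute nodes, storage nodes), each sized on its own."""
    return (math.ceil(throughput_mbps / mbps_per_compute_node),
            math.ceil(storage_tb / tb_per_storage_node))


# Hypothetical storage-bound workload: 200 TB retained, 300 MB/s ingest,
# 10 TB of disk and 100 MB/s of serving capacity per node.
coupled = coupled_nodes(200, 300, 10, 100)
compute, storage = separated_nodes(200, 300, 10, 100)
```

In this illustrative case the coupled cluster needs 20 full brokers to hold the data even though 3 would carry the traffic, while the separated layout runs 3 compute nodes in front of 20 storage nodes. The total node count is not lower; the saving is that the compute tier is not over-provisioned.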
The separation that buys elasticity also doubles the operational surface: you now run brokers and BookKeeper and a metadata store, each with its own failure modes, tuning, and on-call burden. Kafka's KRaft migration (Part I KRaft Consensus, The KRaft Controller) spent years removing exactly one such auxiliary stateful system (ZooKeeper); Pulsar's design adds one back as a load-bearing tier. On raw numbers, StreamNative's own benchmark reports Pulsar beating Kafka (single-partition 700 MB/s journaled vs Kafka 280; 100-partition 1,600 vs 1,087), but the headline figures use a "no-journal" Pulsar configuration that weakens durability relative to Kafka's acks=all, so the comparison is not apples-to-apples on safety (StreamNative, advocacy). And BookKeeper's own-cache model means Pulsar forgoes the OS page-cache + zero-copy fetch path that makes Kafka's catch-up reads so cheap. Pick Pulsar when independent scaling and instant rebalance are worth a second cluster to babysit; otherwise the separation is overhead you pay every day to buy elasticity you may use rarely.
Redpanda: the mechanical-sympathy ceiling
Redpanda accepts Kafka's co-located model (the broker owns compute and storage, with data on local disk) and attacks a different axis: the runtime. It is a single C++ binary built on the Seastar thread-per-core framework, with no JVM, no garbage collector, and its own asynchronous IO scheduling that deliberately does not rely on the OS page cache. It speaks the Kafka wire protocol (Part I The Wire Protocol), so existing clients connect unchanged. The thesis is pure mechanical sympathy: pin one thread to each core, shard all state by core so there is no cross-core locking, manage memory and IO explicitly, and you eliminate the two things that produce Kafka's tail-latency jitter: GC pauses and page-cache unpredictability.
Redpanda is the cleanest demonstration in this space of a mechanical-sympathy ceiling: the maximum performance the hardware allows once you remove every software-layer tax. Kafka's JVM heap must stay small (~6 GB) precisely so the rest of RAM is page cache and GC pauses stay short; a 32 GB heap can incur 100–200 ms G1GC pauses, and a multi-second Full GC drops replicas from the ISR (Part I Replication & the ISR; Failure Modes, empirical). A thread-per-core C++ runtime has no GC and no shared page cache to thrash, so in principle it can run closer to the NIC/NVMe ceiling with a tighter tail. The transferable tactic, usable in any latency-critical system, is to identify where your runtime taxes you (GC, lock contention, cache non-determinism, syscall overhead) and ask what the hardware ceiling would be without it. That framing is valuable even if you never adopt Redpanda.
Redpanda markets "10× faster tail latencies with up to 3× fewer nodes." Vanlightly re-ran Redpanda's own OpenMessaging benchmark on identical 3× i3en.6xlarge hardware and found the claims "greatly exaggerated" and "not generalizable": at 50 producers / 500 MB/s Redpanda topped out at 330 MB/s while Kafka hit the target; at NVMe saturation Kafka reached ~1,900 MB/s vs Redpanda's ~1,400; under TLS with 50 producers Redpanda's e2e latency hit 24 seconds against sub-second Kafka; and over a 12-hour run Redpanda's p99 climbed to 3.5 s (p99.99 to 26 s) while Kafka improved (Vanlightly, empirical). The original Redpanda benchmark had Kafka misconfigured with log.flush.interval.messages=1 (per-batch fsync, which Kafka never does in production) on Java 11. Crucially, Redpanda's framing that "Kafka doesn't fsync, so it's unsafe" is false: Kafka deliberately relies on replication plus log recovery rather than mandatory per-write fsync (Part I Replication & the ISR, Durability Engineering), a considered latency/durability tradeoff, not an oversight. The real Redpanda value proposition is operational (one binary, no JVM, no separate metadata store) more than it is a universal latency win. Much of Kafka's tail is also filesystem/OS-level and recoverable without changing runtimes: Allegro cut producer-latency outliers over a 65 ms SLO by 82% simply by moving brokers to XFS (Allegro/InfoQ, empirical).
Kinesis & Pub/Sub: trading control for a managed meter
The managed services move off the operational-ownership axis: AWS or Google runs the brokers, the storage, the scaling, and the patching, and you interact with a metered API. The two exemplars make opposite sub-choices about the partition abstraction, which is the most instructive thing about them.
Kinesis: the shard meter
Kinesis keeps a partition-like primitive, the shard, but turns it into a billing unit with a hard capacity cap: each shard is capped at 1 MB/s (or 1,000 records/s) of writes and 2 MB/s of reads, and the per-account shard count is itself a per-Region quota (1,000–20,000 depending on Region, raisable) (AWS Kinesis quotas). Compare a single Kafka partition on NVMe, which absorbs 10–100+ MB/s (Part I Storage & the Log Engine; Partitioning). The architectural difference is stark: Kafka's partition throughput is bounded by hardware and tuning; Kinesis's is bounded by a price list. The flip side is genuine: you provision throughput by adding shards through an API call and never think about brokers, disks, page cache, or rebalances. Where this number sits in the limits taxonomy, a hard service quota versus Kafka's emergent per-partition rate, is detailed in II · 02 · Limits & Boundaries.
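The shard cap turns capacity planning into a ceiling division. The sketch below uses the per-shard limits quoted above and, for contrast, the low end of the 10–100+ MB/s per-partition range for Kafka. It assumes a perfectly even key distribution, so it gives a lower bound: hot-shard skew pushes the real number up. The function names are illustrative.

```python
import math

# Per-shard limits as quoted in the text.
SHARD_WRITE_MBPS = 1.0
SHARD_WRITE_RECORDS_PER_S = 1000
SHARD_READ_MBPS = 2.0


def kinesis_shards_needed(write_mbps: float, records_per_s: float,
                          read_mbps: float) -> int:
    """Minimum shard count: the tightest of the three per-shard caps wins."""
    return max(1,
               math.ceil(write_mbps / SHARD_WRITE_MBPS),
               math.ceil(records_per_s / SHARD_WRITE_RECORDS_PER_S),
               math.ceil(read_mbps / SHARD_READ_MBPS))


def kafka_partitions_needed(write_mbps: float,
                            mbps_per_partition: float = 10.0) -> int:
    """Rough partition count for write throughput alone; the default is the
    low end of the range cited in the text, not a measured figure."""
    return max(1, math.ceil(write_mbps / mbps_per_partition))


# Hypothetical workload: 100 MB/s in, 50,000 records/s, 150 MB/s out.
shards = kinesis_shards_needed(100, 50_000, 150)
partitions = kafka_partitions_needed(100)
```

Note that small records can make the records-per-second cap the binding one: 10 MB/s of 250-byte records is 40,000 records/s, which needs 40 shards rather than 10.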
Pub/Sub: drop the partition entirely
Pub/Sub goes further and removes the partition abstraction altogether: there are no shards to provision, the service auto-scales globally, and it is push-or-pull. This eliminates the partition ceiling and the hot-partition problem in one move, but it also eliminates ordering by default. Where Kafka gives strict per-partition total order as a baseline guarantee (Part I The Fetch Path), Pub/Sub is at-least-once with unordered delivery unless you opt into ordering keys. This is the cleanest illustration of a deep tradeoff: the partition is simultaneously Kafka's scaling ceiling and its ordering guarantee. Drop the partition and you lose the ceiling and the order together; they are the same abstraction viewed from two sides.
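The claim that the partition is both the scaling unit and the ordering unit can be seen in a few lines. This is a toy model, not Kafka's implementation: `zlib.crc32` stands in for the client's real key hash purely for illustration, and the "log" is an in-memory list per partition.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (global arrival sequence, key, value)


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash (assumption): any stable hash shows the same property.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def append_all(events: List[Tuple[str, str]],
               num_partitions: int) -> Dict[int, List[Record]]:
    """Append each (key, value) to the log of the partition its key hashes to."""
    logs: Dict[int, List[Record]] = defaultdict(list)
    for seq, (key, value) in enumerate(events):
        logs[partition_for(key, num_partitions)].append((seq, key, value))
    return dict(logs)


events = [("a", "1"), ("b", "1"), ("a", "2"), ("c", "1"), ("b", "2"), ("a", "3")]
logs = append_all(events, num_partitions=3)

# Values for key "a", read back from whichever partition holds it.
a_partition = partition_for("a", 3)
a_values = [v for _, k, v in logs[a_partition] if k == "a"]
```

Every record for a given key lands in one partition and is read back in arrival order, so per-key order comes for free. Nothing relates the order of records in different partitions, and a single hot key cannot be spread across partitions without giving that order up: the ceiling and the guarantee are the same mechanism.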
Managed services are genuinely cheaper for spiky or low-volume workloads: you pay per shard-hour or per message and run zero infrastructure. But the curve inverts at sustained high throughput. Kinesis on-demand runs roughly 2–3× more expensive than self-managed Kafka at steady scale, the 1 MB/s shard cap plus the per-Region shard quota become real ceilings, and hot-shard skew (the managed analogue of Kafka's hot-partition problem, Failure Modes) still bites because the partition-key hash still concentrates a hot key (reference, empirical). The deeper, less-visible cost is lock-in: Kinesis and Pub/Sub are proprietary APIs, so unlike the Kafka protocol, which Redpanda, WarpStream, and others reimplement, there is no second source. You trade the entire operational burden for a metered ceiling and a one-vendor dependency. The honest rule: managed wins decisively below the throughput where the meter overtakes a 3-broker cluster, and loses above it; know which side of that line your workload is on before you commit.
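The "which side of the line" rule reduces to a break-even throughput once you accept a deliberately crude cost model: the managed service bills roughly in proportion to throughput, while a small self-managed cluster is a fixed hourly cost up to its capacity. The sketch below encodes only that model; both prices in the example are invented placeholders, not quotes from any provider, and operator time is left out entirely.

```python
def break_even_mbps(cluster_usd_per_hour: float,
                    managed_usd_per_mbps_hour: float) -> float:
    """Sustained throughput at which the metered bill equals the fixed cluster."""
    return cluster_usd_per_hour / managed_usd_per_mbps_hour


def cheaper_option(sustained_mbps: float, cluster_usd_per_hour: float,
                   managed_usd_per_mbps_hour: float) -> str:
    managed_bill = sustained_mbps * managed_usd_per_mbps_hour
    return "managed" if managed_bill < cluster_usd_per_hour else "self-managed"


# Placeholder prices (assumptions): a 3-broker cluster at 3.00 USD/hour all-in,
# and a managed meter that works out to 0.10 USD per MB/s per hour.
line = break_even_mbps(3.00, 0.10)
low = cheaper_option(5, 3.00, 0.10)
high = cheaper_option(100, 3.00, 0.10)
```

The useful output is not the specific number but the habit: compute the line with your own prices and your own sustained (not peak) throughput before committing to either side.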
RabbitMQ: the queue-versus-log axis
RabbitMQ is the one system here that is not a log at all, which makes it the most clarifying comparison. It is a "smart broker / dumb consumer": the broker tracks per-message delivery state, routes messages through exchanges (direct, topic, fanout, headers), and supports per-message priority, per-message and per-queue TTL, and rich content-based routing. Kafka is the mirror image, a "dumb broker / smart consumer": the broker appends bytes to a partition and tracks nothing per message; the consumer owns its offset and decides what it has processed (Part I Group Coordination, The Consumer Client). This is not a feature gap; it is the fundamental fork in the road, and each branch is right for opposite requirements.
| Dimension | RabbitMQ (queue) | Kafka (log) | Why the difference is structural |
|---|---|---|---|
| Per-message state | Broker tracks ack/redelivery per message | Broker tracks none; consumer owns the offset | Per-message bookkeeping is the cost that enables routing/priority and the reason throughput is lower. |
| Replay | Consumed message is gone (or dead-lettered) | Re-read from any offset; retention-bounded, not consumption-bounded | The log is an immutable record; the queue is a hand-off. Replay is free in one, impossible in the other. |
| Routing | Exchanges: direct / topic / fanout / headers; content-based | Key-hash to a partition only | Smart-broker routing vs dumb-broker append. Kafka pushes routing logic to consumers/Streams. |
| Per-message TTL / priority | Yes, both | No, topic-level/time-based retention only; no priority | Per-message semantics require per-message state, which Kafka deliberately refuses. |
| Ordering | Per-queue FIFO; priority breaks strict order | Strict per-partition total order, always | Priority and strict order are mutually exclusive; Kafka chose order, RabbitMQ chose priority. |
| Throughput / latency | ~1 ms at low load; degrades past ~30 MB/s | 605 MB/s peak; p99 ~5 ms at 200 MB/s (reference, empirical) | Per-message state & push delivery cap throughput; append + zero-copy + pull scales an order of magnitude higher. |
The queue-vs-log decision should be made on requirements, not familiarity. Reach for a queue (RabbitMQ, SQS, classic MQ) when you need per-message routing, priority, or TTL; when each message is a task consumed once and discarded; when fan-out is modest and replay is not a requirement. Reach for a log (Kafka) when you need durable ordered history, high throughput, broadcast fan-out to many independent consumer groups, or replay/reprocessing. The most expensive mistake in this whole chapter is forcing one onto the other's job: Kafka as a task queue collapses to head-of-line blocking because per-partition order plus one-consumer-per-partition means "partition throughput collapses to the speed of the slowest message" and a poison pill stalls every later message (reference, anti-pattern; mitigated but not removed by more partitions or the Confluent Parallel Consumer, see When To Use a Log and Share Groups, which adds queue-like per-message acknowledgement to Kafka via KIP-932). RabbitMQ as a replayable event store fails just as hard, because consumed messages are gone. The product follows from the axis; pick the axis first.
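The head-of-line blocking described above is easy to reproduce in a toy model. The sketch compares one consumer draining a partition in strict order against a pool of competing consumers pulling from a queue. Processing costs are invented integers in milliseconds, and the model ignores network and broker time.

```python
import heapq
from typing import List


def partition_completion_ms(costs_ms: List[int]) -> List[int]:
    """One consumer, strict order: each message waits for every earlier one."""
    done, clock = [], 0
    for cost in costs_ms:
        clock += cost
        done.append(clock)
    return done


def queue_completion_ms(costs_ms: List[int], workers: int) -> List[int]:
    """Competing consumers: each message goes to the earliest-free worker."""
    free_at = [0] * workers
    heapq.heapify(free_at)
    done = []
    for cost in costs_ms:
        start = heapq.heappop(free_at)   # worker that frees up first
        finish = start + cost
        heapq.heappush(free_at, finish)
        done.append(finish)
    return done


# Hypothetical batch: three 10 ms tasks and one 5-second slow task in second place.
costs = [10, 5000, 10, 10]
in_order = partition_completion_ms(costs)
competing = queue_completion_ms(costs, workers=2)
```

In the ordered case the two cheap messages behind the slow one finish after more than five seconds; with two competing consumers they finish in tens of milliseconds. The price of the second result is that messages complete out of order, which is exactly the guarantee the log refuses to give up.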
WarpStream & diskless: the object-storage frontier
The newest move pushes Kafka's storage all the way out: diskless architectures put the log directly in object storage (S3) and make the broker stateless, with no local disk, no leader, and no inter-broker replication. WarpStream pioneered this commercially; upstream Apache Kafka is bringing it in via KIP-1150 (Diskless Topics), accepted around March 2026. The economic argument is the sharpest in this chapter, and it attacks Kafka's single largest cloud cost head-on.
[Figure: cost path of classic replicated Kafka versus diskless.]
Classic Kafka: replication to RF−1 followers (cross-AZ, billed per GB each way) → store RF copies on local SSD. Cross-AZ replication is ~70–90% of the cloud bill at high throughput; triple-replicated local SSD is ~10–20× the per-GiB cost of S3 (reference, empirical).
Diskless: the RF−1 cross-AZ replication term disappears entirely and storage drops to ~$0.021/GiB. Claims ~5–10× TCO reduction (WarpStream, advocacy).
Recall from Cost Engineering that Kafka's bill is dominated by cross-AZ network, not compute: a 3-AZ RF=3 cluster copies every ingested GB RF−1 times across zone boundaries that AWS meters at ~$0.02/GB effective, and Confluent's own teardown puts networking at 87% of a 100 MBps cluster's cost (Confluent, empirical). Tiered storage (Part I Tiered Storage) cuts the storage term but, contrary to a common misconception, does nothing for cross-AZ networking. Fetch-from-follower (KIP-392) cuts the consumer-read term but leaves the produce + replication floor. Diskless is the only move that removes the replication term outright, because object storage is internally multi-AZ-durable, so there is no inter-broker copy to bill. That is why it is the frontier: it is the first architecture that attacks the cost driver every other Kafka optimisation leaves standing. The transferable insight: when your dominant cost is data movement between failure domains, the highest-leverage move is to delegate the failure-domain durability to a substrate that already spans them (S3) rather than re-implementing it yourself (RF replication).
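The replication term in that bill is a one-line formula: ingested volume, times RF−1 cross-zone copies, times the per-GB rate. The sketch below uses the ~$0.02/GB effective rate quoted above and assumes one replica per AZ, so every follower copy crosses a zone boundary. It covers only the replication term and leaves out producer and consumer cross-AZ traffic, storage, and compute. The 30-day month and decimal gigabytes are simplifying assumptions.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month


def monthly_replication_usd(ingest_mbps: float,
                            replication_factor: int = 3,
                            usd_per_gb: float = 0.02) -> float:
    """Cross-AZ replication cost per month, assuming one replica per AZ."""
    gb_per_month = ingest_mbps * SECONDS_PER_MONTH / 1000  # decimal GB
    cross_az_copies = replication_factor - 1
    return gb_per_month * cross_az_copies * usd_per_gb


classic = monthly_replication_usd(100)                         # RF=3 over 3 AZs
no_replication = monthly_replication_usd(100, replication_factor=1)
```

At 100 MB/s sustained the replication term alone comes to roughly ten thousand dollars a month under these assumptions, and it scales linearly with ingest. Setting the number of inter-broker copies to zero, which is what delegating durability to object storage does, removes the term rather than shrinking it.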
Three sobering qualifications. First, latency is the real tax: pure-S3 diskless swaps a low single-digit-ms floor for ~400 ms produce p99 and ~1 s e2e, which is fine for log aggregation, ETL, and batch-ish streaming but disqualifying for anything sub-100 ms. It is designed to coexist with classic low-latency topics, not replace them. (AutoMQ's small-EBS-WAL hybrid recovers single-digit-ms but reintroduces a sliver of local state, so it is not purely diskless.) Second, upstream maturity: WarpStream and AutoMQ are mature products, but in Apache Kafka itself diskless is in-flight. KIP-1150 was accepted ~March 2026, and acceptance is not production-ready OSS; the design fragmented into multiple revisions plus a competing KIP-1176 (Vanlightly, empirical). Do not present upstream diskless as shipped or GA. Third, the vendor TCO percentages (WarpStream 80–85%, AutoMQ ~100% of cross-AZ eliminated) use favourable assumptions and are directional, not audited. They are also AWS/GCP-centric, because Azure has historically not charged for inter-AZ traffic at all, which guts the entire cost argument there. Diskless is a genuine frontier with a genuine, large win on the right workload; it is not a universal upgrade.
Where Kafka sits, and why: the synthesis
Lay the seven systems on the axes and Kafka's position resolves into something precise: it is the broad-spectrum generalist of the durable-streaming space. It is not the champion on any single dimension: Pulsar can scale storage independently, Redpanda can run a tighter tail in some regimes, Kinesis needs zero operators, RabbitMQ routes per message, and diskless is far cheaper at rest. Kafka wins by being good enough on every axis at once while being best on the combination that matters most for event streaming: throughput-per-partition, replayable durable storage, strict per-partition ordering, and the largest ecosystem on Earth (Streams, Connect, the entire compatible-protocol cohort; Part I Kafka Streams, Kafka Connect). The reason a generalist wins the default is that most real systems have no single dominating constraint: they want decent latency and high throughput and replay and ordering and a hiring pool, and Kafka is the Pareto-efficient point that delivers all five without forcing a specialist's sacrifice.
[Figure: the co-location bet and Kafka's response to each cost it creates. Storage cost? Tiered storage (mitigate). Consumer cross-AZ? Fetch-from-follower (mitigate). Replication cross-AZ? Diskless / KIP-1150 (absorb the frontier).]
This synthesis also explains the single most important strategic fact about the comparison: the alternatives' best ideas keep migrating back into Kafka. KRaft (KRaft Consensus) removed the very class of auxiliary stateful system that Pulsar's design still mandates. Tiered storage (Tiered Storage) borrowed the object-store-for-cold-data idea. The KIP-848 rebalance protocol (GA in Kafka 4.0, reported up to ~20× faster) closed the historical "stop-the-world rebalance" critique that was largely obsolete already. Share Groups (Share Groups, KIP-932) graft queue-like per-message acknowledgement onto the log, addressing the task-queue gap that sent users to RabbitMQ. And diskless (KIP-1150) is co-opting the object-storage frontier WarpStream opened. Because the Kafka protocol is an open standard with multiple implementations, the ecosystem behaves as a gravity well: a good idea that proves itself in a competitor tends to be re-implemented behind the same wire protocol, so the cost of betting on Kafka is hedged by the near-certainty that its worst current weakness is someone's active roadmap item.
The decision rule that falls out of the whole chapter is disciplined and unsentimental. Default to Kafka unless a single constraint dominates so hard that a specialist's sacrifice is worth it, and verify the dominance with a number, not a vibe. Choose Pulsar only if independent compute/storage scaling and instant rebalance are worth running a second stateful system every day. Choose Redpanda only if you have measured a tail-latency or node-count problem that survives OS-level tuning (XFS, page-cache sizing, GC config), and remember Vanlightly showed the headline reverses on equal hardware. Choose Kinesis or Pub/Sub only if your volume is genuinely spiky or low and zero-ops outweighs a metered ceiling and lock-in. Choose RabbitMQ only if you are on the queue side of the axis: per-message routing/priority/TTL, task semantics, no replay. Choose diskless only if cross-AZ cost is your dominant pain and you can absorb ~400 ms+ produce latency (and you are on AWS/GCP, not Azure). In every other case, which is most cases, the generalist's even competence across all five axes, plus the largest ecosystem and the strongest "your weakness is on the roadmap" hedge, makes Kafka the rational default. The skill this chapter builds is not memorising seven products; it is the habit of naming your binding constraint, mapping it to an axis, and only then reaching for the system that moved off that axis. Carry the cheat-sheet forward to The Architect's Cheat-Sheet.
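The rule above is mechanical enough to write as a lookup with a default. The sketch below is one possible encoding, not an authoritative selector: the `Constraint` names and the `measured` flag are illustrative assumptions, and the flag stands for "verified with a number, not a vibe".

```python
from enum import Enum
from typing import Optional


class Constraint(Enum):
    INDEPENDENT_SCALING = "independent compute/storage scaling, instant rebalance"
    TAIL_LATENCY_AFTER_OS_TUNING = "tail latency or node count that survives OS tuning"
    SPIKY_ZERO_OPS = "spiky or low volume where zero-ops outweighs the meter"
    PER_MESSAGE_SEMANTICS = "per-message routing/priority/TTL, no replay"
    CROSS_AZ_COST = "cross-AZ cost dominant, ~400 ms+ produce latency acceptable"


SPECIALIST = {
    Constraint.INDEPENDENT_SCALING: "Pulsar",
    Constraint.TAIL_LATENCY_AFTER_OS_TUNING: "Redpanda",
    Constraint.SPIKY_ZERO_OPS: "Kinesis or Pub/Sub",
    Constraint.PER_MESSAGE_SEMANTICS: "RabbitMQ",
    Constraint.CROSS_AZ_COST: "diskless (WarpStream / KIP-1150)",
}


def choose(binding: Optional[Constraint], measured: bool) -> str:
    """Default to Kafka unless one constraint dominates AND has been measured."""
    if binding is None or not measured:
        return "Kafka"
    return SPECIALIST[binding]


default_pick = choose(None, measured=False)
unverified_pick = choose(Constraint.TAIL_LATENCY_AFTER_OS_TUNING, measured=False)
verified_pick = choose(Constraint.CROSS_AZ_COST, measured=True)
```

The design choice worth noticing is that an unmeasured constraint falls through to the default: a suspected tail-latency problem returns Kafka until someone has the number.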