00 · Architecture Overview
Source: Apache Kafka 4.4.0-SNAPSHOT (git 04bfe7d, 2026-06-15), KRaft mode. Derived from source code, not copied from official documentation.
Apache Kafka is, at its core, a single idea executed with unusual discipline: a distributed, partitioned, replicated commit log. Producers append records to the end of a log; consumers read forward from a position they themselves track; the broker is a comparatively "dumb" custodian of bytes that it stores as sequential append-only segment files and ships to readers with near-zero overhead. Everything else in this guide (replication and the high watermark, the KRaft metadata quorum that replaced ZooKeeper, consumer groups and rebalancing, exactly-once transactions, share groups, tiered storage, Streams and Connect) is an elaboration built on top of that log. This chapter is the front door. It builds an accurate end-to-end mental model: what the abstractions are, how a cluster is laid out in KRaft, how a record physically travels from producer.send() to consumer.poll(), how bytes are stored, how coordination and delivery semantics work, how a broker is threaded, and where every detail lives in the codebase and in the twenty-one chapters that follow. Read it once to orient yourself; return to it as the map.
What Kafka is: a distributed commit log
Strip away the features and Kafka is a log. A log here is the precise computer-science object: an append-only, totally-ordered sequence of records, ordered by the time they were appended, addressed by a monotonically increasing integer position. Jay Kreps called it "perhaps the simplest possible storage abstraction", and it is the same primitive that sits underneath database replication and state-machine replication. Kafka's bet, made at LinkedIn in 2010–2011 and proven out since, is that this one abstraction, made distributed and durable, is the right backbone for both real-time stream processing and offline data integration, replacing an O(N²) tangle of point-to-point pipelines with an O(N) hub-and-spoke.
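To make the abstraction concrete, here is a minimal in-memory sketch of one partition log. It is not Kafka code: `PartitionLog` and its methods are hypothetical. They exist only to show the two operations the abstraction needs. An append at the tail returns a dense, monotonically increasing offset, and a read scans forward from any offset the caller chooses.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Entry = Tuple[Optional[bytes], Optional[bytes]]  # (key, value); value None = tombstone


@dataclass
class PartitionLog:
    """Hypothetical in-memory model of one partition log (not a Kafka API)."""
    records: List[Entry] = field(default_factory=list)

    @property
    def log_end_offset(self) -> int:
        # The next offset to assign; offsets are dense, so it equals the record count.
        return len(self.records)

    def append(self, key: Optional[bytes], value: Optional[bytes]) -> int:
        """Append one record at the tail and return the offset it was assigned."""
        offset = self.log_end_offset
        self.records.append((key, value))
        return offset

    def read(self, from_offset: int, max_records: int = 100) -> List[Tuple[int, Optional[bytes], Optional[bytes]]]:
        """Return up to max_records (offset, key, value) tuples, scanning forward."""
        if not 0 <= from_offset <= self.log_end_offset:
            raise ValueError(f"offset {from_offset} is out of range")
        end = min(from_offset + max_records, self.log_end_offset)
        return [(o, *self.records[o]) for o in range(from_offset, end)]


log = PartitionLog()
offsets = [log.append(b"k", f"v{i}".encode()) for i in range(3)]
print(offsets)      # [0, 1, 2]
print(log.read(1))  # [(1, b'k', b'v1'), (2, b'k', b'v2')]
```

Nothing before the tail is ever modified. That is what lets many readers share the same sequence without coordinating with each other.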
The core abstractions
- Record
- The unit of data: an optional key, a value (the payload, or null for a compaction tombstone), a timestamp, and an ordered list of headers (string key + bytes). Records are never stored individually on disk; they live inside batches. See Record Format & Batches.
- Record batch
- The real on-disk and on-wire unit. The v2 DefaultRecordBatch has a fixed 61-byte header (base offset, batch length, leader epoch, magic, CRC-32C, attributes, last offset delta, base/max timestamps, producer id/epoch, base sequence, record count) followed by varint delta-encoded records. Writing shared fields once per batch cut per-record overhead from ~34 bytes for a standalone legacy v1 message (a 12-byte offset/size log prefix plus the 22-byte RECORD_OVERHEAD_V1) to ~7 bytes. The batch, not the record, is the fundamental unit of write, replication, compression, and fetch.
- Partition
- One physical log. A partition is the unit of parallelism and of ordering: records within a partition are strictly ordered by offset; there is no ordering guarantee across partitions. A partition is replicated; exactly one replica is the leader, which accepts all writes and, by default, serves reads.
- Topic
- A named, logical stream: simply a set of partitions. A topic has no storage of its own; its partitions store the data.
- Offset
- The position of a record within its partition: a 64-bit integer assigned by the leader at append time, dense and gap-free within a batch's range. The offset is the only per-consumer state Kafka needs: one number per partition tells a consumer exactly where it is.
- Broker
- A server process that hosts partitions, accepts produce/fetch requests, and replicates. A Kafka cluster is a set of brokers plus a controller quorum (below).
- Segment
- A partition log is split into segments of up to 1 GiB by default: a .log file of record batches plus sparse .index/.timeindex sidecars and a .txnindex of aborted transactions. Only the last (active) segment is written; old segments are immutable and can be deleted, compacted, or tiered wholesale. See The Log Storage Engine.
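The byte counts quoted for record batches can be checked with a little arithmetic. This Python sketch sums the v2 batch-header field widths (field names follow the documented v2 layout), the legacy v1 framing, and the framing of one minimal v2 record. The helper functions are hypothetical and assume zigzag varint encoding for the per-record length fields.

```python
# Field widths (bytes) of the v2 record-batch header, in wire order.
V2_BATCH_HEADER = {
    "baseOffset": 8, "batchLength": 4, "partitionLeaderEpoch": 4, "magic": 1,
    "crc": 4, "attributes": 2, "lastOffsetDelta": 4, "baseTimestamp": 8,
    "maxTimestamp": 8, "producerId": 8, "producerEpoch": 2, "baseSequence": 4,
    "recordsCount": 4,
}

# Legacy v1 framing: an offset + size log prefix, then crc, magic, attributes,
# timestamp, key length and value length.
LOG_OVERHEAD = 8 + 4
RECORD_OVERHEAD_V1 = 4 + 1 + 1 + 8 + 4 + 4


def varint_size(n: int) -> int:
    """Bytes used by a zigzag varint: small magnitudes, including -1, take one byte."""
    z = (n << 1) ^ (n >> 63)  # zigzag: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    size = 1
    while z >= 0x80:
        z >>= 7
        size += 1
    return size


def v2_record_overhead(key_len: int, value_len: int,
                       offset_delta: int = 0, timestamp_delta: int = 0) -> int:
    """Framing bytes of one v2 record with no headers (a length of -1 means null)."""
    fields = (1                          # attributes (int8)
              + varint_size(timestamp_delta)
              + varint_size(offset_delta)
              + varint_size(key_len)
              + varint_size(value_len)
              + varint_size(0))          # header count
    body_len = fields + max(key_len, 0) + max(value_len, 0)
    return varint_size(body_len) + fields  # the length prefix counts the body after it


print(sum(V2_BATCH_HEADER.values()))                 # 61
print(LOG_OVERHEAD + RECORD_OVERHEAD_V1)             # 34
print(v2_record_overhead(key_len=-1, value_len=10))  # 7
```

The seven bytes are one byte each for the length, attributes, timestamp delta, offset delta, key length, value length, and header count. They grow only when deltas or lengths stop fitting in a single varint byte.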
The "dumb broker, smart client" model
The defining architectural choice is that the consumer tracks its own offset, and the broker keeps essentially no per-consumer state on the hot path. A traditional message queue deletes a message when it is consumed and must track, per consumer, what has been acknowledged. Kafka instead addresses every record by a stable logical offset, retains data by a time/size SLA regardless of who has read it, and lets each consumer (or consumer group) say "give me from offset N." This has profound consequences:
- Replay and fan-out are free. Rewinding is a seek; a new consumer group reading history costs the broker nothing extra. Many independent subscribers read the same log.
- The broker stays simple and fast. No per-message acknowledgement bookkeeping, no random-access index of "who has what", just sequential append and sequential scan, which is what disks and the OS page cache are best at.
- Back-pressure is natural. Consumers pull (long-poll fetch) at their own pace; a slow consumer cannot be overwhelmed by a push.
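Here is a toy Python model of "the consumer tracks its own offset." Two hypothetical readers share one list that stands in for a retained partition. Each keeps its own cursor, and rewinding is just a seek. The shared log never changes as they read, which is why fan-out and replay cost the broker nothing extra. `PartitionReader` is illustrative only, not the consumer API.

```python
from typing import List

# A stand-in for one partition's retained records; reading never deletes anything.
LOG: List[str] = ["created", "paid", "shipped", "delivered"]


class PartitionReader:
    """Hypothetical consumer-side cursor: the position lives here, not on the broker."""

    def __init__(self, log: List[str], start_offset: int = 0) -> None:
        self.log = log
        self.position = start_offset

    def poll(self, max_records: int = 2) -> List[str]:
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)   # advance our own cursor
        return batch

    def seek(self, offset: int) -> None:
        self.position = offset        # replay is just moving the cursor


billing = PartitionReader(LOG)
audit = PartitionReader(LOG)
print(billing.poll(), billing.poll())  # ['created', 'paid'] ['shipped', 'delivered']
print(audit.poll(1))                   # ['created'] (independent of billing)
billing.seek(1)
print(billing.poll())                  # ['paid', 'shipped']
```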
Kafka is a set of append-only, offset-addressed, replicated partition logs. Producers append batches to the leader of a partition; the leader replicates to followers; data below the high watermark is committed and visible; consumers pull forward from a position they own and commit back to the cluster. Hold that picture and every other chapter slots into it.
Cluster anatomy in KRaft (no ZooKeeper)
As of Kafka 4.0, ZooKeeper is gone entirely; 4.4.0-SNAPSHOT runs KRaft-only. A cluster is split cleanly into two planes:
- The data plane: broker nodes that host topic-partition replicas and serve produce/fetch. This is where your records live.
- The metadata plane: a small controller quorum that owns all cluster metadata (topics, partitions, replica assignments, leadership, ISR, ACLs, quotas, feature levels, producer-ID blocks) as an ordered, replayable, fsync'd event log.
Every node runs with a configured set of process.roles. The ProcessRole enum has two members, BrokerRole and ControllerRole (server/.../ProcessRole.java), and KafkaRaftServer instantiates a BrokerServer if the roles contain BrokerRole and a ControllerServer if they contain ControllerRole (KafkaRaftServer.scala:76-82). So a node can be:
| process.roles | Runs | Typical use |
|---|---|---|
| broker | BrokerServer only; hosts partitions and serves clients | Production data nodes in a larger cluster |
| controller | ControllerServer only; a voter in the metadata quorum | Dedicated controllers (3 or 5 of them) in a larger cluster |
| broker,controller | Both, in one JVM | "Combined" mode for small clusters and development |
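The role dispatch can be mirrored in a few lines of Python. This is a hypothetical sketch, not KafkaRaftServer. The enum values are the strings written in process.roles. Rejecting empty, unknown, or duplicate roles is an assumption about validation, made here for illustration.

```python
from enum import Enum
from typing import List


class ProcessRole(Enum):
    BROKER = "broker"
    CONTROLLER = "controller"


def parse_process_roles(value: str) -> List[ProcessRole]:
    """Parse a comma-separated process.roles value (simplified validation)."""
    names = [n.strip() for n in value.split(",") if n.strip()]
    if not names:
        raise ValueError("process.roles must name at least one role")
    roles = [ProcessRole(n) for n in names]   # raises ValueError for an unknown name
    if len(set(roles)) != len(roles):
        raise ValueError(f"duplicate role in process.roles: {value!r}")
    return roles


def servers_to_start(value: str) -> List[str]:
    """Which server objects a node would build; list order here carries no meaning."""
    roles = parse_process_roles(value)
    servers = []
    if ProcessRole.BROKER in roles:
        servers.append("BrokerServer")
    if ProcessRole.CONTROLLER in roles:
        servers.append("ControllerServer")
    return servers


for roles in ("broker", "controller", "broker,controller"):
    print(f"{roles:<18} -> {servers_to_start(roles)}")
```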
Bootstrapping a KRaft cluster
Because there is no ZooKeeper to seed the cluster, KRaft makes formatting storage a mandatory first step: unlike in the ZooKeeper era, a node will not start until its log directories carry a cluster identity. The sequence is:
- Generate a cluster ID. kafka-storage random-uuid prints a fresh base64 Uuid (StorageTool.scala, the random-uuid command → Uuid.randomUuid). The same ID is used to format every node.
- Format each node's log dirs. kafka-storage format --cluster-id <ID> --release-version <MV> writes a meta.properties into each log.dirs directory (cluster.id, node.id, and a per-directory directory.id UUID), and lays down a bootstrap.checkpoint file. That checkpoint is the BootstrapMetadata: a tiny record set whose first record is a FeatureLevelRecord pinning the initial metadata.version (the --release-version you chose), so the cluster comes up at a known feature level rather than the code's default.
- Declare the initial voters (KIP-853). For a dynamic controller quorum, the format step takes --standalone (a single-node quorum), --initial-controllers <list> (an explicit voter set, each id@host:port:directory-id), or --no-initial-controllers; the older static model instead lists voters in controller.quorum.voters.
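The following Python sketch gives some intuition for what the first two steps produce. It assumes Kafka's Uuid string form is the 16 UUID bytes in URL-safe base64 with the padding stripped, which gives 22 characters. It also renders a simplified meta.properties. The helper names, the example paths, and the re-roll of IDs beginning with "-" are illustrative assumptions; real deployments should simply run kafka-storage.

```python
import base64
import uuid


def random_kafka_style_uuid() -> str:
    """16 random bytes as URL-safe base64 without padding (22 characters)."""
    while True:
        raw = uuid.uuid4().bytes
        text = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
        if not text.startswith("-"):   # assumption: avoid IDs that look like CLI flags
            return text


def meta_properties(cluster_id: str, node_id: int, directory_id: str) -> str:
    """Simplified key=value content written into one formatted log directory."""
    return "\n".join([
        "version=1",
        f"cluster.id={cluster_id}",
        f"node.id={node_id}",
        f"directory.id={directory_id}",
    ]) + "\n"


cluster_id = random_kafka_style_uuid()   # one ID shared by every node
for node_id, log_dir in [(1, "/data/kafka-1"), (2, "/data/kafka-2")]:
    print(log_dir)
    print(meta_properties(cluster_id, node_id, random_kafka_style_uuid()))
```

The cluster ID is shared, while each directory gets its own directory.id. That per-directory ID is what the id@host:port:directory-id voter syntax refers to.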
Only after a node's directories are formatted does KafkaRaftServer start the Raft stack against them. The deep mechanics of registration and the bootstrap metadata live in The KRaft Controller and Metadata Propagation & Broker Lifecycle.
The metadata quorum and __cluster_metadata
The controllers form a Raft quorum using Kafka's own Raft dialect (KRaft). The quorum's state is a single topic partition, __cluster_metadata-0, replicated across the controller voters. One controller is the Raft leader (the "active controller"); the others are hot standbys. This partition is the single source of truth for the whole cluster: nothing about a topic, a partition's leader, or an ACL is "true" until it is a committed record in this log.
Figure: … __cluster_metadata-0); brokers replay committed metadata and replicate partition data among themselves, while clients produce to and fetch from the brokers.
Brokers are clients of this metadata log. Each broker registers with the active controller (BrokerRegistrationRequest), is fenced until it has caught up, then continually pulls committed metadata records via the Raft Fetch path and sends periodic heartbeats that serve as its liveness signal. Crucially, the controller does not push LeaderAndIsr/UpdateMetadata/StopReplica RPCs the way the ZooKeeper-era controller did. Making metadata a log that brokers replay turns a topic create/delete into a few appends for the controller, independent of the number of brokers, and eliminates the controller-vs-ZooKeeper state divergence that plagued the old design. The deep mechanics are in KRaft Consensus, The KRaft Controller, and Metadata Propagation & Broker Lifecycle.
KRaft (the metadata quorum over __cluster_metadata) and ordinary partition replication (leader/follower ISR replication of your topic data) are different mechanisms. KRaft uses a majority-vote high watermark over the controller voters; topic-partition replication uses the ISR / min-LEO high watermark over the partition's replicas. They share a code lineage (followers fetch in both) but are governed by separate rules. KRaft replaced only ZooKeeper's metadata role, not data replication.
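The difference between the two commit rules is easiest to see with numbers. In this Python sketch (hypothetical helpers; log-end offsets are exclusive ends), the same three replica positions give a majority-based high watermark under the KRaft rule and a min-LEO high watermark under the ISR rule. The KRaft helper ignores Raft's extra requirement that a new leader first commit an entry from its own epoch.

```python
from typing import Dict, Iterable


def kraft_high_watermark(voter_leos: Dict[int, int]) -> int:
    """Largest offset held by a majority of voters (simplified majority rule)."""
    leos = sorted(voter_leos.values(), reverse=True)
    majority = len(leos) // 2 + 1
    return leos[majority - 1]   # the majority-th highest LEO is reached by `majority` voters


def isr_high_watermark(replica_leos: Dict[int, int], isr: Iterable[int]) -> int:
    """Topic-partition rule: minimum LEO over the current ISR."""
    return min(replica_leos[r] for r in isr)


leos = {1: 120, 2: 100, 3: 40}
print(kraft_high_watermark(leos))               # 100: voters 1 and 2 form a majority
print(isr_high_watermark(leos, isr=[1, 2, 3]))  # 40: the slowest in-sync replica bounds it
print(isr_high_watermark(leos, isr=[1, 2]))     # 100: after replica 3 leaves the ISR
```

The majority rule tolerates a slow minority without changing membership. The ISR rule instead waits for every in-sync replica and restores progress by shrinking the ISR.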
Control plane vs data plane, and the metadata backbone
The split above is enforced in the request layer too. Brokers do not mutate cluster metadata directly. Around eighteen administrative APIs (CreateTopics, DeleteTopics, the CreateAcls/DeleteAcls family, AlterClientQuotas, ElectLeaders, reassignments, UpdateFeatures, and the Raft-voter APIs) are, when received by a broker, wrapped in an Envelope and forwarded to the active controller, which re-dispatches them through ControllerApis (KafkaApis.scala → forwardToController → ControllerApis.scala). The AlterConfigs/IncrementalAlterConfigs family is a partial exception: its handler preprocesses some resources locally and forwards only the remainder. The controller is the only writer of metadata; brokers are readers.
The propagation backbone is a clean pipeline. The active controller is a deterministic replicated state machine: a single-threaded event loop (QuorumController, thread quorum-controller-{id}-event-handler) turns each request into a list of metadata records, appends them to the Raft log, and applies them to in-memory state under a strict write → commit → apply discipline so the active controller and every standby reach byte-identical state by replaying the same records in offset order.
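Here is a toy Python model of the write → commit → apply discipline as described above. The record shapes and the handler are hypothetical stand-ins for QuorumController's per-domain managers. The sketch shows the property the paragraph relies on: a handler only emits records, and state changes only by applying committed records in offset order. Any node replaying the same log prefix therefore ends up with equal state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Record = Tuple[str, str, int]  # hypothetical (record type, topic, partition count)


@dataclass
class MetadataState:
    """State that changes only by applying committed records in offset order."""
    topics: Dict[str, int] = field(default_factory=dict)

    def apply(self, record: Record) -> None:
        kind, topic, partitions = record
        if kind == "TopicRecord":
            self.topics[topic] = partitions
        elif kind == "RemoveTopicRecord":
            self.topics.pop(topic, None)


def handle_create_topic(state: MetadataState, topic: str, partitions: int) -> List[Record]:
    """Validate against current state and emit records; never mutate state here."""
    if topic in state.topics:
        raise ValueError(f"topic {topic!r} already exists")
    return [("TopicRecord", topic, partitions)]


log: List[Record] = []
active, standby = MetadataState(), MetadataState()
for name, count in [("orders", 6), ("payments", 3)]:
    start = len(log)
    log.extend(handle_create_topic(active, name, count))  # write: append to the log
    # commit: assume a majority of voters has now replicated log[start:]
    for record in log[start:]:                             # apply, in offset order
        active.apply(record)
        standby.apply(record)                              # a standby replays the same records
print(active == standby, active.topics)  # True {'orders': 6, 'payments': 3}
```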
On the broker side, the MetadataLoader consumes Raft commits on one thread, folds them into immutable MetadataImage snapshots (a MetadataProvenance plus nine sub-images, rebuilt incrementally via MetadataDelta), and publishes each new image to an ordered chain of MetadataPublishers. The BrokerMetadataPublisher is what actually turns metadata into behaviour: it swaps the cache image, then calls replicaManager.applyDelta(topicsDelta, newImage) to drive partitions into leader/follower roles, and elects or resigns coordinator shards. This is the join between the metadata plane and the data plane.
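The loader-to-publisher flow can be sketched in Python with three pieces: an immutable image, a delta that accumulates replayed records, and an ordered list of publisher callbacks. Everything here is hypothetical and much smaller than MetadataImage/MetadataDelta. There is one "sub-image", and it is fully copied instead of structurally shared. The sketch only illustrates that publishers see each new image in order while older images stay untouched.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, Dict, List, Mapping, Tuple


@dataclass(frozen=True)
class Image:
    """Hypothetical immutable snapshot: last applied offset and topic -> partition count."""
    offset: int
    topics: Mapping[str, int]


class Delta:
    """Collects replayed changes against a base image, then builds the next image."""

    def __init__(self, base: Image) -> None:
        self.base = base
        self.changes: Dict[str, int] = {}

    def replay(self, topic: str, partitions: int) -> None:
        self.changes[topic] = partitions

    def apply(self, offset: int) -> Image:
        merged = dict(self.base.topics)  # a full copy here; real images share unchanged parts
        merged.update(self.changes)
        return Image(offset, MappingProxyType(merged))  # read-only view


Publisher = Callable[[Delta, Image], None]


def load_batch(image: Image, records: List[Tuple[int, str, int]],
               publishers: List[Publisher]) -> Image:
    """Fold one batch of committed (offset, topic, partitions) records, then publish in order."""
    delta = Delta(image)
    for _, topic, partitions in records:
        delta.replay(topic, partitions)
    new_image = delta.apply(records[-1][0])
    for publish in publishers:
        publish(delta, new_image)
    return new_image


seen = []
image0 = Image(-1, MappingProxyType({}))
image1 = load_batch(
    image0,
    [(0, "orders", 6), (1, "payments", 3)],
    [lambda d, img: seen.append(("cache", img.offset)),            # e.g. swap the cache image
     lambda d, img: seen.append(("replicas", sorted(d.changes)))],  # e.g. act on changed topics
)
print(seen)                 # [('cache', 1), ('replicas', ['orders', 'payments'])]
print(dict(image0.topics))  # {}  (the old image is untouched)
```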
The end-to-end data path for a record
Here is the journey of a single record, from a producer's send() to a consumer's poll(), with each hop linked to its chapter. This is the most important diagram in the guide.
Narrating the hops:
- Batching (producer). send() never does network I/O on the caller's thread. It serializes, selects a partition (explicit, murmur2 key-hash, or the adaptive sticky BuiltInPartitioner), and appends bytes into a per-partition deque in the RecordAccumulator. A single Sender thread drains full or lingering batches. Batching is what makes Kafka fast: it amortizes round-trips and turns writes into large sequential chunks. See The Producer Client.
- ProduceRequest to the leader. The batch travels as a length-prefixed, versioned RPC (Wire Protocol & RPC) to the broker that currently leads that partition. The client learned the leader from a cached Metadata response.
- Append to the log. The broker's network/handler threads hand the request to KafkaApis (Request Processing), which authorizes and calls ReplicaManager. For acks=all the leader first rejects the write if the ISR is below min.insync.replicas. Then UnifiedLog.append validates the batches, assigns offsets from the current log-end offset, stamps the leader epoch into each batch header, and writes to the active segment, advancing the LEO (The Log Storage Engine).
- Replication to followers (ISR). Followers are themselves fetch clients of the leader: their ReplicaFetcherThreads pull new records, and the fetch offset in each request tells the leader the follower's LEO. The leader tracks each follower and maintains the in-sync replica set. See Replication, ISR & High Watermark and The Fetch Path.
- High-watermark commit (acks). The high watermark is the minimum LEO over the ISR: the boundary below which data is committed and can never be lost to a clean leader change. For acks=all, the produce request waits in the produce purgatory until the HW moves past its last offset, then the ack is sent. acks=1 replies once the leader has written. acks=0 is fire-and-forget: the broker sends no response at all (a NoOpResponse), and the client's send callback completes locally with a fabricated RecordMetadata offset of −1.
- Consumer fetch. A consumer fetches from the leader (or, with KIP-392, a rack-local follower). The broker serves the read from the OS page cache, bounded by an isolation level: read_uncommitted sees up to the HW; read_committed sees only up to the last stable offset and skips aborted-transaction batches. Bytes are sent with sendfile (zero-copy). See The Consumer Client.
- Group offset commit. After processing, the consumer commits its position to the group coordinator, which durably writes it to the internal __consumer_offsets topic. On restart or rebalance, consumers resume from the committed offset. See Group Coordination.
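To make the key-hash step of the first hop concrete, here is a Python port of the 32-bit murmur2 variant behind the producer's keyed partitioning (seed 0x9747b28c, m = 0x5bd1e995, r = 24). It is followed by the positive-modulo mapping to a partition. Treat it as a sketch: check it against Utils.murmur2 before relying on identical partition numbers. Keyless records take the sticky BuiltInPartitioner path instead, which is not modeled here.

```python
def murmur2(data: bytes) -> int:
    """32-bit murmur2 as an unsigned int (Java would return the same bits as a signed int)."""
    m, r, mask = 0x5BD1E995, 24, 0xFFFFFFFF
    length = len(data)
    h = (0x9747B28C ^ length) & mask
    for i in range(0, length - length % 4, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k
    tail, base = length % 4, length - length % 4
    if tail == 3:                 # mirrors the fall-through switch on the remaining bytes
        h ^= data[base + 2] << 16
    if tail >= 2:
        h ^= data[base + 1] << 8
    if tail >= 1:
        h ^= data[base]
        h = (h * m) & mask
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for_key(key: bytes, num_partitions: int) -> int:
    """Clear the sign bit, then take the modulo, so the result is always a valid partition."""
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


for key in (b"user-1", b"user-2", b"user-1"):
    print(key, partition_for_key(key, num_partitions=6))
```

The same key always lands on the same partition, which is what gives per-key ordering. Changing the partition count changes the mapping, which is why adding partitions to a keyed topic breaks that guarantee for existing keys.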
The storage model in one screen
A partition replica on disk is a directory named topic-partition containing a sequence of segments. Each segment is a .log file of record batches plus two sparse, memory-mapped indexes (.index and .timeindex) and a transaction index (.txnindex). The layered abstraction is UnifiedLog → LocalLog → LogSegments → LogSegment.
| File in /var/lib/kafka/orders-0/ | Contents |
|---|---|
| 00000000000000000000.log | record batches (the data); the 20-digit prefix is the segment base offset = first offset in the file |
| 00000000000000000000.index | sparse offset → file-position (8 B/entry) |
| 00000000000000000000.timeindex | sparse timestamp → relative offset (12 B/entry) |
| 00000000000000000000.txnindex | aborted-txn ranges (lazy; read_committed) |
| 00000000000000004096.log | next segment; rolls at segment.bytes (1 GiB), segment.ms (7 d), or when an index fills |
| 00000000000000004096.index | index for the next segment |
| 00000000000000004096.snapshot | producer-state snapshot (idempotence/txn) |
| leader-epoch-checkpoint | epoch → start offset (truncation authority) |
| partition.metadata | topic id |
Read-by-offset: offset → segment with the greatest base offset ≤ target → OffsetIndex.lookup (greatest entry ≤ target) → short forward scan → FileRecords.slice → sendfile to socket (no JVM heap copy).
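The read path reduces to two floor searches followed by a scan. In this Python sketch the segment bases and index entries are invented numbers, and the entries store offsets relative to each segment's base. The sketch picks the segment with the greatest base offset ≤ target, then the index entry with the greatest offset ≤ target. It returns the file position where a short forward scan through batches would begin.

```python
import bisect
from typing import Tuple

# Hypothetical layout: segment base offsets, and per-segment sparse index entries of
# (offset relative to the segment base, byte position in the .log file).
SEGMENT_BASES = [0, 4096, 9000]
OFFSET_INDEX = {
    0: [(0, 0), (180, 4100), (370, 8230)],
    4096: [(0, 0), (200, 4105)],
    9000: [(0, 0)],
}


def locate(target: int) -> Tuple[int, int, int]:
    """Return (segment base, indexed offset, file position) to start a forward scan from."""
    seg_i = bisect.bisect_right(SEGMENT_BASES, target) - 1   # floor segment by base offset
    if seg_i < 0:
        raise ValueError("offset is below the log start")
    base = SEGMENT_BASES[seg_i]
    entries = OFFSET_INDEX[base]
    keys = [rel for rel, _ in entries]
    i = bisect.bisect_right(keys, target - base) - 1          # greatest entry <= target
    rel_offset, position = entries[i]                          # (0, 0) acts as the fallback
    return base, base + rel_offset, position


print(locate(4400))  # (4096, 4296, 4105): scan forward from byte 4105 to reach 4400
```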
The storage design is a deliberate set of bets, every one of which trades generality for throughput:
- Sequential everything. Appends go to the tail; reads scan forward. Sequential disk I/O is orders of magnitude faster than random, so performance is decoupled from total retained volume.
- Sparse indexes, relative offsets. An index entry is added only every index.interval.bytes (4 KiB) of log. Offsets are stored relative to the segment base so a 64-bit offset fits in 4 bytes. Indexes carry no checksum and are rebuilt from the log if corrupt.
- Lean on the OS page cache. Kafka keeps no in-JVM record cache. The kernel page cache survives broker restarts, avoids double-buffering, and avoids GC pressure on huge heaps. When consumers keep up with the tail, reads are served from memory and the cluster does essentially no disk reads.
- Zero-copy. Reads return FileRecords, a window over the segment's FileChannel; FileRecords.writeTo bottoms out in sendfile(2), so committed bytes flow page cache → NIC without entering user space. (Zero-copy is defeated by TLS or by down-converting v2 to an old format.)
- Durability via replication, not fsync. By default flush.messages and flush.ms are effectively infinite, so Kafka does not fsync every write. Durability comes from replication across the ISR plus the page cache, not per-write disk sync.
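The entry sizes and sparsity in the sparse-index bullet can be sketched in Python. The struct layouts assume big-endian fields: a 4-byte relative offset and 4-byte position, giving 8 bytes per offset-index entry, and an 8-byte timestamp plus 4-byte relative offset, giving 12 bytes per time-index entry. `index_positions` is a hypothetical helper. It adds an entry before a batch once more than index.interval.bytes of log has accumulated since the previous entry.

```python
import struct
from typing import Iterable, List

OFFSET_ENTRY = struct.Struct(">ii")   # relative offset (int32), file position (int32)
TIME_ENTRY = struct.Struct(">qi")     # timestamp in ms (int64), relative offset (int32)


def offset_entry(base_offset: int, offset: int, position: int) -> bytes:
    relative = offset - base_offset
    if not 0 <= relative <= 0x7FFFFFFF:
        raise ValueError("offset does not fit relative to this segment's base")
    return OFFSET_ENTRY.pack(relative, position)


def time_entry(base_offset: int, timestamp_ms: int, offset: int) -> bytes:
    return TIME_ENTRY.pack(timestamp_ms, offset - base_offset)


def index_positions(batch_sizes: Iterable[int], interval_bytes: int = 4096) -> List[int]:
    """Byte positions that receive an offset-index entry, given appended batch sizes."""
    positions, position, since_last = [], 0, 0
    for size in batch_sizes:
        if since_last > interval_bytes:   # enough log written since the previous entry
            positions.append(position)
            since_last = 0
        position += size
        since_last += size
    return positions


print(OFFSET_ENTRY.size, TIME_ENTRY.size)  # 8 12
print(index_positions([1000] * 10))        # [5000]: one entry for ten 1000-byte batches
```

Positions below the first entry need no entry of their own, because a lookup that finds nothing falls back to the start of the segment.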
Two retention modes govern a log's lifecycle, and a third tier extends it:
| Mode | cleanup.policy | What it keeps | For |
|---|---|---|---|
| Retention (delete) | delete | Whole oldest segments deleted by size (retention.bytes) or age (retention.ms, default 7 d) | Event streams, logs, most topics |
| Compaction | compact | At least the latest value per key; null value = tombstone (retained delete.retention.ms) | Changelogs, CDC, state, __consumer_offsets, __transaction_state |
| Tiered storage | (orthogonal) | Closed segments offloaded to object store; local tier keeps the hot tail | Near-infinite retention; clusters sized for compute rather than local disk |
Retention and compaction mechanics are in Log Management, Retention & Compaction; the remote tier (RSM/RLMM SPIs, __remote_log_metadata, copy/fetch quotas) is in Tiered Storage.
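What compaction preserves can be shown with a short Python sketch over an in-memory list. It models the retention guarantee only, not the LogCleaner. The real cleaner works segment by segment with an offset map, never touches the active segment, and drops tombstones only after delete.retention.ms. `compact` is a hypothetical helper. Surviving records keep their original offsets, so a compacted log has gaps.

```python
from typing import Dict, List, Optional, Tuple

Rec = Tuple[int, str, Optional[str]]  # (offset, key, value); value None is a tombstone


def compact(records: List[Rec], drop_tombstones: bool) -> List[Rec]:
    """Keep only the latest record per key, preserving offsets and their order."""
    latest: Dict[str, int] = {}
    for offset, key, _ in records:
        latest[key] = offset                       # later offsets win
    kept = [r for r in records if latest[r[1]] == r[0]]
    if drop_tombstones:                            # e.g. once delete.retention.ms has elapsed
        kept = [r for r in kept if r[2] is not None]
    return kept


history = [(0, "a", "1"), (1, "b", "1"), (2, "a", "2"), (3, "b", None), (4, "c", "9")]
print(compact(history, drop_tombstones=False))  # [(2, 'a', '2'), (3, 'b', None), (4, 'c', '9')]
print(compact(history, drop_tombstones=True))   # [(2, 'a', '2'), (4, 'c', '9')]
```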
Coordination & delivery semantics
Consumer groups & rebalancing
A consumer group scales out consumption: the group's members divide the subscribed partitions among themselves, with each partition assigned to exactly one member (preserving per-partition order). A broker-side group coordinator, the leader of the relevant __consumer_offsets partition, owns the group's membership and offsets, persisting them as records in that partition's log (its durable state is the log; failover is just a replay). The rebalance protocol has evolved decisively from client-driven to server-driven:
| Protocol | group.protocol | How assignment happens |
|---|---|---|
| Classic | classic | A consumer (the elected "leader," not the broker) computes the assignment via JoinGroup/SyncGroup; eager (stop-the-world) or incremental cooperative (KIP-429) |
| Consumer KIP-848 | consumer | A single ConsumerGroupHeartbeat RPC, sent periodically, replaces JoinGroup/SyncGroup; assignment is fully server-side; incremental reconciliation via three epochs (group / assignment / member). GA in 4.0; server-side assignors only |
| Streams KIP-1071 | streams | The KIP-848 model extended to Kafka Streams task assignment; Early Access in 4.1, Generally Available (core feature set) since 4.2 |
The new Java group coordinator (default on KRaft in 4.0) runs each __consumer_offsets partition as a single-threaded replicated state machine and also backs share groups and Streams groups. Full detail in Group Coordination & Rebalance Protocols.
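The invariant shared by all three protocols (each partition is assigned to exactly one member of the group) can be illustrated with a simple range-style split in Python. This is a hypothetical single-topic assignor for illustration. It is not the classic RangeAssignor or either KIP-848 server-side assignor, which also handle multiple topics, stickiness across rebalances, and (for KIP-848) incremental reconciliation.

```python
from typing import Dict, List


def range_assign(members: List[str], num_partitions: int) -> Dict[str, List[int]]:
    """Split partitions 0..n-1 into contiguous ranges, one range per member (sorted by id)."""
    members = sorted(members)
    per_member, extra = divmod(num_partitions, len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)   # the first `extra` members get one more
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(["c2", "c1", "c3"], 7))  # {'c1': [0, 1, 2], 'c2': [3, 4], 'c3': [5, 6]}
```

Members beyond the partition count receive empty assignments. That is why adding consumers to a consumer group beyond the number of partitions adds no parallelism.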
Exactly-once & transactions
Exactly-once semantics rests on two layers (KIP-98), both carried in the v2 batch header and reconstructable on log reload:
- The idempotent producer deduplicates retries within one producer session, per partition, using a producer id (PID) + epoch + monotonic per-partition sequence number that the leader's ProducerStateManager validates. It has been on by default since 3.0 and requires acks=all and max.in.flight.requests.per.connection ≤ 5.
- Transactions give atomic, all-or-nothing writes across many partitions and across sessions. A TransactionCoordinator maps a user transactional.id to a PID with epoch fencing, runs a two-phase commit over the internal __transaction_state log, and writes COMMIT/ABORT marker control records into every involved partition. read_committed consumers stop at the last stable offset and filter aborted batches via the transaction index.
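The per-partition check that makes retries safe can be sketched in Python. The class and exception names are hypothetical and the rules are simplified. A real ProducerStateManager caches metadata for the last few batches, so an exact duplicate gets its original offsets back. It also handles sequence wrap-around and reports failures as Kafka error codes.

```python
from dataclasses import dataclass
from typing import Dict


class OutOfOrderSequence(Exception):
    pass


class ProducerFenced(Exception):
    pass


@dataclass
class ProducerEntry:
    epoch: int
    last_seq: int  # sequence number of the last record accepted from this producer


class PartitionProducerState:
    """Hypothetical per-partition validation of (producer id, epoch, base sequence)."""

    def __init__(self) -> None:
        self.producers: Dict[int, ProducerEntry] = {}

    def check(self, pid: int, epoch: int, base_seq: int, count: int) -> str:
        entry = self.producers.get(pid)
        if entry is None or epoch > entry.epoch:
            if base_seq != 0:              # a new producer or bumped epoch restarts at 0
                raise OutOfOrderSequence(f"expected sequence 0, got {base_seq}")
            self.producers[pid] = ProducerEntry(epoch, count - 1)
            return "appended"
        if epoch < entry.epoch:            # a zombie from an older session
            raise ProducerFenced(f"epoch {epoch} is older than {entry.epoch}")
        if base_seq + count - 1 <= entry.last_seq:
            return "duplicate"             # a retry of something already written
        if base_seq != entry.last_seq + 1:
            raise OutOfOrderSequence(f"expected {entry.last_seq + 1}, got {base_seq}")
        entry.last_seq = base_seq + count - 1
        return "appended"


state = PartitionProducerState()
print(state.check(pid=7, epoch=0, base_seq=0, count=3))  # appended (sequences 0..2)
print(state.check(pid=7, epoch=0, base_seq=0, count=3))  # duplicate (a retried batch)
print(state.check(pid=7, epoch=0, base_seq=3, count=2))  # appended (sequences 3..4)
```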
Folding consumer-offset commits into the transaction (sendOffsetsToTransaction) makes the read-process-write loop a single atomic unit, the basis for Kafka Streams' end-to-end exactly-once. See Transactions & Exactly-Once.
Share groups (queues)
Share groups (KIP-932) add a queue-like model: many consumers cooperatively read the same partition (N:M), and records are leased individually under a time-bounded acquisition lock with a per-record delivery state (Available → Acquired → Acknowledged/Archived) and delivery count, rather than committed by offset. This gives at-least-once delivery with bounded redelivery and an optional dead-letter queue, decoupling parallelism from partition count. Durable state lives in __share_group_state via a share coordinator. See Share Groups.
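The per-record lifecycle can be modeled as a small state machine. This Python sketch is hypothetical: the class, method names, and default values are illustrative, not the share coordinator's API. Acquiring a record increments its delivery count and starts a lock, and acknowledging it finishes it. A release or an expired lock makes the record available again until the delivery limit archives it.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    AVAILABLE = "Available"
    ACQUIRED = "Acquired"
    ACKNOWLEDGED = "Acknowledged"
    ARCHIVED = "Archived"


class SharedRecord:
    """Hypothetical lease on one record of a share-group partition."""

    def __init__(self, max_deliveries: int = 5, lock_ms: int = 30_000) -> None:
        self.state = State.AVAILABLE
        self.delivery_count = 0
        self.max_deliveries = max_deliveries  # illustrative default, not a quoted config
        self.lock_ms = lock_ms                # illustrative default, not a quoted config
        self.lock_deadline: Optional[int] = None

    def acquire(self, now_ms: int) -> bool:
        """Hand the record to one consumer under a time-bounded lock."""
        if self.state is not State.AVAILABLE:
            return False
        self.state = State.ACQUIRED
        self.delivery_count += 1
        self.lock_deadline = now_ms + self.lock_ms
        return True

    def acknowledge(self) -> None:
        if self.state is State.ACQUIRED:
            self.state = State.ACKNOWLEDGED

    def reject(self) -> None:
        if self.state is State.ACQUIRED:
            self.state = State.ARCHIVED       # the consumer says it can never be processed

    def release(self) -> None:
        if self.state is State.ACQUIRED:
            self._redeliver_or_archive()

    def expire_lock(self, now_ms: int) -> None:
        if self.state is State.ACQUIRED and now_ms >= self.lock_deadline:
            self._redeliver_or_archive()

    def _redeliver_or_archive(self) -> None:
        self.state = (State.ARCHIVED if self.delivery_count >= self.max_deliveries
                      else State.AVAILABLE)


r = SharedRecord(max_deliveries=2)
r.acquire(now_ms=0)
r.expire_lock(now_ms=30_000)
print(r.state.value, r.delivery_count)   # Available 1
r.acquire(now_ms=30_000)
r.expire_lock(now_ms=60_000)
print(r.state.value, r.delivery_count)   # Archived 2
ok = SharedRecord()
ok.acquire(now_ms=0)
ok.acknowledge()
print(ok.state.value)                    # Acknowledged
```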
Delivery & ordering guarantees
| Guarantee | What holds | Mechanism |
|---|---|---|
| Per-partition ordering | Records in a partition are read in offset order, always | Single leader assigns dense offsets at append; followers copy a prefix |
| No cross-partition ordering | Order across partitions is undefined | Partition is the unit of ordering by design |
| At-least-once | The default with acks=all + retries; a record may be delivered more than once | Producer retries; consumer commits after processing |
| Exactly-once | Opt-in: idempotent producer + transactions + read_committed | PID/epoch/sequence dedup; 2PC markers; LSO read boundary |
| Read boundary (HW) | read_uncommitted consumers never see uncommitted data | High watermark = min LEO over ISR; reads clamp to HW |
| Read boundary (LSO) | read_committed consumers never see undecided/aborted txn data | Last stable offset: every transaction below it has been decided |
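The two read boundaries in the last rows can be sketched together in Python. The function names are hypothetical and the model is simplified. Control markers are omitted, and aborted transactions are given as (producer id, first offset, last offset) ranges, loosely mirroring what the transaction index supplies. The sketch shows how an open transaction holds the LSO back and how read_committed drops aborted data below it.

```python
from typing import List, Tuple

AbortedTxn = Tuple[int, int, int]  # (producer id, first offset, last offset)


def last_stable_offset(high_watermark: int, open_txn_first_offsets: List[int]) -> int:
    """First offset of the earliest still-open transaction, never above the HW."""
    return min([high_watermark, *open_txn_first_offsets])


def is_aborted(offset: int, pid: int, aborted: List[AbortedTxn]) -> bool:
    return any(p == pid and first <= offset <= last for p, first, last in aborted)


def visible(records: List[Tuple[int, int]], isolation: str, hw: int, lso: int,
            aborted: List[AbortedTxn]) -> List[int]:
    """Offsets a consumer may return; records are (offset, producer id), -1 = non-transactional."""
    bound = hw if isolation == "read_uncommitted" else lso
    out = []
    for offset, pid in records:
        if offset >= bound:
            break                                  # never read past the boundary
        if isolation == "read_committed" and is_aborted(offset, pid, aborted):
            continue                               # drop aborted transactional data
        out.append(offset)
    return out


# Offset 3 (an abort marker) is omitted; producer 20 opened a transaction at offset 4.
recs = [(0, -1), (1, 10), (2, 10), (4, 20), (5, -1)]
aborted = [(10, 1, 3)]
lso = last_stable_offset(high_watermark=6, open_txn_first_offsets=[4])
print(lso)                                                 # 4
print(visible(recs, "read_uncommitted", 6, lso, aborted))  # [0, 1, 2, 4, 5]
print(visible(recs, "read_committed", 6, lso, aborted))    # [0]
```

Offset 5 is non-transactional but still hidden from read_committed readers, because it sits above an undecided transaction. Once that transaction commits, the LSO can advance to the HW.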
acks=all waits for all members of the current ISR, which can shrink to just the leader. Durability requires acks=all and min.insync.replicas >= 2 so the leader rejects writes when the ISR is too small. The classic loss case is replication.factor=3, min.insync.replicas=1, acks=all. Eligible Leader Replicas (KIP-966) close the remaining "last replica standing" gap by freezing HW advancement when the ISR drops below min ISR.
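The gate described here is a single comparison on the leader. This Python sketch uses hypothetical function names, ignores ELR, and models only the moment of acknowledgement. It shows why replication.factor=3 with min.insync.replicas=1 is the loss case once the ISR has shrunk to the leader.

```python
def accept_produce(acks: str, isr_size: int, min_insync_replicas: int) -> bool:
    """Leader-side gate: acks=all writes are rejected while the ISR is below min.insync.replicas."""
    if acks == "all" and isr_size < min_insync_replicas:
        return False          # NOT_ENOUGH_REPLICAS: nothing is appended
    return True


def copies_when_acked(acks: str, isr_size: int) -> int:
    """Replicas guaranteed to hold the record at the moment the producer sees success."""
    return {"0": 0, "1": 1, "all": isr_size}[acks]


for min_isr in (1, 2):
    ok = accept_produce("all", isr_size=1, min_insync_replicas=min_isr)
    print(min_isr, ok, copies_when_acked("all", 1) if ok else 0)
# 1 True 1   -> acknowledged with a single copy: losing that broker loses the record
# 2 False 0  -> rejected instead of silently under-replicated
```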
The threading model of a broker at a glance
A broker is a carefully partitioned set of thread pools, each with a single job, communicating through bounded hand-off queues. Handler threads do not sit waiting on other brokers or on long-poll conditions: slow work (an acks=all produce waiting for replication, a fetch waiting for data) is parked in a purgatory and completed later by an event.
- Acceptor
- One non-daemon thread per listener; accepts TCP connections and round-robins them to processors. Connection quotas (count + creation rate) are enforced here.
- Processor (network thread)
- num.network.threads per listener (default 3), each owning one java.nio.Selector. Reads size-delimited requests, hands them to the RequestChannel, and writes responses back. Per-connection ordering is guaranteed by muting a channel while its request is in flight.
- Request handler (I/O thread)
- num.io.threads daemon threads (default 8) running KafkaApis.handle, a giant dispatch on apiKey to ~80 handlers. The RequestChannel (bounded at queued.max.requests = 500) is the back-pressure point: when handlers fall behind, network threads block on the queue and stop reading sockets, turning handler slowness into TCP back-pressure.
- Replica fetcher
- Background threads on a follower broker that pull from leaders (num.replica.fetchers per source broker, keyed by leader + fetcherId). Followers are fetch clients of leaders.
- Purgatory & reapers
- A DelayedOperationPurgatory parks operations (acks=all produce, under-min-bytes fetch, remote fetch, …) that cannot complete immediately, indexed by watch key and by a hierarchical timing wheel (O(1) insert/cancel). An ExpirationReaper advances the clock; operations complete when their condition fires or they time out.
- Coordinator / controller threads
- Group/transaction/share coordinators run per-partition single-threaded event loops. A combined node also runs the controller's quorum-controller event loop and the kafka-raft-io-thread.
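The purgatory pattern can be sketched in Python. An operation is tried once and, if it cannot complete, parked under one or more watch keys. It is re-tried when an event touches a key, and failed at its deadline. The method names echo DelayedOperationPurgatory's tryCompleteElseWatch and checkAndComplete, but this is a hypothetical single-threaded model. In particular, the linear expiry scan stands in for the hierarchical timing wheel and its reaper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DelayedOp:
    """Hypothetical delayed operation: completes when `ready()` holds or at its deadline."""
    deadline_ms: int
    ready: Callable[[], bool]
    on_complete: Callable[[bool], None]   # argument: True if satisfied, False if timed out
    done: bool = False

    def try_complete(self) -> bool:
        if not self.done and self.ready():
            self.done = True
            self.on_complete(True)
        return self.done


class Purgatory:
    def __init__(self) -> None:
        self.watchers: Dict[str, List[DelayedOp]] = {}

    def try_complete_else_watch(self, op: DelayedOp, keys: List[str]) -> bool:
        if op.try_complete():
            return True                   # completed immediately, never parked
        for key in keys:
            self.watchers.setdefault(key, []).append(op)
        return False

    def check_and_complete(self, key: str) -> int:
        """Called when something relevant to `key` changes, e.g. a partition's HW moved."""
        ops = self.watchers.get(key, [])
        completed = [op for op in ops if not op.done and op.try_complete()]
        self.watchers[key] = [op for op in ops if not op.done]
        return len(completed)

    def expire(self, now_ms: int) -> None:
        """Stand-in for the reaper: a linear scan instead of a hierarchical timing wheel."""
        for ops in self.watchers.values():
            for op in ops:
                if not op.done and now_ms >= op.deadline_ms:
                    op.done = True
                    op.on_complete(False)
        self.watchers = {k: [op for op in v if not op.done] for k, v in self.watchers.items()}


hw = {"orders-0": 100}
results: List[str] = []
purgatory = Purgatory()
# An acks=all produce of records up to offset 104 waits until the HW reaches 105.
op = DelayedOp(deadline_ms=30_000, ready=lambda: hw["orders-0"] >= 105,
               on_complete=lambda ok: results.append("acked" if ok else "timeout"))
purgatory.try_complete_else_watch(op, ["orders-0"])   # parked: the HW is behind
hw["orders-0"] = 105                                   # follower fetches advance the HW
purgatory.check_and_complete("orders-0")
print(results)   # ['acked']

late: List[bool] = []
slow = DelayedOp(deadline_ms=1_000, ready=lambda: False, on_complete=lambda ok: late.append(ok))
purgatory.try_complete_else_watch(slow, ["orders-1"])
purgatory.expire(now_ms=1_000)
print(late)      # [False]
```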
The request lifecycle is instrumented end to end: each request carries timestamps decomposing latency into request-queue time, local processing, remote (purgatory) time, response-queue time, and send time, each written by a different actor. The reactor (Selector + KafkaChannel) is the same code used by producers, consumers, and replica fetchers. Deep dives: Network Layer & Threading and Request Processing (KafkaApis).
Map of the codebase
The repository is a multi-module Gradle build. Knowing which module owns what is half the battle when reading source. The principal modules:
| Module | Language | What lives there | Chapters |
|---|---|---|---|
| clients | Java | The producer, consumer, admin, and share clients (the AdminClient has no chapter of its own; its controller-forwarding pipeline surfaces in 07); the shared NIO reactor (Selector, KafkaChannel); the wire-protocol message types and generated schemas; record-format classes; security channel stack | 01, 02, 16, 17, 18 |
| core | Scala | The broker itself: SocketServer, KafkaApis, ReplicaManager, Partition, BrokerServer/ControllerServer/KafkaRaftServer, replica fetchers, the transaction coordinator runtime, broker-side metadata publishing | 06, 07, 08, 09, 14 |
| storage | Java | The log engine: UnifiedLog, LocalLog, LogSegment, the indexes, LogManager, the LogCleaner (compaction), ProducerStateManager, tiered-storage RemoteLogManager | 03, 04, 05 |
| metadata | Java | The KRaft controller (QuorumController and the per-domain control managers), the MetadataImage/MetadataDelta/MetadataLoader pipeline, PartitionRegistration, the StandardAuthorizer | 11, 12, 18 |
| raft | Java | Kafka's own Raft implementation: KafkaRaftClient, the QuorumState role machine, snapshots, dynamic quorum reconfiguration | 10 |
| group-coordinator | Java | The modern Java group coordinator: CoordinatorRuntime, GroupMetadataManager, OffsetMetadataManager, classic + KIP-848 protocols, server-side assignors; share-group config | 13, 15 |
| transaction-coordinator | Java/Scala | Producer-ID management (RPCProducerIdManager), transaction state schemas, marker plumbing | 14 |
| share-coordinator | Java | The share-group state coordinator and its __share_group_state persistence | 15 |
| server / server-common | Java | Shared server infrastructure: config classes, ProcessRole, the purgatory + timing wheel, quotas, broker lifecycle, AssignmentsManager, feature/metadata-version definitions | 06, 12, 19 |
| streams | Java | Kafka Streams: topology builder, StreamThread, the task model, state stores, the changelog restorer, EOS-v2 | 20 |
| connect | Java | Kafka Connect: the plugin SPI, Worker and task loops, the DistributedHerder, the three internal topics, MirrorMaker 2 | 21 |
How to read this guide
The twenty-one detailed chapters are written to stand alone, but they build on each other. Here is a recommended path and a logical grouping. If you read top to bottom you will rarely hit a forward reference you do not already understand.
Recommended reading order
- Start here (this chapter) for the whole-system model.
- Bytes & protocol (the vocabulary everything else speaks): 01 · Record Format & Batches, then 02 · Wire Protocol & RPC.
- Storage (where records live): 03 · The Log Storage Engine, 04 · Log Management, Retention & Compaction, 05 · Tiered Storage.
- The broker's spine (how requests flow): 06 · Network Layer & Threading, 07 · Request Processing (KafkaApis).
- Durability (making data safe and readable): 08 · Replication, ISR & High Watermark, 09 · Fetch Path & Replica Fetchers.
- The metadata plane (the KRaft brain): 10 · KRaft Consensus (Raft), 11 · The KRaft Controller, 12 · Metadata Propagation & Broker Lifecycle.
- Coordination & semantics (groups, EOS, queues): 13 · Group Coordination, 14 · Transactions & Exactly-Once, 15 · Share Groups.
- The clients (the smart edge): 16 · The Producer Client, 17 · The Consumer Client.
- Cross-cutting concerns: 18 · Security, 19 · Quotas & Throttling.
- Built on top (the higher-level frameworks): 20 · Kafka Streams, 21 · Kafka Connect.
- Keep the Glossary & Cross-Cutting Concepts open in a tab.
Alternative entry points
- "I operate clusters"
- Read 08 (replication/durability), 12 (broker lifecycle), 04 (retention), 05 (tiered storage), 19 (quotas), 18 (security). The durability gotcha above is the single most important operational fact.
- "I write producers/consumers"
- Read 16, 17, then 13 (groups) and 14 (EOS) for semantics, with 01/02 for what goes over the wire.
- "I'm here for KRaft"
- Read 10 (the Raft dialect), 11 (the controller state machine), 12 (how brokers consume metadata). Note that KRaft is a pull-based Raft dialect with no leader heartbeats, not textbook Raft.
- "I want exactly-once"
- Read 14 first, then 13 (offset commits fold into transactions) and 03 (producer state on disk). Streams EOS-v2 is in 20.
The guide is layered like Kafka itself: bytes, then storage, then the broker, then durability, then the metadata plane, then coordination, then clients, then the frameworks built on all of it. Each chapter is derived directly from the 4.4.0-SNAPSHOT source (git 04bfe7d) with file-and-line citations, KRaft-only, and fact-checked rather than paraphrased from official docs. Where a fact is commonly mis-stated (e.g. "acks=all guarantees no loss," "KRaft is standard Raft," "tiered storage was GA in 3.6"), the chapters call out the correct fact explicitly.