22 · Glossary & Cross-Cutting Concepts
Source: Apache Kafka 4.4.0-SNAPSHOT (git 04bfe7d, 2026-06-15), KRaft mode. Derived from source code, not copied from official documentation.
This closing chapter is a reference index for the whole guide. Part A is a glossary: every load-bearing term used across the preceding chapters, defined in one to three sentences and cross-linked to the chapter that develops it. Part B draws out the cross-cutting ideas that recur everywhere, the log abstraction and stream-table duality, the three delivery semantics and where each is enforced, the ordering guarantees, the everything is a replicated log pattern that unifies data partitions with the five internal topics, epochs and fencing as a universal anti-zombie mechanism, and the timeline/snapshot data structures that let in-memory state track a replicated log. Terms are grouped by domain rather than strictly alphabetised so that related vocabulary reads together; the on-page table of contents lets you jump to any group.
Part A, Glossary
Terms are grouped into nine domains: the log & offsets, topics, partitions & replication, on-disk structures, the record format, producers & transactions, coordinators, groups & rebalancing, KRaft & metadata, security & authorization, and cross-cutting mechanisms. Each definition links to the chapter where the term is grounded in source.
The log & offsets
- Offset
- A monotonically increasing
int64assigned to each record within a partition by the leader at append time; it is the record's permanent address and is never reused or reordered. See The Log Storage Engine. - Log end offset (LEO)
- The offset that will be assigned to the next record appended, i.e. one past the last record. Each replica has its own LEO; the leader tracks every follower's LEO to compute the high watermark. See The Log Storage Engine and Replication, ISR & High Watermark.
- High watermark (HW)
- The highest offset that has been replicated to the full in-sync replica set and is therefore committed and consumer-visible. It equals the minimum LEO over the maximal ISR, is monotonic, and never exceeds the leader's LEO (
UnifiedLog.java:564).read_uncommittedconsumers may read up to the HW. See Replication, ISR & High Watermark. - Last stable offset (LSO)
min(highWatermark, firstUnstableOffset), the highest offset below which there are no in-flight (undecided) transactions.read_committedconsumers may not read past the LSO. See Transactions & Exactly-Once.- Isolation level (
read_committed/read_uncommitted) - The consumer
FetchRequestsetting (aFetchIsolationon the broker) that bounds how far a fetch may see:read_uncommitted(the default,HIGH_WATERMARK) reads up to the high watermark;read_committed(TXN_COMMITTED) reads only up to the LSO and filters out batches belonging to aborted transactions using the segment.txnindex. See Transactions & Exactly-Once and The Consumer Client. - Log start offset
- The lowest offset still retained in the log; advanced by retention, compaction,
DeleteRecords, or tiering. Advancing it requires the new value to be ≤ the HW, elseOffsetOutOfRangeException(UnifiedLog.java:1358). The local log start offset (≥ the log start offset) is the lowest offset held on local disk when tiered storage is enabled. See Log Management, Retention & Compaction. - Recovery point
- The first not-yet-fsynced offset; advanced only at an actual flush. With default
flush.messages/flush.ms=Long.MAX_VALUE, durability relies on replication and the OS page cache rather than fsync. See The Log Storage Engine. - Watermark checkpoint
- The
replication-offset-checkpointfile (and siblingsrecovery-point-offset-checkpoint,log-start-offset-checkpoint,cleaner-offset-checkpoint): plain-texttopic partition offsettriples written crash-consistently via atomic rename, persisting the HW and other boundaries across restarts (default interval 5000 ms). See Replication, ISR & High Watermark.
Topics, partitions & replication
- Topic
- A named, partitioned, append-only stream of records, the unit of pub/sub addressing. Internally a topic is just a set of partition logs plus metadata (id, config, ACLs). See Architecture Overview.
- Partition
- The unit of parallelism, ordering and replication: a totally ordered log identified by
(topic, partition). All ordering and delivery guarantees are scoped to a single partition. See The Log Storage Engine. - Replica
- One copy of a partition log hosted on a broker. The replication factor is the number of replicas; each is either the leader or a follower. See Replication, ISR & High Watermark.
- Leader / follower
- Exactly one replica is the leader, which serves all produces and consumer fetches and assigns offsets; the others are followers that replicate by issuing Fetch requests to the leader (replication is pull-based). See Replication, ISR & High Watermark and Fetch Path & Replica Fetchers.
- ISR (in-sync replicas)
- The set of replicas (including the leader) that are caught up to the leader within
replica.lag.time.max.ms(default 30 s). In KRaft the ISR is controller-authoritative: the leader proposes changes via theAlterPartitionRPC and the controller commits them by appending aPartitionChangeRecord. The HW cannot advance while the ISR is below the effectivemin.insync.replicas. See Replication, ISR & High Watermark. - ELR (eligible leader replicas)
- KIP-966 state added to
PartitionRegistration(elr[],lastKnownElr[]): replicas that were in the ISR when it shrank and so still hold all committed data, making them safe to elect as leader without data loss even after the ISR empties. The controller elects from ISR first, then ELR, then last-known ELR, and only as an unclean last resort an arbitrary replica. See Replication, ISR & High Watermark. - Leader epoch
- A monotonically increasing
int32stamped into every v2 batch header by the leader; it identifies which leadership term wrote a range of offsets. Each replica keeps aLeaderEpochFileCachemapping epoch → first offset, which drives truncation-safe follower replication (KIP-101/KIP-279). See The Log Storage Engine and Replication, ISR & High Watermark. - Partition epoch
- The optimistic-concurrency token for a partition's leader/ISR state, bumped by the controller on every
PartitionChangeRecord. AnAlterPartitioncommits only if its partition epoch matches the controller's current value, elseINVALID_UPDATE_VERSION. See Replication, ISR & High Watermark and The KRaft Controller. - Rack awareness
- Placement and routing by failure domain:
StripedReplicaPlacerspreads replicas across racks first then brokers (The KRaft Controller), and fetch-from-follower (KIP-392) lets a consumer read from the most caught-up same-rack replica viaRackAwareReplicaSelector(Fetch Path & Replica Fetchers). - Zero-copy
- Returning log bytes straight from page cache to the socket via
sendfile(2)(FileRecords.writeTo→transferFrom), so fetched data never enters the JVM heap. Disabled only when records must be re-encoded (e.g. a follower re-appending). See Fetch Path & Replica Fetchers. - Page cache
- The OS file-system cache that Kafka relies on instead of an in-process record cache; writes go to page cache and are flushed by the OS, and zero-copy reads serve hot data without a disk seek. See The Log Storage Engine.
On-disk structures
- Segment
- A contiguous slice of a partition log stored as a
.logfile named by its 20-digit base offset, paired with its indexes. The newest (highest-base-offset) segment is the active segment, the only one being appended; a segment rolls on size, time, index-full, or offset-overflow. See The Log Storage Engine. - Index (offset / time / transaction)
- Three sparse companions to each segment: the memory-mapped
.index(offset → file position, 8-byte entries), the memory-mapped.timeindex(timestamp → relative offset, 12-byte entries), and the non-mmapped, lazily-created.txnindexrecording aborted transactions forread_committedfetches. Entries are added everyindex.interval.bytes(default 4096). Indexes carry no checksum and are rebuilt from the log on corruption. See The Log Storage Engine. - Compaction vs retention
- The two
cleanup.policymodes. Retention (delete) drops whole oldest-first segments past a size or time bound. Compaction (compact) keeps only the latest record per key, copying live records into new segments via a two-pass cleaner. A topic may set both. See Log Management, Retention & Compaction. - Tombstone
- A record with a
nullvalue, signalling deletion of its key during compaction. Tombstones are themselves retained fordelete.retention.ms(default 24 h) after entering the clean section, enforced in v2 by a delete-horizon stamped into the batch header, so that all replicas observe the deletion before it disappears. See Log Management, Retention & Compaction. - Tiered storage / remote log
- KIP-405 offloading of older, closed log segments to an external store (object storage, HDFS, …) so brokers keep only recent data locally, enabled cluster-wide by
remote.log.storage.system.enableand orchestrated per-partition by theRemoteLogManager(RLM). Only sealed (non-active) segments whose copy reachedCOPY_SEGMENT_FINISHEDare remote-readable; local deletion of a tiered segment is guarded byhighestOffsetInRemoteStorage. A consumer fetch below the local range is served transparently from the remote store. See Tiered Storage. - Local log start offset
- The lowest offset still held on local disk (≥ the log start offset) once tiered storage is enabled; offsets between the log start offset and this boundary live only in the remote store and are fetched from there on demand. See Tiered Storage.
- RSM / RLMM (
__remote_log_metadata) - The two pluggable SPIs of tiered storage: the RemoteStorageManager (RSM) reads and writes segment and index objects in the external store, while the RemoteLogMetadataManager (RLMM) tracks each remote segment's id, offset range and state. The default RLMM persists that metadata to the internal compacted topic
__remote_log_metadata(50 partitions) and materialises an in-memory index from it, one more instance of the replicated-log pattern. See Tiered Storage.
The record format
- Record batch (v2)
- The
DefaultRecordBatch: the on-disk/on-wire container with a fixed 61-byte plaintext header (base offset, length, partition leader epoch, magic, CRC-32C, attributes, PID/epoch/base-sequence, record count, …) followed by a compressed run of delta-encoded inner records. The atom of replication, compression, idempotence and CRC. See Record Format & Batches. - Control record
- A record in a batch whose attributes bit 5 is set; its typed key (
Version,Type) names a control event,COMMIT/ABORT(transaction markers),LEADER_CHANGE, snapshot header/footer, and KRaft voter/version records. Control records are never compacted and clients may not write them. See Record Format & Batches. - Transaction marker
- The
COMMITorABORTcontrol record (EndTransactionMarker, carrying the coordinator epoch) that the transaction coordinator writes into every partition a transaction touched, atomically deciding the transaction's visibility. See Transactions & Exactly-Once.
Producers & transactions
- Producer ID (PID)
- A cluster-unique
int64stamped in every batch from an idempotent or transactional producer, allocated by the controller in blocks of 1000 and never reused. See Transactions & Exactly-Once. - Producer epoch
- An
int16generation counter for a PID. Bumped onInitProducerId(and, under Transaction V2, on every commit/abort) to fence zombies; the leader rejects any batch carrying a lower epoch, and, under Transaction V2, any marker whose epoch is not strictly greater than the current one. See Transactions & Exactly-Once. - Sequence number
- A per-partition
int32base sequence in the batch header, starting at 0 and advancing by the record count. The leader requiresnextFirstSeq == lastSeq + 1for a given(PID, epoch); violations raiseOutOfOrderSequenceException. See Transactions & Exactly-Once. - Idempotent producer
- A producer with
enable.idempotence=true(the default, requiringacks=alland ≤ 5 in-flight requests) whose(PID, epoch, sequence)triple lets the leader'sProducerStateManagerdeduplicate retries and reject reordering, giving exactly-once-in-order delivery to a single partition. See The Producer Client and Transactions & Exactly-Once. - transactional.id
- A stable, user-supplied string naming a logical producer across restarts; it maps deterministically to one
__transaction_statepartition and thus one coordinator, and anchors epoch-based zombie fencing. See Transactions & Exactly-Once.
Coordinators, groups & rebalancing
- Coordinator (group / transaction / share)
- A broker-hosted replicated state machine that owns one partition of an internal topic. The group coordinator owns a
__consumer_offsetspartition (group membership + committed offsets); the transaction coordinator owns a__transaction_statepartition; the share coordinator owns a__share_group_statepartition. The owning broker is the leader of the relevant partition, picked byabs(key.hashCode()) % partitionCount. See Group Coordination, Transactions & Exactly-Once, Share Groups. - Consumer group
- A set of consumers that cooperatively consume a subscription, with each partition assigned to exactly one member and one committed offset per partition. Managed by the group coordinator. See Group Coordination.
- Rebalance (eager / cooperative / KIP-848)
- The process of (re)distributing partitions among group members. Eager (classic protocol) revokes all partitions then reassigns via a stop-the-world JoinGroup/SyncGroup barrier. Cooperative (KIP-429) revokes only the partitions that move, in two rounds. The KIP-848 consumer protocol replaces the barrier with an incremental, server-driven
ConsumerGroupHeartbeatreconciled by epochs (no leader, no client-side assignor; GA in 4.0). See Group Coordination. - Share group
- KIP-932 queue-like consumption: N share consumers cooperatively read the same partition (N:M), with records leased individually under a time-bounded acquisition lock and a per-record delivery state instead of an offset commit, giving at-least-once with bounded redelivery and an optional dead-letter queue. Durable state lives in
__share_group_state. See Share Groups.
KRaft & metadata
- KRaft
- Kafka Raft: the built-in consensus subsystem that replaced ZooKeeper (fully removed in 4.0). It stores all cluster metadata as a single Raft-replicated log partition,
__cluster_metadata-0, over a quorum of controllers. See KRaft Consensus. - Controller quorum
- The set of voter nodes that run the Raft protocol over the metadata log and elect an active controller. One controller is the Raft leader; the others are hot standbys that replay the same records. See KRaft Consensus and The KRaft Controller.
- Voter vs observer
- A voter participates in elections and counts toward the commit majority; an observer (every broker, and catching-up nodes) only fetches the metadata log to stay current. Both replicate by pulling via Fetch; only voters vote. See KRaft Consensus.
- Quorum
- A majority of voters. A metadata record is committed once replicated to a quorum (the leader counts itself); the Raft high watermark is the offset replicated to a majority. See KRaft Consensus.
- Metadata log
- The single-partition internal topic
__cluster_metadata(Topic.CLUSTER_METADATA_TOPIC_NAME) carrying every cluster-state change as a typed record; it is the source of truth from which all broker state is materialised. See Metadata Propagation & Broker Lifecycle. - MetadataImage / MetadataDelta
- The materialised view of the metadata log.
MetadataImageis an immutable record of ten components, aMetadataProvenanceoffset/epoch marker plus nine sub-images (features, cluster, topics, configs, client-quotas, producer-ids, ACLs, SCRAM, delegation-tokens);MetadataDeltaaccumulates a batch of records and produces the next image, reusing unchanged sub-images by reference. Readers swap the whole image atomically. See Metadata Propagation & Broker Lifecycle. - metadata.version
- A finalised feature level (KIP-584) stored as a
FeatureLevelRecordthat gates which record types and behaviours the cluster uses; raised only when the controller software and every registered broker and controller support the new level. See The KRaft Controller and Metadata Propagation & Broker Lifecycle. - Broker epoch
- A lease identifier the controller assigns at broker registration and bumps on each re-registration; carried in heartbeats and Fetch requests so the controller and leaders can fence a stale broker incarnation. See Metadata Propagation & Broker Lifecycle.
Security & authorization
- ACL (access-control entry)
- An authorization rule binding a
{ResourceType, ResourceName, PatternType, Principal, Host, Operation, PermissionType}tuple. In KRaft, ACLs are cluster metadata: each is stored as anAccessControlEntryRecordin__cluster_metadata(removed viaRemoveAccessControlEntryRecord) and materialised into theAclsImage. See Security: Authentication & Authorization. - Authorizer (StandardAuthorizer)
- The pluggable component (
authorizer.class.name) that decides each request'sALLOW/DENY. The built-in KRaft implementation isStandardAuthorizer, which reads ACLs straight from the metadata image. Its evaluation order is fixed: a configured super-user wins outright, otherwise any matchingDENYbeats everyALLOW, and if nothing matches it falls back to the default (allow.everyone.if.no.acl.found, default deny). See Security: Authentication & Authorization. - KafkaPrincipal / principal
- The authenticated identity an
Authorizerchecks, of the formtype:name(almost alwaysUser:alice); built from the connection's credentials by aKafkaPrincipalBuilderafter authentication. The unauthenticated identity isUser:ANONYMOUS. See Security: Authentication & Authorization. - SASL
- The authentication framework Kafka uses on the wire, selected per listener via a
security.protocolofSASL_PLAINTEXTorSASL_SSL. Supported mechanisms arePLAIN,SCRAM-SHA-256/SCRAM-SHA-512,GSSAPI(Kerberos), andOAUTHBEARER. See Security: Authentication & Authorization. - SSL / TLS
- Transport encryption for the
SSLandSASL_SSLlisteners; withssl.client.auth=requiredit also performs mutual TLS, authenticating the client from its certificate and deriving theKafkaPrincipalfrom the certificate's distinguished name. See Security: Authentication & Authorization. - Delegation token
- A lightweight shared secret (KIP-48) that an already-authenticated client can obtain to authenticate later connections without re-presenting its primary credentials, handy for distributed workers. Tokens are SCRAM-style secrets replicated as cluster metadata (the
DelegationTokenImage) so every broker can verify them. See Security: Authentication & Authorization. - super.users
- The broker config listing principals (e.g.
User:admin;User:broker) that bypass ACL evaluation entirely, every operation is allowed before any ACL or default is consulted. Used for inter-broker and administrative identities. See Security: Authentication & Authorization.
Cross-cutting mechanisms
- Fencing
- Rejecting operations from a stale instance by comparing a monotonically increasing token (epoch). Used everywhere: producer epoch fences zombie producers, leader epoch fences stale leaders, partition/coordinator/broker epochs fence stale state. See Transactions & Exactly-Once and Part B below.
- Purgatory
- A
DelayedOperationPurgatorythat parks an operation which cannot complete immediately (e.g.acks=allproduce waiting for the HW, a fetch waiting formin.bytes) and completes it on a triggering event or a timeout driven by a hierarchical timing wheel. See Network Layer & Threading. - Quota / throttling
- Overload protection: per-entity limits on produce/fetch byte rate, request CPU %, controller mutations, and replication, enforced by a shared
Sensor+Rate/TokenBucket. Over-quota clients are throttled by muting the channel for a computed delay and returningthrottle_time_msso they back off. See Quotas & Throttling.
Part B, Cross-cutting concepts
Five ideas appear in nearly every chapter. Reading them together explains why so much of Kafka looks the same regardless of subsystem.
The log abstraction & stream-table duality
Kafka's one foundational data structure is the append-only, totally ordered, offset-addressed log. A partition is a log; a topic is a set of logs; the metadata quorum is a log; every coordinator's durable state is a log. Everything else, indexes, the high watermark, retention, compaction, is bookkeeping over that log.
A compacted log encodes a table: the latest record per key is the current value, a tombstone is a deletion, and the log itself is the ordered changelog of every mutation. This stream-table duality is why the same __consumer_offsets, __transaction_state, __share_group_state and __cluster_metadata logs can be replayed to rebuild an in-memory table after failover, and why Kafka Streams (Kafka Streams Architecture) materialises a KTable from a compacted changelog and emits a KStream from the same log read forward.
If you understand the log, you understand most of Kafka. Replication, consumer offsets, transactions, share-group state and cluster metadata are all the same pattern, a replicated log plus a deterministic replay that turns it into in-memory state.
Delivery semantics & where each is enforced
Kafka supports three end-to-end delivery semantics. Which one you get is not a single switch but the combination of producer settings, consumer commit timing, and (for the strongest) transactions.
| Semantic | Producer side | Consumer side | Enforced by |
|---|---|---|---|
| At-most-once | acks=0 or no retries, a lost batch is simply lost | Commit the offset before processing | Nothing dedups; gaps are tolerated. Cheapest, lossy. |
| At-least-once | acks=all with retries (idempotence optional), a batch is never lost but may duplicate on retry | Commit the offset after processing | Replication (HW) for durability; redelivery on reprocessing. The default. Share groups are inherently at-least-once. |
| Exactly-once | Idempotent producer (PID+epoch+sequence) plus transactions binding writes and the offset commit atomically | Read at read_committed (LSO + aborted-txn filtering) | Leader sequence-dedup + 2-phase commit over __transaction_state + transaction markers. See Transactions & Exactly-Once. |
The enforcement points are spread across the stack. Idempotence lives at the partition leader, where ProducerStateManager deduplicates retried batches by (PID, epoch, sequence) (The Log Storage Engine). Atomicity lives in the transaction coordinator's two-phase commit and the markers it fans out to every partition (Transactions & Exactly-Once). Visibility lives in the consumer: the leader caps a read_committed fetch at the LSO, and the client skips batches whose PID is in the aborted-transaction index (The Consumer Client). Kafka Streams packages all of this as KIP-447 exactly_once_v2 (Kafka Streams Architecture).
"Exactly-once" is exactly-once processing within a Kafka-to-Kafka pipeline, not magical de-duplication of arbitrary side effects. It holds only when the read, the derived writes, and the input-offset commit all sit inside one transaction and downstream consumers read at read_committed.
Ordering guarantees
All ordering in Kafka is per-partition; there is no global order across a topic. Within a partition the log is totally ordered by offset, and that order is preserved end to end provided two conditions hold:
- On the produce side, order is preserved automatically when the idempotent producer is on (the broker enforces strictly increasing per-partition sequences, so any of
max.in.flight.requests.per.connectionin [1,5] is safe). Without idempotence, onlymax.in.flight=1preserves order, because a retried batch could otherwise overtake a later one (The Producer Client). - On the broker side, the leader assigns offsets under a single lock so appends are serialised, and per-connection request ordering is guaranteed by muting a channel while its request is in flight (Network Layer & Threading).
Consumers see records in offset order within each partition but interleave partitions arbitrarily. Keyed records preserve relative order only because the default partitioner sends a given key to a fixed partition. Share groups deliberately give up ordering: records are leased individually to whichever consumer is free, so there is no cross-consumer order at all (Share Groups).
Everything is a replicated log
Data partitions are not special. The same machinery, an append-only log, replication to followers, a high watermark gating visibility, and a deterministic replay that rebuilds in-memory state, backs five internal logs, each owned by the controller, a coordinator, or the tiered-storage remote-log metadata manager (RLMM).
| Internal topic | Holds | Replayed into | Chapter |
|---|---|---|---|
__cluster_metadata | All cluster state (topics, configs, ACLs, quotas, broker registrations, leader/ISR/ELR) | MetadataImage on every broker & controller | Controller, Metadata Propagation |
__consumer_offsets | Group membership + committed offsets (compacted) | GroupCoordinatorShard timeline state | Group Coordination |
__transaction_state | Per-transactional.id transaction metadata (compacted) | TransactionMetadata state machines | Transactions & Exactly-Once |
__share_group_state | Per-record delivery state & share-partition offsets (compacted) | ShareCoordinatorShard + SharePartition | Share Groups |
__remote_log_metadata | Tiered-storage segment metadata (50 partitions) | Default RLMM in-memory index | Tiered Storage |
The two data-plane logs (__cluster_metadata and ordinary topic partitions) replicate by followers pulling via Fetch; the coordinator logs are ordinary compacted topic partitions, so the leader that owns a partition is the coordinator for the keys hashed to it, and a coordinator failover is just a partition leader change plus a replay. This is the single most unifying idea in the codebase: there is essentially one durable-state mechanism, reused everywhere.
Reusing the log as the only persistence primitive means there is one replication path, one durability story (replicate-then-commit), one failover story (replay the partition), and one operational surface (it is just a topic). New stateful subsystems, share groups, KIP-848 groups, Streams groups, are added by writing new record types into a coordinator log, not by inventing new storage.
Epochs & fencing: a universal anti-zombie mechanism
Distributed systems must cope with zombies: an instance that was replaced (after a pause, crash, or network partition) but does not yet know it. Kafka's answer is uniform, a monotonically increasing epoch attached to every authority, checked on every operation, with the rule reject anything carrying an epoch lower than the current one.
| Epoch | Fences | Bumped on | Rejection |
|---|---|---|---|
| Producer epoch | A zombie producer for a transactional.id | InitProducerId; every commit/abort under TV_2 | InvalidProducerEpochException / PRODUCER_FENCED |
| Leader epoch | A stale leader's writes; guides follower truncation | Each leadership change (new PartitionChangeRecord) | Diverging-epoch response → follower truncates |
| Partition epoch | A stale AlterPartition ISR proposal | Every controller partition change | INVALID_UPDATE_VERSION |
| Coordinator epoch | A stale transaction marker / log append | Each __transaction_state leader change | TransactionCoordinatorFenced |
| Broker epoch | A stale broker incarnation | Each (re-)registration lease | Stale heartbeats/fetches ignored |
| Raft / quorum epoch | A deposed controller leader | Every election (+1) | NotController / vote rejected |
| Group / member epoch (KIP-848) | A zombie group member & its commits | Membership/assignment changes | STALE_MEMBER_EPOCH / FENCED_*_EPOCH |
For every epoch, the value is non-decreasing and a strictly higher epoch always supersedes a lower one. No operation tagged with a stale epoch is ever applied. This one rule is what makes failover safe across the entire system without distributed locks.
Timeline & snapshot data structures
Coordinators and the controller face a shared problem: they apply records to in-memory state optimistically (before those records are committed past the high watermark), yet a record that never commits, because leadership was lost or a metadata transaction aborted, must leave no visible trace. The solution is the timeline collection: a copy-on-write map/set/value that remembers its contents at each metadata offset and can be reverted to any earlier offset.
The controller's TimelineHashMap/TimelineObject family is backed by a SnapshotRegistry keyed by offset (The KRaft Controller); the group, transaction and share coordinators use the same SnapshotRegistry-backed timeline structures so their handlers need no locks and reads observe a consistent snapshot (Group Coordination). On renounce or transaction abort, the state is revertToSnapshot(lastStableOffset), discarding everything uncommitted, which is exactly why writes are applied optimistically yet failover is byte-identical.
The same shape recurs at the log level. ProducerStateManager snapshots producer state on every segment roll so it can be rebuilt after failover (The Log Storage Engine); the metadata log takes KIP-630 snapshots so a fresh node need not replay from offset 0 (KRaft Consensus); the share coordinator periodically writes a full ShareSnapshotValue to bound replay (Share Groups). In every case a snapshot is a compaction of a log into materialised state, closing the loop back to stream-table duality.
These six ideas are deliberately the threads that tie the guide together. When a new chapter introduces a subsystem, ask which log it persists to, which epoch fences it, and how its in-memory state is reverted on failover, the answers are almost always an instance of the patterns above.