II · 00 · The Operator's Mental Model
Source: Apache Kafka 4.4.0-SNAPSHOT (git 04bfe7d, 2026-06-15), KRaft mode. Operational guidance grounded in source code and cited benchmarks.
Before any runbook, any capacity formula, any tuning dial, you need a correct model of what a Kafka cluster is in KRaft 4.x, because nearly every production incident is a control loop that stopped converging, and you cannot diagnose a feedback loop you cannot name. This chapter assembles that model: the two kinds of node and what process.roles decides; the metadata plane (the controller quorum) versus the data plane (brokers) and the hard line of ownership between them; the five self-correcting control loops, broker heartbeats & fencing, ISR maintenance, high-watermark advancement, metadata propagation, group rebalancing, each presented as something you observe rather than command; the five internal topics you implicitly operate and exactly what breaks when each one is unhealthy; the four SLIs that actually predict user pain and how to turn them into SLOs; and a one-screen map of "what can go wrong, where" that the remaining twelve chapters of Part II expand into detail. Every default and metric name here is pinned to the source file that defines it.
What you actually run: nodes, roles, and one process class
A KRaft cluster is a set of JVM processes, every one of which is constructed by the same class, KafkaRaftServer (core/src/main/scala/kafka/server/KafkaRaftServer.scala:49). That single class reads one decisive config, process.roles (raft/src/main/java/org/apache/kafka/raft/KRaftConfigs.java:32; it has no default, you must set it), and from it instantiates an Option[BrokerServer], an Option[ControllerServer], or both (KafkaRaftServer.scala:76-90). The legal role names are exactly two, broker and controller, enumerated in server/src/main/java/org/apache/kafka/server/ProcessRole.java:21-22. There is no third "ZooKeeper" thing to run: ZooKeeper was removed in 4.0, and this build has no migration path back to it.
So the entire operational fleet decomposes into three deployment shapes of one process:
process.roles | What this node is | Runs (from KafkaRaftServer) | When to use |
|---|---|---|---|
broker | Data-plane node: holds partition replicas, serves produce/fetch. | BrokerServer only (:76) | Production fleets of any size; brokers scale horizontally and independently of the quorum. |
controller | Metadata-plane node: a voter in the Raft quorum that owns cluster metadata. | ControllerServer only (:82) | Dedicated controllers, the recommended shape for any cluster you care about; isolates metadata from data load. |
broker,controller | Combined ("co-located") node: both at once, sharing a JVM. | both BrokerServer + ControllerServer (:76,:82); detected by isKRaftCombinedMode (KafkaConfig.scala:180-181) | Dev, test, small/edge clusters. Avoid for large production: broker GC pauses and page-cache pressure can disrupt the controller in the same heap. |
Crucially, even a combined node holds exactly one SharedServer instance (core/src/main/scala/kafka/server/SharedServer.scala:97; its doc comment at :86-95 states this explicitly). The SharedServer owns the components that the broker and controller halves both need: the KafkaRaftManager (the Raft client for the metadata log), the MetadataLoader, and the SnapshotGenerator (SharedServer.scala:123-130). This matters operationally: in combined mode you are not running two independent servers that happen to share a host, you are running two role-players over one Raft client and one metadata pipeline. A fault in metadata loading is fatal for a process that plays the controller role and merely logged-and-incremented for a pure broker (SharedServer.scala:203-210), which is the first hint that the two planes have very different blast radii.
Run 3 controllers for a normal production cluster (tolerates 1 controller loss) or 5 for the largest/most critical (tolerates 2). The number must be odd, a Raft quorum needs a strict majority, so an even count gives no extra fault tolerance and wastes a node. Controllers do small, metadata-shaped work; you do not scale them with traffic. You scale brokers with traffic. Cross-link KRaft consensus for the quorum mechanics and Topologies for placement.
Two planes, one hard line of ownership
The single most useful abstraction for operating KRaft is the split between the metadata plane and the data plane. They run different code, fail differently, are sized differently, and own strictly disjoint state. Confuse them and you will reach for the wrong runbook.
ControllerServer voters. One is the active controller (leader); the rest are hot standbys replaying the same log. Owns the authoritative cluster state: topics, partitions, replica assignments, ISR membership, broker registrations & liveness, configs, ACLs, quotas, feature levels. State lives in the __cluster_metadata log (single partition) and is published to brokers as a stream of records + periodic snapshots.BrokerServer nodes. Each is a follower of the metadata log and a materializer of the slice of cluster state that concerns it. Owns the data: partition replicas on local disk (and tiered object storage), the high-watermark per partition, the request-handling reactor, the group/transaction/share coordinators. Talks to the controller only to register, heartbeat, and propose ISR changes.The boundary is enforced in code, not convention. Brokers reach the controller through a NodeToControllerChannelManager for forwarding and for proposing changes (BrokerServer.scala:254-264), and they keep a read-optimized KRaftMetadataCache built by replaying the metadata log (BrokerServer.scala:223). A broker never mutates topic/partition/ISR state directly; it asks. Two concrete asks define almost everything an operator watches:
- Broker → controller liveness: the broker's
BrokerLifecycleManagerregisters with the controller (per KIP-631) and sends periodic heartbeats (server/src/main/java/org/apache/kafka/server/BrokerLifecycleManager.java:64-67). The controller decides whether the broker is alive (unfenced) or dead (fenced). - Broker → controller ISR proposals: when a partition leader wants to shrink or grow its in-sync set, it does not act unilaterally; it sends an
AlterPartitionrequest via theAlterPartitionManager(BrokerServer.scala:305-310) and the controller commits the new ISR to the metadata log. This is why ISR changes show up as controller-driven, log-ordered events.
In the ZooKeeper era, cluster metadata lived in ZK and controller failover meant re-reading O(partitions) of state, Confluent measured ~28 s for 100k partitions on Kafka 1.0.0, dropping to ~14 s on 1.1.0, and it stayed structurally linear, which is the real reason ZK capped clusters near ~200k partitions (empirical; Confluent "200K Partitions" post). KRaft replaces that with a replicated log the standby controllers already hold in memory, so a new leader is current the instant it wins election. The lab demonstration, Confluent's 2,000,000-partition cluster, ~10× the ZK ceiling, is the headline, but the operational point is the mechanism: failover no longer reloads state (empirical; Confluent KRaft docs / KIP-500). Do not over-read "millions of partitions" as a per-broker SLA, per-broker counts are still bounded by file descriptors, fetcher threads, and Linux vm.max_map_count (default 65,530, ~2 mmaps/partition ⇒ ~32k partitions/broker until raised) (empirical; Instaclustr). See Limits and Partitioning.
The five control loops you observe (not command)
Kafka is not a request/response database you drive imperatively; it is a bundle of negative-feedback control loops, each continuously measuring an error and correcting toward a setpoint. Health is "all loops converging." Every page-worthy incident is a loop that has stopped converging, diverging, oscillating, or stuck. Learn the loops and the metrics fall out of them, because each loop's "error signal" is a metric you already have.
① Broker heartbeat & fencing, the liveness loop
Each broker sends a BrokerHeartbeat to the active controller every broker.heartbeat.interval.ms = 2000 ms (KRaftConfigs.java:40). The controller grants a lease that expires after broker.session.timeout.ms = 9000 ms (KRaftConfigs.java:44), its doc string literally reads "the length of time… that a broker lease lasts if no heartbeats are made." Miss heartbeats for the session window and the controller fences the broker: it stops being a valid leader/ISR member, and its leaderships migrate elsewhere. The broker's own state machine tracks this, NOT_RUNNING(0) → STARTING(1) → RECOVERY(2) → RUNNING(3) → PENDING_CONTROLLED_SHUTDOWN(6) → SHUTTING_DOWN(7) (metadata/src/main/java/org/apache/kafka/metadata/BrokerState.java:47-80), and a freshly registered broker stays fenced until it has caught up on the metadata log and the controller transitions it from RECOVERY to RUNNING (the heartbeat-response handler logs exactly that on unfence, BrokerLifecycleManager.java:665-666). The loop's error signal: heartbeats not arriving. Operator-visible as the controller-side TimedOutBrokerHeartbeatCount and the FencedBrokerCount / ActiveBrokerCount gauges (controller/.../metrics/ControllerMetadataMetrics.java:49,51; QuorumControllerMetrics.java:62).
The 4.5× ratio between session timeout and heartbeat interval tolerates two consecutive missed heartbeats plus jitter before fencing, long enough to ride out a GC pause or a brief network blip, short enough that a genuinely dead broker is evicted in under ~10 s. This is the same design logic as a consumer's session.timeout.ms ≈ 3 × heartbeat.interval.ms. Do not set them equal; a single dropped packet would then fence a healthy broker. The interval also bounds how fast a fenced broker re-joins, because the broker re-heartbeats aggressively (it schedules the next heartbeat ~10 ms out after a fence-relevant response, BrokerLifecycleManager.java:660).
② ISR maintenance, the membership loop
For each partition, the leader maintains the in-sync replica set (ISR): the replicas that have fetched up to (close to) the leader's log-end offset recently. A follower is dropped from the ISR if it hasn't caught up within replica.lag.time.max.ms = 30000 ms (server/src/main/java/org/apache/kafka/server/config/ReplicationConfigs.java:55); when it catches up, it is re-added. Shrinks and expansions are committed through the controller (loop ①'s liveness gates who is even eligible). The error signal is the gap between the replica set and the ISR, surfaced as UnderReplicatedPartitions (core/src/main/scala/kafka/server/ReplicaManager.scala:94,241) and as the rates IsrShrinksPerSec / IsrExpandsPerSec (:100,101). In steady state both rates are zero; sustained shrinking with no matching expansion means a follower (or its disk/network) is falling behind, or a broker is down. Cross-link Replication.
③ High-watermark advancement, the durability loop
The high-watermark (HW) is the offset up to which a record is replicated to the whole current ISR and therefore safe to expose to consumers. It advances to the minimum log-end offset across the ISR. This is the loop that makes "committed" mean something: a producer with acks=all is told its write succeeded only once the HW (gated by min.insync.replicas) covers it. When the ISR shrinks, the HW advances against fewer replicas, faster, but less durable; that tradeoff is the entire reason min.insync.replicas exists, and why its default of 1 (ServerLogConfigs.java:155) is unsafe for durable topics. The operator levers here are covered in depth in Durability; the loop's health is visible as UnderMinIsrPartitionCount and AtMinIsrPartitionCount (ReplicaManager.scala:95,96,242,243).
replication.factor=3 + min.insync.replicas=2 + producer acks=all makes any single broker failure non-data-losing: a write is acknowledged only after ≥2 replicas hold it, so the surviving replica always has every committed record. The compiled-in defaults do not give you this, default.replication.factor is 1 (ReplicationConfigs.java:42), min.insync.replicas is 1 (ServerLogConfigs.java:155), and num.partitions is 1 (ServerLogConfigs.java:36). You must set the triad explicitly. And note the silent trap: min.insync.replicas is capped by the actual replica count, so a topic with RF=1 still accepts writes even when the cluster default is 2, verify min.insync.replicas ≤ replication.factor per topic (empirical; Conduktor).
④ Metadata propagation, the truth-distribution loop
Once the controller commits a change (a new leader, an ISR update, a created topic), every broker must learn it. Brokers run a MetadataLoader that replays the committed metadata log and applies it to the local KRaftMetadataCache (SharedServer.scala:326-334, BrokerServer.scala:223). To keep replay bounded, the controller periodically snapshots: a new snapshot is generated after metadata.log.max.record.bytes.between.snapshots = 20 MiB or metadata.log.max.snapshot.interval.ms = 1 hour, whichever first (raft/src/main/java/org/apache/kafka/raft/MetadataLogConfig.java:41,39), and the active controller writes a no-op record every metadata.max.idle.interval.ms = 500 ms so the committed offset keeps moving even when idle (:76). The error signal is metadata lag: how far a broker's applied offset trails the committed log. A broker that lags propagation routes clients to stale leaders. Cross-link Metadata Propagation; controller-side progress is observable via LastAppliedRecordOffset / LastCommittedRecordOffset (QuorumControllerMetrics.java:54,56).
⑤ Group rebalancing, the consumption-matching loop
The group coordinator (a broker role, materialized over __consumer_offsets) keeps every partition of a subscribed topic assigned to exactly one consumer in a group. Members join, leave, time out, or fall behind; each change triggers a rebalance that recomputes the assignment. The setpoint is "each partition assigned once, no orphans, no double-consumption." The pathology is the rebalance storm: a member that exceeds max.poll.interval.ms (default 5 min) is evicted, rejoins, and triggers another rebalance, self-sustaining (empirical; Conduktor/AWS). This loop is unique in that the actors are clients you may not control; see Group Coordination and Failure Modes. The newer consumer-group protocol (KIP-848) moves assignment computation broker-side to damp these storms.
| Loop | Setpoint | Error signal (metric) | Key tunable / default · source | Detail chapter |
|---|---|---|---|---|
| ① Heartbeat / fencing | alive broker ⇒ unfenced | FencedBrokerCount, TimedOutBrokerHeartbeatCount | broker.session.timeout.ms=9000 · KRaftConfigs.java:44 | KRaft Controller |
| ② ISR maintenance | ISR = caught-up replicas | UnderReplicatedPartitions, IsrShrinksPerSec | replica.lag.time.max.ms=30000 · ReplicationConfigs.java:55 | Replication |
| ③ HW advancement | HW = min LEO over ISR | UnderMinIsrPartitionCount, AtMinIsrPartitionCount | min.insync.replicas=1 · ServerLogConfigs.java:155 | Durability |
| ④ Metadata propagation | broker cache = committed log | metadata lag; LastAppliedRecordOffset | snapshot @ 20 MiB / 1 h · MetadataLogConfig.java:41,39 | Metadata Propagation |
| ⑤ Group rebalancing | each partition assigned once | rebalance rate; consumer lag | max.poll.interval.ms=300000 (client) | Group Coordination |
The five internal topics you implicitly operate
You did not create them and you rarely name them, but five internal topics carry the cluster's nervous system. Four are ordinary (replicated, broker-hosted) topics whose names are hardcoded in clients/src/main/java/org/apache/kafka/common/internals/Topic.java:27-30 plus the tiered-storage metadata topic in storage/.../TopicBasedRemoteLogMetadataManagerConfig.java:44; one, __cluster_metadata, is special and is not a normal topic at all but the Raft metadata log itself. Know each one's defaults and, more importantly, its failure blast radius.
| Internal topic | Owner plane | Default partitions · RF · min ISR | Source | If it is unhealthy… |
|---|---|---|---|---|
__cluster_metadata | metadata (Raft log, not a managed topic) | 1 partition (fixed); replicated by the controller quorum | Topic.java:30-34; partition 0 is CLUSTER_METADATA_TOPIC_PARTITION | Cluster-wide outage of the control plane. No leader elections, no topic creation, no config/ISR changes commit. Existing partitions keep serving from cache, but the cluster cannot react to any change. This is the one topic whose loss is catastrophic. |
__consumer_offsets | data (group coordinator) | 50 · 3 · (inherits broker min.insync.replicas) | GroupCoordinatorConfig.java:115,124 | Consumer groups cannot commit or load offsets. On coordinator loss, affected groups stall and rebalance; committed positions for a lost partition may be unreadable, risking re-processing or skipped messages. Producers/consumers of user topics keep flowing until they need to commit. |
__transaction_state | data (transaction coordinator) | 50 · 3 · 2 | TransactionLogConfig.java:32,40,45 | EOS/transactional producers cannot begin, commit, or abort. In-flight transactions hang; with hung transactions the last stable offset stalls and read_committed consumers block on that partition (see PartitionsWithLateTransactionsCount). Non-transactional traffic is unaffected. |
__share_group_state | data (share coordinator, KIP-932) | 50 · 3 | ShareCoordinatorConfig.java:38,42 | Share groups (queue-style consumption) cannot persist per-record acknowledgement state. Share-group delivery stalls; classic consumer groups and producers are unaffected. Only relevant if you use share groups. |
__remote_log_metadata | data (tiered storage, KIP-405) | 50 · 3 · 2 | TopicBasedRemoteLogMetadataManagerConfig.java:54,56,57 | Tiered-storage metadata (which segments are in object storage) cannot be read/written. Fetches that must reach remote segments fail or stall; local (hot) data still serves. Only present when tiered storage is enabled. |
All four managed internal topics default to RF 3, and the transaction and remote-log-metadata topics default to min ISR 2 (sources above). On a cluster with fewer than 3 brokers the offsets/transaction topics will be created with whatever RF is available, silently weakening durability of the very state that coordinates everyone else. The classic production incident: a 3-broker cluster created __consumer_offsets at RF=3, one broker is lost long-term, an offsets partition goes under-min-ISR, and every group whose coordinator maps to it stalls, an outage that has nothing to do with user-topic configuration. Treat the internal topics as first-class: verify their RF and ISR after any cluster resize, and keep ≥3 brokers before relying on RF=3. Cross-link Metrics & Signals.
The four SLIs that predict user pain
You can collect hundreds of Kafka metrics; only four service-level indicators directly track what a producer or consumer experiences. Everything else is a cause or a leading indicator of one of these. Define your SLOs on these four and let the rest be diagnostics.
| SLI | What the user feels | Primary signals (source) | Example SLO (set yours to workload) |
|---|---|---|---|
| Availability (write/read acceptance) | "Can I produce/consume right now?" | OfflinePartitionsCount=0 (ControllerMetadataMetrics.java:62); UnderMinIsrPartitionCount=0 (ReplicaManager.scala:95); ActiveControllerCount sums to 1 (QuorumControllerMetrics.java:46) | 99.95% of produce requests to RF≥3 topics succeed (non-throttled) per 30-day window. |
| Durability (no acknowledged-write loss) | "Did my committed data survive a failure?" | UncleanLeaderElectionsPerSec=0 (any non-zero = loss); UnderMinIsrPartitionCount; durability triad in force | 0 unclean leader elections; 100% of acks=all writes retained across single-broker loss. |
| End-to-end latency (produce→consume) | "How fresh is the data I read?" | broker RequestMetrics TotalTimeMs p99 (produce/fetch); client produce/fetch latency; e2e probe | p99 produce ack < 50 ms; p99 e2e (produce→consume) < 200 ms for the hot path. |
| Consumer lag (backlog age) | "How far behind is my pipeline?" | client records-lag-max; external trend monitor (Burrow / exporter); alert on trend, not a static count | Lag for tier-1 groups drains within 5 min of a spike; sustained growth over 15 min pages. |
(1) Latency and lag are distributions and trends, never single thresholds. Alert latency on p99 over a window; alert lag on sustained growth (recovery time ≈ current lag ÷ net drain rate), because a healthy consumer that briefly spikes is fine and a dead consumer reports no lag at all, only an external monitor catches the stalled group (empirical; LinkedIn Burrow). (2) Availability and durability are near-binary, page on >0. OfflinePartitionsCount, UnderMinIsrPartitionCount, and UncleanLeaderElectionsPerSec should be zero in a healthy cluster; any sustained non-zero is user-visible (empirical; Confluent/Datadog/AWS MSK). (3) Aggregate controller gauges with SUM. ActiveControllerCount is 0 or 1 per node; the cluster is healthy iff the sum is exactly 1, sum 0 means no controller (critical), sum >1 means split-brain. Datadog's rule: alert on any value lasting longer than one second (empirical; Datadog).
One screen: what can go wrong, where
This is the map the rest of Part II fills in. Read it as a triage tree: a symptom at the top, the plane and loop it implicates, and the chapter that holds the runbook. The discipline is to localize to a plane and a loop first, control-plane problems and data-plane problems demand different responses, and within the data plane a durability problem and a latency problem have different fixes.
| Symptom | Plane · loop | Likely root cause | Where it's expanded |
|---|---|---|---|
ActiveControllerCount sum = 0 | control ·, | Quorum lost majority; controllers down or partitioned. | Failure Modes, KRaft Consensus |
| Metadata lag climbing on a broker | control→data · ④ | Broker can't keep up replaying the metadata log; GC, disk, or network on that node. | Metadata Propagation |
OfflinePartitionsCount > 0 | data · ②③ | Partition has no leader: all replicas down, or sole in-sync replica lost. | Failure Modes |
UnderReplicatedPartitions > 0, all brokers up | data · ② | Slow follower, disk I/O, network, or GC on one node; not a broker-down event. | Replication, Performance Tuning |
UnderMinIsrPartitionCount > 0 | data · ③ | ISR fell below min.insync.replicas; acks=all producers get NotEnoughReplicas. | Durability |
UncleanLeaderElectionsPerSec > 0 | data · ③ | An out-of-sync replica was elected leader (only if you enabled it) → committed data truncated. | Durability |
| Disk approaching full; broker halts | data · storage | Retention not keeping up with ingest; a full log dir crashes the broker without graceful shutdown. | Capacity Planning, Storage Management |
| Handler/network idle% < 30% | data ·, | Request-thread or reactor saturation; under-provisioned or hot-spotted broker. | Performance Tuning, Network & Threading |
| Consumer lag growing; one partition hot | data · ⑤ | Skewed key, slow consumer processing, or rebalance storm. | Partitioning, Group Coordination |
PartitionsWithLateTransactionsCount > 0; read_committed stalled | data ·, | Hung transaction pins the last stable offset. | Transactions & EOS, Failure Modes |
A steady, constant UnderReplicatedPartitions with one broker absent is almost always "broker down", fix the node. A fluctuating URP while all brokers are up is a performance problem, a follower repeatedly drops out and rejoins because its disk, NIC, or GC can't keep pace with the leader's write rate (empirical; oneuptime). Same metric, two root causes, two completely different runbooks. Always correlate URP with broker liveness (loop ①) before acting, and watch IsrShrinksPerSec/IsrExpandsPerSec, a shrink with no matching expand is the fingerprint of a follower that fell behind and stayed behind.
Operator actions vs. emergent behaviour
A final framing that prevents a lot of grief: distinguish the handful of things an operator commands from the vast majority that the cluster does on its own. You issue a small set of imperative actions; the control loops above produce everything else as emergent behaviour. Reaching for an imperative action when a loop is mid-correction is how operators turn incidents into outages.
| You command (imperative) | Mechanism / config · source | The loops then do (emergent) |
|---|---|---|
| Start / format a node | process.roles, node.id, controller.quorum.bootstrap.servers; metadata dir metadata.log.dir (MetadataLogConfig.java:34) | Registration (KIP-631), catch-up on metadata, unfence, leadership assignment. |
| Graceful shutdown for maintenance | controlled.shutdown.enable=true (ServerConfigs.java:97); broker enters PENDING_CONTROLLED_SHUTDOWN | Controller migrates leaderships off the broker before it stops, so no partition goes leaderless. Why you rolling-restart one broker at a time. |
| Create / configure a topic | Admin API → controller commits to __cluster_metadata | Replica placement, leader election, metadata propagation to all brokers (loop ④), then clients discover it. |
| Reassign partitions / rebalance data | kafka-reassign-partitions.sh → controller AlterPartitionReassignments | New replicas catch up (loop ②), join ISR, old replicas drop; throttle to protect the durability/latency SLIs. |
| Change quorum membership | Dynamic quorum (controller.quorum.bootstrap.servers) vs static controller.quorum.voters (QuorumConfig.java:58,67) | Raft adds/removes a voter and re-replicates; majority must stay live throughout. |
With controlled.shutdown.enable=true (the default), a broker asked to stop first transitions to PENDING_CONTROLLED_SHUTDOWN (BrokerState.java:70) and the controller moves its partition leaderships to other in-sync replicas before the process exits. That is why a planned single-broker restart causes no leaderless partitions while an unplanned crash does (the controller only learns it's gone after the 9 s session timeout). The operational consequence is concrete: restart brokers one at a time, wait for UnderReplicatedPartitions to return to 0 between each, and never restart two replicas of the same partition inside one replica.lag.time.max.ms window, the durability triad only protects you if at least min.insync.replicas replicas are continuously up. See Lifecycle.
Where to go from here
This chapter is the index to Part II. The model, two planes, five loops, five internal topics, four SLIs, one failure map, is the lens every later chapter assumes. From here: Configuration (the dials and how they're applied), Limits (hard constants vs soft configs vs emergent bounds), Partitioning and Capacity Planning (sizing the data plane), Performance Tuning and Durability (the throughput/latency/durability triangle), Failure Modes and Metrics & Signals (the runbooks and the signals that trigger them), Topologies and Cost (placement and the cloud bill), and Scaling Scenarios and Lifecycle (what changes as you grow and how to operate the cluster over time). For the mechanisms beneath the loops, the Part I chapters are cross-linked throughout, start with Overview and KRaft Consensus.
KIPs referenced
- KIP-500 Replaced ZooKeeper with the self-managed KRaft metadata quorum, the foundation of the metadata-plane / data-plane split in this build.
- KIP-631 The KRaft controller's broker-lifecycle protocol: registration, heartbeats, fencing, control loop ① (
BrokerLifecycleManager.java:64-67). - KIP-405 Tiered storage; introduces the
__remote_log_metadatainternal topic (TopicBasedRemoteLogMetadataManagerConfig.java:44). - KIP-848 The new (server-side) consumer group protocol, damps the rebalance-storm pathology of control loop ⑤.
- KIP-932 Share groups (queue-style consumption); introduces
__share_group_state(ShareCoordinatorConfig.java:38).