Apache Kafka Internals

A deep, source-derived guide to Apache Kafka in three parts: how it works inside, how to operate it at scale, and what the distributed log teaches us as an architectural blueprint.

Apache Kafka 4.4.0-SNAPSHOT · git 04bfe7d · 2026-06-15 · Derived from source, not copied from official docs · 3 parts · 46 chapters

New here? Start with the Architecture Overview. Operating a cluster? Jump to Part II, the Operations Manual. Designing a system? See Part III, The Log as a Blueprint.

Part I · Architecture Internals

How Kafka actually works inside, from the on-disk byte layout of a record batch up to the KRaft controller quorum, the coordinators, and the client runtimes.

Getting Started

Architecture Overview

The distributed commit log, broker & cluster anatomy, and the end-to-end data path.

Data Format

Record Format & Batches

The v2 record batch on disk and on the wire: varints, headers, CRC, control records, compression.

Wire Protocol & RPC

Request/response framing, ApiKeys, the schema generator, flexible versions, tagged fields.

Storage Engine

The Log Storage Engine

UnifiedLog, LogSegment, the offset/time/transaction indexes, append & read paths, recovery.

Retention & Compaction

LogManager, retention by time/size, the log cleaner, compaction, tombstones, JBOD.

Tiered Storage

KIP-405 remote log: RemoteLogManager, the metadata topic, RemoteIndexCache, copy/read paths.

Networking & API

Network & Threading Model

SocketServer, the Acceptor/Processor reactor, the handler pool, Selector, purgatory & timing wheels.

Request Processing

KafkaApis dispatch, the request lifecycle, validation, authorization & throttling.

Replication & Cluster

Replication, ISR & High Watermark

ReplicaManager, Partition, the ISR, the high watermark, leader epochs, acks, unclean election, ELR.

Fetch Path & Replica Fetchers

Follower replication, AbstractFetcherThread, truncation, incremental fetch sessions, DelayedFetch.

KRaft & Metadata

KRaft Consensus (Raft)

KafkaRaftClient, the quorum state machine, elections, pull-based replication, snapshots, voters.

The KRaft Controller

QuorumController, the control managers, the single-threaded event loop, timeline data structures.

Metadata & Broker Lifecycle

MetadataImage/Delta, the loader & publishers, KRaftMetadataCache, registration, heartbeats, fencing.

Coordination

Group Coordination

The group coordinator, classic & KIP-848 rebalance protocols, assignors, offset management.

Transactions & Exactly-Once

Idempotent producer, the transaction coordinator, markers, two-phase commit, read_committed & LSO.

Share Groups (Queues)

KIP-932 queues: share consumers, the share coordinator, acquisition locks, delivery counts, DLQ.

Clients

The Producer Client

KafkaProducer, the RecordAccumulator, BufferPool, the Sender thread, partitioning, idempotence.

The Consumer Client

Classic vs async consumer, SubscriptionState, the fetch pipeline, request managers, the poll loop.

Cross-Cutting

Security

SASL/SSL/OAuth/SCRAM/Kerberos, delegation tokens, the Authorizer, KRaft ACLs, principal building.

Quotas & Throttling

ClientQuotaManager, token buckets, quota entities, throttle responses, client metrics (KIP-714).

Streams & Connect

Kafka Streams

Topologies, tasks, StreamThread, state stores & changelogs, the partition assignor, EOS.

Kafka Connect

Workers, connectors & tasks, the distributed herder, backing stores, MirrorMaker 2.

Reference

Glossary & Concepts

Cross-cutting terminology and a quick reference of Kafka's core abstractions.

Part II · Operations Manual

How to run it: limits, tuning, capacity & partition sizing, failure runbooks, the signals to watch, cost, and what changes at 1M / 10M / 100M events per second.

Foundations

II·00

The Operator's Mental Model

What you run in KRaft, what fails independently, the control loops, and the SLIs/SLOs of a healthy cluster.

II·01

Configuration: The Tuning Surface

Static vs dynamic vs per-topic configs, precedence, and the knobs that actually matter and how they interact.

II·02

Limits & Boundaries

Hard, soft, and emergent limits: partitions, request/message size, the 1 GiB/s-per-topic question, file descriptors, 2 GiB ceilings.

Sizing & Performance

II·03

Partitioning Strategy

How many partitions, per-partition ceilings, the real cost of partitions, when to reshard, the repartitioning trap.

II·04

Capacity Planning & Sizing

The throughput, disk, memory, and network formulas; replication amplification; estimating broker count.

II·05

Performance Tuning

Producer/consumer/broker knobs, the end-to-end latency budget, page cache and zero-copy; throughput ⇄ latency.

II·06

Durability, Availability & Consistency

acks, min.insync.replicas, RF, unclean election, the replication-not-fsync philosophy, ELR.

Running in Production

II·07

Failure Modes & The Runbook

Under-replicated partitions (URP), offline partitions, disk/quorum failure, ISR thrash, rebalance storms, hanging transactions, cause→symptom→fix.

II·08

Metrics, Signals & Observability

The golden signals, where each is emitted in source, alert thresholds, and a dashboard blueprint.

II·09

Topologies & Deployment

Rack/multi-AZ, dedicated vs combined KRaft, multi-region replication, tiered-storage topology, tenancy.

Scale & Economics

II·10

Cost Engineering

Storage/network/cross-AZ cost drivers and the levers: compression, fetch-from-follower, tiered storage, RF.

II·11

Scaling: 1M → 10M → 100M / sec

Worked capacity tiers, the bottleneck that emerges at each, what to watch, and where Kafka's limits bind.

II·12

Lifecycle Operations

Rolling upgrades & metadata.version, reassignment & throttling, add/remove brokers, disaster recovery.

Advanced Operations

II·13

Multitenancy & Isolation

What is shared vs isolated in a broker, and how tenants raise each other's latency and error rates: the noisy-neighbour problem.

II·14

Proactive Monitoring: Leading Indicators

The leading indicators, trends, and capacity-runway metrics that warn you, with lead time, that a cluster will suffer before the lagging alerts fire. Includes signals that client teams can watch on their side.

Part III · The Log as a Blueprint

Kafka as one implementation of the distributed-log pattern: when to choose it, its inherent tradeoffs, the reusable engineering tactics, and the design space.

The Pattern

III·00

The Distributed Log as a Pattern

The pattern abstracted from Kafka: its invariants and the universal problems a replicated log solves.

III·01

When to Use the Log, and When Not To

A decision framework: the forces for and against, and the anti-patterns (log-as-DB / -as-RPC / -as-queue).

Design & Tradeoffs

III·02

Design Decisions & Alternatives

Each Kafka choice as a tradeoff: pull vs push, ISR vs quorum (and why both), page cache vs managed memory, segments vs LSM.

III·03