krivaltsevich.com Kafka Internals4.4

III · 01 · When to Use the Log, and When Not To

Source: Apache Kafka 4.4.0-SNAPSHOT (git 04bfe7d, 2026-06-15), KRaft mode. Architectural analysis grounded in the source-verified Part I and cited comparative sources.

The distributed commit log is the most over-applied data structure of the last decade. It is also one of the most powerful, when the problem actually has the shape it solves. This chapter is a decision framework, not a sales pitch. bp00 defined the pattern (an append-only, totally-ordered-per-shard, replicated sequence of immutable records that is the system of record, with all queryable state derived as a replayable projection); here we draw the boundary around it. We name the forces that pull toward a log, sustained throughput, replay, fan-out, durable buffering, per-key ordering, decoupling, audit, and the forces that pull against it, request/response latency, per-message routing and priority, per-message expiry, multi-entity atomicity, tiny scale, and random access. Then we dissect the three canonical anti-patterns (Kafka-as-database, Kafka-as-RPC, Kafka-as-task-queue), each pinned to the concrete mechanism that makes it fail, and we are honest about what share groups (KIP-932) do and do not change. Throughout, the discipline is the same one Part II demanded of operators: separate what is inherent to the log from what is tunable by a config or mitigated by a feature, because the wrong fix for an inherent limit is the most expensive mistake an architect can make.

The shape of the problem the log solves

Before any force table, fix the mental image, because every "yes" and "no" below derives from it. A Kafka topic-partition is a single, ordered, immutable file-of-record-batches replicated across brokers (03, 08). Writers only ever append to the tail. Readers declare an offset and stream forward; the broker is a dumb, sequential pipe that hands back bytes with zero-copy sendfile and parks the request in a purgatory when there is nothing new to send (09). The broker keeps no per-message, per-consumer state on the classic path, a consumer group's entire progress is one committed offset per partition (13). Retention is a topic-wide sweep by time or size, not a per-record lifecycle (04). Everything good and everything bad about Kafka falls out of those five facts.

What the log IS
An ordered, immutable, append-only sequence per partition, replicated across failure domains. Sequential write, sequential read, zero-copy serve. Cheap to fan out (every reader has an independent cursor), cheap to replay (the data is still there until retention drops it), cheap to scale write throughput (append needs no coordination between shards). The order is the contract.
append-onlyper-partition orderoffset cursorszero-copy fan-outreplayable history
, it is deliberately NOT,
What the log is NOT
It is not an index (no lookup-by-key, no query). It is not a mutable store (no update-in-place, no delete-by-id, only append and topic-wide retention). It is not a router (no per-message addressing, content filtering, or priority). It is not a request/response channel (no correlation, no reply, no low-latency round-trip). It is not a cross-entity transaction engine for external systems. Each "not" is a direct consequence of the "is": you cannot index a pipe, route a sequence, or sub-millisecond a replicated append.
no random readno per-key updateno routing/priorityno per-message TTLno sync reply
The log's strengths and its absences are the same design decision viewed from two sides. The capabilities in green and the missing capabilities in red are not independent, they trade off against each other. A system that added indexing, routing, and per-message mutation to the log would no longer be a log; it would be a database with a queue bolted on, and would lose the throughput and fan-out that justified the log in the first place.
capability the log structure provides capability the log structure inherently lacks , the same architectural choice, two consequences
Design rationale, why "dumb broker, smart consumer"

Kafka deliberately pushed delivery state out of the broker and onto the consumer's offset. A "smart broker / dumb consumer" design (RabbitMQ) tracks per-message acknowledgement, redelivery, priority, and TTL inside the broker, rich semantics, but the broker now holds mutable per-message state that caps throughput and complicates replication. Kafka inverted it: the broker maintains almost nothing per message, so a partition is just a file you append to and stream from, and the OS page cache plus sendfile do the heavy lifting. The reference's comparison is blunt, Kafka peaked at ~605 MB/s in Confluent's OpenMessaging benchmark versus RabbitMQ's ~38 MB/s, which degrades past ~30 MB/s (bp06; Confluent OMB, vendor benchmark, directional). That throughput is bought with the absences in red above. You do not get to keep both halves.

The forces FOR the log

Seven forces favor a log. They are not independent, they reinforce, and the more of them a workload exhibits, the more decisively the log is the right substrate. Each traces to a concrete mechanism in Part I.

ForceWhy the log is strong hereMechanism (Part I)
High sustained throughputAppend is the cheapest possible write; partitions shard the write across brokers with no cross-shard coordination, so throughput scales horizontally. Reads are sequential and served zero-copy. A single partition absorbs tens of MB/s; a cluster reaches multi-GB/s.Append-only segment engine + per-partition independence (03); zero-copy fetch (09); partition sharding (op03)
Replay / reprocessBecause records are immutable and retained, any consumer can reset its offset and re-read history, to rebuild a corrupted projection, backfill a new index, or run a new model over old events. The data is not consumed-and-gone; reading does not mutate the log.Immutable segments + offset-addressed reads (03); offset is consumer-owned (17)
Multiple independent consumers (fan-out)Each consumer group has its own cursor; adding a tenth reader costs the leader nothing but extra sequential reads (often still from page cache). One write feeds analytics, search indexing, audit, and a cache-warmer simultaneously, the canonical "data integration backbone."Per-group offsets, no broker-side per-consumer message state (13); shared zero-copy reads (09)
Durable buffering & backpressure absorptionThe log is a shock absorber between a fast producer and a slow or bursty consumer. Retention is the buffer; a consumer that falls behind catches up later instead of dropping data or back-pressuring the producer. Spikes become lag, not loss.Disk-resident retention decoupled from consumption (04); pull-based consumers set their own pace (09)
Per-key orderingAll records with the same key land in the same partition and are delivered in append order. This gives a total order per entity (per user, per account, per device), exactly the guarantee state machines and changelogs need, without the cost of a single global order.Key → partition hashing (16); partition is a total order (03)
Event-driven decouplingProducers know nothing about consumers and vice versa. Teams ship and evolve independently; a new consumer joins by subscribing, with no change to the producer. The topic is the contract, the schema is the API. This is the architectural payoff, not a performance one.Publish/subscribe over a shared log; no producer-consumer handshake (bp00)
Auditable source of truthAn append-only, ordered, immutable record of what happened is a natural audit log and the basis of event sourcing: state is a fold over the events, and the events themselves are never mutated. Compliance, debugging, and reconstruction all read the same history.Immutability + ordering (03); compaction retains latest-per-key for changelogs (04)
Reusable tactic, count the forces, weight by replay and fan-out

When evaluating any substrate for an event pipeline, score the workload against these seven. Two forces are near-decisive because nothing else does them as cheaply: replay (immutable retained history you can re-read) and fan-out (N independent consumers from one write). If a workload needs both, the log wins almost regardless of the others. If it needs neither, a single consumer that processes each message once and never replays, you are likely holding a queue problem, and the forces-against table below probably applies. This force-counting is itself the transferable artifact: it works for Pulsar, Kinesis, or a home-grown log just as well as for Kafka.

The forces AGAINST the log

Six forces push the other way. Crucially, each is a structural consequence of the same five facts that produced the strengths, which is why most cannot be tuned away, only mitigated or worked around. The table marks, for each, whether the limit is inherent (you cannot remove it without ceasing to be a log), mitigated (a feature softens it but does not erase it), or tunable (a config trades it against something else).

Force againstThe mechanism that causes itInherent / mitigated / tunable
Low-latency request/responseA durable produce waits for replication to min.insync.replicas before it is acknowledged (08); consumers pull and the broker batches reads behind fetch.min.bytes/fetch.max.wait.ms (09). Both add latency floor. The log optimizes throughput over latency by construction.Inherent floor, partly tunable. You can push p99 e2e to single-digit ms with tuning (a tier-1 bank held sub-5 ms p99 at 1.6M msg/s; Confluent), but you cannot get below the replication + poll floor without giving up durability (acks=1), and you can never make it a synchronous round-trip.
Per-message routing / priority / selective consumeA partition is a sequence; a consumer reads it in order or not at all. The broker has no content filter, no priority lanes, no "give me only messages matching X." The classic consumer reads every record at its offset (17).Inherent; partly mitigated by share groups. Share groups (KIP-932) add per-record acquire/ack and out-of-order completion, but still no priority and no content routing. RabbitMQ-style routing remains absent by design (bp06).
Per-message TTL / expiryRetention is a whole-segment sweep by retention.ms/retention.bytes at the topic level (04). There is no per-record clock; record 5 and record 5,000,000 in a segment expire together when the segment ages out.Inherent. No config makes one record expire before its segment-mates. The only granular delete is a compaction tombstone on a keyed/compacted topic, and even that is eventual and per-key, not a TTL (04).
Multi-entity transactional updatesKafka transactions give atomicity for a read-process-write within Kafka (atomic writes across partitions + offset commit) via the coordinator and markers (14). They are not a 2PC across Kafka and an external database, and not a general multi-row update.Inherent boundary. Mitigated only at the edges: the outbox/CDC pattern gets you at-least-once + eventual consistency across DB and Kafka (bp04), never cross-system exactly-once.
Tiny scaleA 3-broker KRaft cluster, partitions, RF, ISR, offsets, monitoring, and the operational model (op00) are fixed costs whether you push 10 msg/s or 10M. The machinery does not shrink to a hobby workload.Inherent overhead, partly mitigated. Managed/serverless offerings amortize the ops cost; Kinesis/Pub-Sub remove it entirely for low volume (bp06). But the conceptual surface area is still large for a problem a single database table would solve.
Point queries / random accessOffsets are positions in a sequence, not keys in an index. "Fetch the current value for user 42" requires scanning, because the log has no key→offset map for arbitrary lookups (03). It is a write-optimized sequence, not a read-optimized index.Inherent. The pattern's own answer is to not query the log: materialize a projection into a store built for lookups (Streams state stores, a KV DB) and query that (bp00). The log feeds the index; it is not the index.
Where the log is simply the wrong choice

If your dominant access pattern is lookup-by-id with low latency, mutate-in-place, ad-hoc query, and per-row lifecycle, that is a database, and no amount of Kafka tuning will make a log into one. If your dominant pattern is synchronous request → compute → reply under tens of milliseconds, that is RPC/HTTP/gRPC, and routing it through a log re-introduces the coupling and latency you were trying to remove. If your pattern is a handful of events per minute that one service handles once, the operational surface of a cluster dwarfs the problem; use a managed queue or a DB-backed job table. The log is a specialist. Reach for it when the seven forces light up, not because it is the default streaming brand.

The decision tree

The framework collapses into a single traversal. Walk it top-down for any candidate workload; the first hard "no" routes you off the log. The tree encodes the central asymmetry: the forces against are mostly inherent, so a workload that trips one of the early gates will not be rescued by tuning, whereas a workload that clears them benefits more from the log the more of the later forces it accumulates.

candidate workloaddata in motion, events, commands, changes, telemetry
Primary need = lookup-by-key, update-in-place, or ad-hoc query?
use a databaselog has no index / no mutation / topic-wide retention. Optionally feed it FROM a log via CDC.
Need a synchronous reply within tens of ms?
use RPC / gRPC / HTTPreplication + poll floor + no correlation; a reply topic re-couples the services.
Do you need replay, OR fan-out to ≥ 2 independent consumers?
Need per-message ack / priority / TTL / routing?
Throughput high & sustained, OR per-key order required?
use a queue / brokerRabbitMQ for routing+priority+TTL; share groups (KIP-932) for ack+redelivery+DLQ but no priority/routing.
a managed queue may be simplerlog works, but weigh its fixed ops cost vs. Kinesis/Pub-Sub/SQS at tiny scale.
USE THE LOGKafka (or a log-shaped system). Materialize queryable views downstream; keep the log as source of truth.
The decision tree. Three early gates eliminate the misuses (store, RPC, classic queue); the workload reaches "use the log" only after clearing them and then exhibiting replay/fan-out and throughput/ordering. Note the two amber "maybe" leaves: a low-volume decoupling job runs fine on a log but may not justify its fixed cost, and a per-message-semantics job lands on share groups only if it can live without priority/routing (see the next sections). The dashed paths are the "it works but interrogate it" branches; solid paths are clean fits or clean misfits.
workload entry store/RPC gate queue/decoupling gate log fit / throughput gate route OFF the log primary decision path "works, but interrogate" path pill = recommended outcome

Anti-pattern 1, Kafka as a database

The single most common misuse. It is seductive because Kafka persists data, durably, checksummed, replicated, on disk, so it looks like a store, and a compacted topic that retains the latest value per key looks like a table. But "persists data" and "is a database" are different claims, and the gap is exactly the set of mechanisms a database has that a log does not.

You want a database for…The log's mechanism insteadConcrete failure mode
Read a row by primary key, fastOffsets address positions, not keys. There is no key→offset index for arbitrary lookups (03)."Get user 42's current address" forces a scan from offset 0 (or from the last compaction point). O(N) over the topic. There is no point read.
Update a field in placeThe log is append-only and immutable. "Update" = append a new record; old records remain (03).To know the current value you must fold the whole key's history. Two readers at different offsets see different "current" values, there is no single mutable cell.
Keep data until I delete itRetention is a topic-wide time/size sweep (04). On a non-compacted topic, old segments are deleted on a clock you set at the topic level.Retention silently deletes your "database." Set retention.ms=7d and your system of record is gone after a week. Operators have lost data this way by treating a delete-policy topic as permanent storage.
Run an ad-hoc / multi-key queryNo query engine. Compaction (cleanup.policy=compact) keeps the latest per key, but that is a retention rule, not an index or a WHERE clause (04)."Sum balances where region=EU" cannot run against the log. Compaction does not give you secondary indexes, joins, or aggregation, you would scan everything in the application.
Multi-row ACID transactionTransactions are atomic Kafka writes across partitions, not multi-key updates to a queryable dataset (14)."Debit A, credit B, atomically, then query the result" is not what Kafka transactions do; and a hanging transaction pins the LSO and makes all later records on the partition invisible to read_committed readers (op07).
Compaction is not a query engine, and not even a guarantee of one-row-per-key

The most sophisticated version of this anti-pattern is "use a compacted topic as a KV store." Compaction retains at least the last value per key, useful for a changelog you replay to rebuild state (Streams does exactly this), but the reference is explicit that it does not guarantee a single record per key at any instant (multiple values and tombstones can coexist until the cleaner runs; timing is non-deterministic). It gives you retention, not random access: you still read it sequentially to materialize a map. A compacted topic is a durable, replayable input to a KV store. It is not the KV store. The pattern's prescription is unambiguous, Kreps: "I don't think Kafka really benefits from trying to add any kind of random access lookups directly against the log"; Kafka replicates into specialized systems, it does not replace them.

Reusable tactic, log as source of truth, store as projection (CQRS)

The correct shape when you are tempted to query Kafka: keep the log as the write model / system of record, and materialize a read model into a store built for the query, a KV DB for point lookups, a search index for text, an OLAP store for aggregation, a Streams state store for stream joins (bp00, 20). The projection is derived, disposable, and rebuildable by replaying the log. You get the log's throughput/fan-out/audit and the database's query power, because you stopped asking one component to be both. This is the inside-out database, and it is the whole point of the pattern.

Anti-pattern 2, Kafka as RPC / request-reply

The second misuse routes a synchronous call through the log: service A writes a "request" record, service B consumes it, processes, and writes a "response" record to a reply topic that A consumes, correlating on an id. It works in a demo and rots in production, for reasons that are structural, not incidental.

  • Latency floor, twice. Each leg pays the durable-produce floor (replication to min.insync.replicas, 08) plus the consumer poll floor (fetch.max.wait.ms, 09). A round-trip is two legs, so the inherent latency is doubled, and the log is optimized for throughput, not the per-message latency RPC needs. The reference is blunt: Kafka is pull/poll-based, not push, which is part of why per-message latency exceeds push brokers at low load, a deliberate throughput-over-latency choice.
  • No correlation or addressing in the substrate. The log has no request id, no reply-to, no "this message is for caller X." You build correlation in the application and demultiplex responses yourself, re-implementing, badly, what an RPC framework gives you for free.
  • Re-coupling. The whole appeal of the log was decoupling producers from consumers. Request-reply re-introduces the synchronous dependency, A now blocks on B's liveness and latency. As the reference puts it, request-reply over Kafka "returns the coupling we were trying to avoid."
  • Reply-topic sprawl. Per-caller reply topics multiply partition count (a cost, op02) or you share a reply topic and every caller filters for its own correlation ids, scanning past everyone else's responses, because, again, the log cannot route.
Caller AService B
gRPC call(req)
RPC: one hop each way, correlated by the call itself, sub-ms to low-ms.
resp, same connection
LOG: produce req → request-topic (durable-ack floor)
B must POLL the request topic (fetch.max.wait floor) before it even sees the request.
produce resp → reply-topic; A POLLS & filters by correlation id
Two durable appends + two polls + app-level correlation = doubled floor and re-coupling.
Request-reply over RPC (top, teal/blue) versus over a log (bottom, red). The RPC path is one correlated round-trip on a live connection. The log path turns each direction into a durable append followed by a poll, adds application-managed correlation, and makes A depend on B's liveness anyway, paying the log's throughput-oriented costs to get none of its decoupling benefit. The red legs are the anti-pattern.
caller callee / service ec-data = native RPC hop response / return ec-err = log leg (the misuse) time flows top → bottom
The narrow case where it is defensible

Request-reply over Kafka is occasionally justified when the "reply" is not latency-sensitive and you specifically want the request and response persisted, auditable, and replayable, e.g., a long-running async job whose submission and result you want on the same durable backbone as everything else, with the log providing buffering and at-least-once retry. Even then, prefer an explicit async-job design (correlation id, idempotent handler, a result store you query) over pretending the log is a synchronous channel. If the caller is willing to block for an answer in real time, it is RPC; use RPC.

Anti-pattern 3, Kafka as a priority / task queue

The third misuse treats a topic as a work queue: producers enqueue tasks, a pool of workers dequeues and executes, with per-task acknowledgement, retry of just the failed task, priority lanes, and a visibility timeout. The classic consumer group cannot do this, and the reasons are mechanical.

Task-queue featureClassic consumer-group realityMechanism (Part I)
Ack/retry a single messageProgress is one committed offset per partition, not per message. You acknowledge a position, not a record. To "retry one task" you rewind the offset and re-process everything after it.Offset commit model (13, 17)
Visibility timeout (lease a task, auto-redeliver on crash)None. A consumer holds an offset range implicitly; if it dies mid-batch, redelivery happens only via rebalance + offset rewind, re-delivering the whole uncommitted range.No per-message lease on the classic path (17)
Priority lanesA partition is strict FIFO; there is no priority. You cannot make message B jump ahead of A in the same partition.Partition is a total order (03)
Parallelism beyond partition countAt most one consumer per partition per group. Want 200 workers? You need ≥ 200 partitions, with all the cost that implies.1:1 partition→consumer in a group (13, op03)
Isolate a slow/poison taskHead-of-line blocking. Because a partition is consumed in order by one member, one slow or failing record stalls every record behind it. The reference: "partition throughput collapses to the speed of the slowest message"; a poison pill stalls every later message.In-order single-consumer-per-partition (13)

What share groups KIP-932 change, and what they don't

This is where the chapter must be precise, because share groups are exactly the feature that converts some of this anti-pattern into a supported pattern, and conflating "some" with "all" is the new mistake. A share group lets many consumers cooperatively consume the same partition; the partition leader holds per-record in-flight state in a SharePartition, and each record moves through an explicit state machine: AVAILABLE → ACQUIRED → (ACKNOWLEDGED | back to AVAILABLE | ARCHIVED). What that buys, and what it still cannot do:

Classic task-queue gapShare groups KIP-932Verdict
Per-message ackSolved. A consumer ACCEPTs/RELEASEs/REJECTs individual records; state is per-record, persisted in __share_group_state (15).Fixed
Visibility timeoutSolved. The acquisition lock is a time-bounded lease (share.record.lock.duration.ms); on expiry the record returns to AVAILABLE for redelivery, exactly a visibility timeout (15).Fixed
Parallelism > partition countSolved. Many members consume one partition cooperatively; you are no longer capped at one consumer per partition (15).Fixed
Isolate a poison message (head-of-line)Largely solved. Records complete out of order, and one stuck record no longer blocks the rest; a bounded delivery.count.limit then archives it, optionally to a DLQ KIP-1191 for inspection (15).Fixed
Priority lanesNot solved. No priority within or across records; acquisition still walks the log forward. A high-priority task cannot jump the queue.Still absent
Content-based routing / selective consumeNot solved. No server-side filtering or routing; consumers still acquire whatever the share-partition hands out.Still absent
Exactly-once consumptionNot solved. Share groups are at-least-once; redelivery is expected. Consumers must be idempotent. EOS is a different mechanism (14).Still absent
Strict per-key orderGiven up by design. Cooperative out-of-order consumption means per-key/partition order is not preserved, the opposite trade from a classic consumer (15).Traded away
in topic log (durable record) AVAILABLE acquire, lease for share.record.lock.duration.ms, ++deliveryCount ACQUIRED ACCEPT, processed ACKNOWLEDGED (terminal)
ACQUIRED RELEASE / lock timeout · deliveryCount < limit AVAILABLE (redeliver) REJECT, or deliveryCount ≥ limit ARCHIVED / DLQ (KIP-1191)
The share-group per-record lifecycle (simplified from 15). This is what turns a topic into a usable work queue: a lease with a timeout (visibility timeout), per-record accept/release/reject (per-message ack), bounded redelivery, and a dead-letter destination for poison records. What it deliberately does NOT add: priority, routing, exactly-once, or order preservation. Match this state machine against your task-queue requirements line by line before adopting it, if you need priority or routing, a log (even with share groups) is still the wrong substrate.
in-flight (available / acquired) acknowledged, success terminal archived / DLQ, drop terminal transition (label = trigger / guard) = terminal state
Share groups close the queue gap, they do not make Kafka RabbitMQ

The honest summary: KIP-932 makes Kafka a competent at-least-once work queue with acks, visibility timeout, bounded retry, and a DLQ, genuinely closing the historical "Kafka can't do queues" critique for the common case. But three classic broker features remain structurally absent because they conflict with the log: priority (a partition is FIFO), content-based routing (the broker is a dumb pipe), and strict per-key order under sharing (cooperative consumption gives it up). If your task queue genuinely needs priority lanes or routing, that requirement does not trace to a missing Kafka feature, it traces to the log structure itself, and you want a routing broker (bp06). Note also that share groups are a preview-maturing feature (preview in 4.1, maturing through 4.2; full in this 4.4 tree); weigh that operational maturity for production-critical queues.

Inherent vs. tunable, the architect's litmus test

The through-line of this chapter, and of all of Part III, is the discipline of asking "is this limit structural or configurable?" before reacting to it. Misclassifying it is how teams either abandon Kafka over a limit they could have tuned, or waste a quarter trying to tune away a limit that is the log itself. The table below classifies the limits this chapter raised; carry the question, not just the answers.

Apparent limitClassWhat actually moves it
Per-message latency floorInherent floor, tunable marginTuning (batching off, fetch.min.bytes=1, fast disks/network, acks=1) shrinks it to single-digit ms; nothing makes a replicated append a synchronous round-trip (op05).
No per-message ack / visibility timeoutMitigated by featureShare groups (KIP-932) add both; classic consumer groups never will (15).
Head-of-line blocking on a partitionMitigated by featureShare groups consume out of order; classic groups block by design (15).
No priority / no content routingInherentNothing in Kafka, including share groups, a partition is FIFO and the broker does not filter. Use a routing broker (bp06).
No per-message TTLInherentRetention is topic-wide by time/size; tombstones on compacted topics are eventual per-key, not a TTL (04).
No random read / point queryInherentMaterialize a projection into a query store; the log feeds the index, it is not the index (bp00).
Retention deletes "stored" dataTunable / mitigatedretention.ms=-1 + compaction, or tiered storage KIP-405 for cheap long retention, but a non-compacted delete-policy topic is still not a database (05).
Cross-system exactly-onceInherent boundaryKafka EOS is within-Kafka read-process-write; outbox/CDC gets at-least-once + eventual across an external DB, never 2PC (14, bp04).
Fixed operational cost at tiny scaleMitigated by deploymentManaged/serverless amortizes ops; the conceptual surface remains. A small problem may want a small tool (bp06).
Global (cross-partition) orderingInherent (vs. throughput)Total order forces a single partition, sacrificing throughput; design per-key order instead (op03).
The litmus question, stated once

For any "Kafka can't do X" objection, ask three things in order: (1) Is X absent because of the log structure (append-only, per-partition order, offset-not-index, topic-wide retention, dumb broker)? If yes, it is inherent, design around it, do not fight it. (2) If not structural, is there a feature that supplies X at a stated cost (share groups for queue semantics, tiered storage for retention, transactions for atomic Kafka writes)? Then it is mitigated, adopt the feature and pay its price knowingly. (3) Otherwise it is a config trading X against throughput/latency/durability, it is tunable, so pick your point on the tradeoff. This three-step is the single most transferable habit in this part; it applies unchanged to any log-shaped system you evaluate.

Worked judgements

The framework is only useful if it produces clear calls on real workloads. A few, with the deciding force named:

Clickstream / telemetry ingest feeding analytics, search, and ML
Log, decisively. High sustained throughput, fan-out to ≥3 independent consumers, replay to backfill new models. All seven forces fire; zero gates trip. The textbook fit.
Order-state changes that must stay ordered per order and feed several services
Log. Per-key (per-order) ordering + fan-out + audit. Partition by order id; each order is a total order. Materialize current-order-state into a KV store for the lookup-by-id queries, do not query the log.
"What is account 42's balance right now?" serving a UI at p99 < 20 ms
Database (fed by the log). Point lookup + low latency = the store gate trips immediately. Keep balance-change events on a log, but serve reads from a materialized projection (bp00).
Synchronous "validate this payment, reply yes/no in 50 ms"
RPC/gRPC. The reply gate trips: doubled latency floor + no correlation + re-coupling. A reply topic is the anti-pattern.
Background job processing: thousands of independent tasks, per-task retry, isolate poison tasks, scale workers freely
Share groups if you can live without priority/routing and accept at-least-once; otherwise a routing broker. The queue gate trips, but KIP-932 now answers most of it, check the line-by-line table above against your needs.
Tasks with strict priority lanes and content-based routing to specific workers
RabbitMQ-style broker. Priority + routing are inherent absences in the log, share groups included. This requirement traces to the log structure, not a missing config (bp06).
A handful of webhook events per minute, one consumer, no replay
A managed queue or a DB job table. No replay, no fan-out, tiny scale, the fixed operational surface of a cluster is unjustified. The log works, but it is not the simplest correct tool.
Carry-away

The log is a specialist optimized for ordered, replayable, high-throughput, fanned-out streams that you decouple producers and consumers around. Its strengths and its absences are the same design decision. Reach for it when the seven forces light up; route off it the moment an early gate (store, RPC, classic-queue-with-priority/routing) trips, because those gates guard inherent limits no tuning will move. And when in doubt, apply the litmus question: structural, feature-mitigated, or config-tunable. Next, bp02 opens the design-decision space inside the "use the log" branch, partition count, keys, RF, ordering, and the rest, where the interesting tradeoffs begin.

krivaltsevich.com · Part of Apache Kafka Internals · derived from Apache Kafka 4.4 source · GitHub · MIT-licensed.

Apache Kafka® is a registered trademark of the Apache Software Foundation. This is an independent, unofficial guide, not affiliated with or endorsed by the ASF.