krivaltsevich.com · Kafka Internals 4.4

II · 13 · Multitenancy & Isolation: What is Shared, What is Not

Source: Apache Kafka 4.4.0-SNAPSHOT (git 04bfe7d, 2026-06-15), KRaft mode. Operational guidance grounded in source code and cited benchmarks.

An honest answer to "how isolated are tenants on one Kafka cluster?" begins with a hard architectural fact: a broker is a shared-everything process. The thread pools that handle requests, the queue requests sit in, the OS page cache that accelerates every read, the disk, the NIC, the JVM heap, the controller, and the coordinator logs are all pooled across every producer, consumer, topic, and tenant on that broker. Two clients that never touch the same topic still contend for these resources, and one can, and routinely does, raise the other's latency and error rate. That is the noisy-neighbour problem, and Kafka does not make it go away; it gives you throttles (quotas) and fences (ACLs, listeners, connection limits), not partitions of capacity. This chapter walks the broker resource by resource: for each, is it isolated or shared, by what mechanism, what interference symptom it produces, and which metric in II · 08 reveals it. It grounds every claim in the source (the request-handler pool in KafkaRequestHandler.scala, the bounded queue in RequestChannel.scala, the quota machinery in ClientQuotaManager.java), then lays out the isolation spectrum from a shared cluster with quotas (cheap, soft, leaky) to dedicated clusters per tenant (full, expensive), with four concrete noisy-neighbour runbooks and a recommendation per tenancy model.

The central truth: a broker is shared-everything

Walk a single produce request through a broker and count what it touches. It arrives on a TCP connection accepted by an Acceptor and read by a network Processor thread (one of num.network.threads, default 3, SocketServerConfigs.java:152). The Processor parses it and calls requestChannel.sendRequest, which does requestQueue.put(request) into a single bounded ArrayBlockingQueue sized at queued.max.requests, default 500 (RequestChannel.scala:90,117-119; SocketServerConfigs.java:144). A request-handler (I/O) thread, one of num.io.threads, default 8 (ServerConfigs.java:51), dequeues it, appends to the partition's log (touching the page cache, the disk, and the JVM heap for buffers), and if acks=all parks it in purgatory until replica-fetcher threads on the followers (one set per broker, num.replica.fetchers, default 1, ReplicationConfigs.java:96) pull the data over the shared NIC. None of these (not a thread, not a queue slot, not a cache page, not a fetcher) is reserved for one tenant. They are first-come-first-served pools.

This is a deliberate design choice rooted in Kafka's performance model (Part I · 06, Part I · 03): a small fixed thread pool over an append-only log on the page cache, exploiting zero-copy sendfile for reads, is what makes one broker sustain hundreds of MB/s. Per-tenant thread pools, per-tenant cache partitions, or per-tenant queues would shatter that. So Kafka pools the work and bolts on rate limiting as the multitenancy story. The consequence an operator must internalize: quotas throttle a tenant after the shared work is already done, and several shared resources (page cache, queue head-of-line position, GC) are not governed by any quota at all.

Figure: many tenants, one shared broker process.
Tenants and clients (different users, client-ids, topics): Producer A (hot topic) · Consumer B (cold backfill) · Admin C (topic churn) · Stream D (rebalancing). All funnel into one shared broker process.
Shared broker resources (pooled, first-come-first-served, no per-tenant reservation on any of these): network Processors (num.network.threads=3) · RequestChannel queue (queued.max.requests=500) · I/O handlers (num.io.threads=8) · replica fetchers (num.replica.fetchers=1) · JVM heap + GC.
Shared OS page cache + disk + NIC: one cache, one disk set, one NIC per broker; cold reads evict hot tails; replication shares the wire.
Cluster-wide singletons: shared controller (single active writer) + coordinators; QuorumController single event thread · __consumer_offsets / __transaction_state shards.
Caption: Every tenant's traffic converges on the same pooled resources; isolation is the exception, not the rule.

The one-sentence model

On a shared Kafka cluster, data is isolated, capacity is not. ACLs guarantee tenant A cannot read tenant B's bytes; nothing guarantees tenant A's traffic cannot slow down tenant B's. Quotas bound a tenant's rate, but they bound it after the request has consumed shared thread time and without protecting the page cache or queue ordering.

Part 1: What is NOT isolated (the interference paths)

This is the larger half. For each shared resource below: the mechanism in source, the interference path (how one tenant hurts another), and the metric that reveals it. The summary table follows the prose.

The request-handler (I/O) thread pool

All requests of all tenants are served by one pool. KafkaRequestHandlerPool (KafkaRequestHandler.scala:227) spins up num.io.threads identical KafkaRequestHandler threads, each running the same loop: val req = requestChannel.receiveRequest(300) then apis.handle(request, requestLocal) (KafkaRequestHandler.scala:108-184). There is no tenant dimension anywhere in that loop: a handler picks up whatever is next in the queue and processes it to completion. A request that is expensive (a fetch spanning thousands of partitions, a produce with a huge batch, a fetch that misses the page cache and blocks on a cold disk read) occupies its handler thread for the whole duration. While it does, that thread is unavailable to anyone else.

Interference path: tenant A issues slow/heavy requests → handler threads stay busy → the bounded request queue drains slower than it fills → every later request (including tenant B's tiny produce) waits longer in the queue. Metric: RequestHandlerAvgIdlePercent (kafka.server:type=KafkaRequestHandlerPool, registered at KafkaRequestHandler.scala:210,253) falls toward 0; RequestQueueTimeMs in the per-API RequestMetrics breakdown rises for all request types. The idle gauge is the single best saturation signal for this resource: keep it above 0.3; below 0.2, add capacity; below 0.1 indicates an active problem (Instaclustr/community heuristic, empirical; see II · 08).
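The head-of-line effect described above can be reproduced with a toy queueing model. The following is a minimal sketch, not broker code: the function name, the request tuples, and all service times are invented for illustration; only the handler count of 8 mirrors the num.io.threads default.

```python
import heapq

def queue_waits(requests, num_handlers):
    """Simulate one FIFO queue drained by a fixed pool of handler threads.

    requests: list of (arrival_ms, service_ms, tenant), sorted by arrival.
    Returns {tenant: [wait_ms, ...]}: how long each request sat in the queue.
    """
    free_at = [0.0] * num_handlers          # min-heap of "handler idle at" times
    heapq.heapify(free_at)
    waits = {}
    for arrival, service, tenant in requests:
        handler_free = heapq.heappop(free_at)   # earliest available handler
        start = max(arrival, handler_free)      # FIFO: no tenant priority
        waits.setdefault(tenant, []).append(start - arrival)
        heapq.heappush(free_at, start + service)
    return waits

# Tenant B: twenty cheap 1 ms requests, one per millisecond (hypothetical load).
b_only = [(t, 1, "B") for t in range(1, 21)]
# Tenant A: eight heavy 50 ms requests at t=0, one per handler thread.
a_heavy = [(0, 50, "A")] * 8

baseline = queue_waits(b_only, num_handlers=8)
contended = queue_waits(sorted(a_heavy + b_only, key=lambda r: r[0]), num_handlers=8)
print(max(baseline["B"]), max(contended["B"]))
```

In this model B's worst queue wait goes from 0 ms to 49 ms although B's own traffic is unchanged, which is the RequestQueueTimeMs rise "for all request types" in miniature.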

Why a quota does not fully fix this

The request quota (request_percentage) caps a tenant's share of thread time, but it does so by measuring time the request already spent and then delaying the reply (ClientRequestQuotaManager.java:78-87 records request.requestThreadTimeNanos() after the fact). A single pathological request still occupies a handler for its full local time before any throttle is computed; the quota cannot pre-empt work in flight. It limits sustained rate, not a one-shot head-of-line stall.

The network Processor pool and the bounded RequestChannel queue

The queue between network and I/O is one ArrayBlockingQueue[BaseRequest] of fixed capacity queued.max.requests (default 500) shared by the entire data plane (RequestChannel.scala:90). Crucially, sendRequest calls requestQueue.put(...), the blocking form (RequestChannel.scala:117-119). When the queue is full, the network Processor thread that called put blocks until a slot frees. The queue doc states this plainly: "The number of queued requests allowed for data-plane, before blocking the network threads" (SocketServerConfigs.java:145).

Interference path: tenant A floods the broker (a runaway producer, a thundering herd of consumers) → handlers can't keep up → the 500-slot queue fills → network Processors block on put → those Processors stop reading any connection they own, so reads stall for every other tenant whose connection is multiplexed onto the same Processor. The backpressure is global. Metric: RequestQueueSize gauge (RequestChannel.scala:94) pinned near 500; NetworkProcessorAvgIdlePercent (kafka.network:type=SocketServer) toward 0; rising RequestQueueTimeMs across all APIs.
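The blocking-put backpressure can be seen with any bounded queue. The sketch below uses Python's queue.Queue as a stand-in for the broker's ArrayBlockingQueue; the capacity of 3 and the tenant labels are illustrative assumptions, and the timeout exists only so the demo terminates (the behaviour described above is an untimed, blocking put).

```python
import queue

QUEUED_MAX_REQUESTS = 3  # tiny stand-in for queued.max.requests (default 500)

request_queue = queue.Queue(maxsize=QUEUED_MAX_REQUESTS)

# Tenant A's flood fills every slot while no handler is draining the queue.
for i in range(QUEUED_MAX_REQUESTS):
    request_queue.put(("tenant-A", i))

# A Processor now tries to enqueue tenant B's request. A plain put() would
# block until a slot frees; the timeout just lets us observe that it blocked.
try:
    request_queue.put(("tenant-B", 0), timeout=0.05)
    processor_blocked = False
except queue.Full:
    processor_blocked = True

# Once a handler dequeues one request, the same put succeeds.
request_queue.get()
request_queue.put(("tenant-B", 0), timeout=0.05)
print(processor_blocked, request_queue.qsize())
```

While the Processor is stuck in put, it is not reading any of its other connections, which is why the stall is felt by tenants that never sent the flood.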

The OS page cache, the classic noisy neighbour

This is the interference path with no quota at all, and the one that surprises operators most. Kafka does not maintain its own large read cache; it relies on the OS page cache, and reads are served zero-copy. A consumer fetch ultimately calls FileRecords.writeTo, which does destChannel.transferFrom(channel, position, count) (FileRecords.java:291-302), a sendfile that copies bytes straight from the page cache to the socket without a userspace round-trip (the read path is dissected in Part I · 09). For a consumer reading the live tail, the data it wants is the data the producer just wrote, still resident in cache: the fast path. But the page cache is a single LRU shared by every partition on the broker, and it is finite.

Interference path: tenant A starts a large historical/cold read (a backfill, a new consumer group resetting to earliest, a re-processing job) → the broker pages in gigabytes of old segments → the LRU evicts the hot tail of tenant B's partitions → tenant B's previously-cached consumer now misses cache and the broker must read from disk → tenant B's RemoteTimeMs/fetch latency spikes and the broker's disk goes to 100% busy. Empirical reports call this the "catch-up tax": historical reads have driven p99 produce latency from ~2 ms to ~250 ms and dropped producer throughput ~43% on the same cluster (KIP-405 testing / azguards, empirical). Metric: there is no direct "cache eviction" metric; you infer it from a simultaneous (a) drop in RequestHandlerAvgIdlePercent, (b) rise in LocalTimeMs/RemoteTimeMs on fetches, (c) disk utilization at the OS level near 100%, and (d) a new high-throughput consumer in BytesOutPerSec reading old offsets. Tiered storage (Part I · 05) partly mitigates by serving cold data from object storage rather than evicting the local cache.

Figure: cold-read cache eviction. Shared page cache with the hot tail of B's partitions resident → broker pages in GBs of old data via OS read-ahead → B's hot tail evicted, so B's consumer now misses cache → disk read instead of sendfile, fetch latency ↑ → what to watch: disk 100% busy · RemoteTimeMs ↑ · idle% ↓ (II · 08).
Caption: A never touches B's topic, yet knocks B's consumer off the zero-copy path.
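The eviction path can be illustrated with a toy LRU cache. This is a deliberately simplified sketch: the real OS page cache is more elaborate than a strict LRU, and the class name, page keys, and sizes below are all hypothetical.

```python
from collections import OrderedDict

class LruCache:
    """Toy LRU cache keyed by page id; read() reports hit or miss."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def read(self, page):
        hit = page in self.pages
        if hit:
            self.pages.move_to_end(page)          # mark as most recently used
        else:
            self.pages[page] = True               # page it in from "disk"
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)    # evict least recently used
        return hit

cache = LruCache(capacity=100)
hot_tail = [("B", i) for i in range(50)]   # tenant B's live tail: 50 pages

for page in hot_tail:                      # warm the cache
    cache.read(page)
hits_before = sum(cache.read(p) for p in hot_tail)

for i in range(200):                       # tenant A's backfill: 200 cold pages
    cache.read(("A", i))
hits_after = sum(cache.read(p) for p in hot_tail)

print(hits_before, hits_after)
```

Tenant B's hit count on its own tail drops from 50 to 0 after A's scan, even though A never read a single page of B's data and no byte-rate quota was involved.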

Disk I/O, the NIC, and the replica-fetcher threads

One broker has one set of disks and (typically) one NIC, shared by: every producer write, every consumer read, the log cleaner, cold historical reads, and all replication. There is no I/O scheduler inside Kafka that fair-shares bytes per tenant. The replica-fetcher pool is likewise shared: num.replica.fetchers (default 1) threads on each broker pull from all leaders it follows. The config doc is explicit that fetch parallelism "on each broker is bound by num.replica.fetchers multiplied by the number of brokers" (ReplicationConfigs.java:98). A single hot partition that a fetcher is busy replicating delays the replication of the other partitions assigned to that same fetcher thread.

Interference path: tenant A's high-write partition saturates a fetcher → tenant B's partitions on the same fetcher fall behind → B's followers cross replica.lag.time.max.ms (default 30000 ms, ReplicationConfigs.java:55) and drop out of ISR → B sees under-replication, and if its ISR falls below min.insync.replicas with acks=all, B's producers get NotEnoughReplicas and fail. Metric: UnderReplicatedPartitions, IsrShrinksPerSec (ReplicaManager.scala), and the per-partition fetcher lag (Part I · 08). Replication throttles (leader.replication.throttled.rate / follower.replication.throttled.rate, QuotaConfig.java:75-80) exist but are aimed at reassignment traffic, not steady-state tenant fairness.
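The chain from fetcher lag to producer failure reduces to two comparisons. The sketch below is an illustrative model only: the function names and the replica timestamps are invented, and the real broker tracks considerably more state per replica.

```python
REPLICA_LAG_TIME_MAX_MS = 30_000  # replica.lag.time.max.ms default cited above

def in_sync_replicas(last_caught_up_ms, now_ms, max_lag_ms=REPLICA_LAG_TIME_MAX_MS):
    """Return the replica ids that caught up to the leader within max_lag_ms.

    last_caught_up_ms maps replica id -> time it was last fully caught up;
    the leader is modelled by giving it a timestamp equal to now_ms.
    """
    return sorted(r for r, t in last_caught_up_ms.items() if now_ms - t <= max_lag_ms)

def produce_outcome(isr, min_insync_replicas, acks):
    """Model the acks=all check against min.insync.replicas."""
    if acks == "all" and len(isr) < min_insync_replicas:
        return "NOT_ENOUGH_REPLICAS"
    return "OK"

now = 100_000
# Replica 1 is the leader; 2 and 3 share a fetcher starved by tenant A's hot partition.
caught_up = {1: now, 2: 65_000, 3: 60_000}
isr = in_sync_replicas(caught_up, now)
print(isr, produce_outcome(isr, min_insync_replicas=2, acks="all"))
```

With both followers more than 30 s behind, the ISR shrinks to the leader alone and tenant B's acks=all producers fail, while an acks=1 producer on the same partition would still succeed.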

The controller, a single cluster-wide writer

There is exactly one active controller, and it is single-threaded by design: "The QuorumController is single-threaded. A single event handler thread performs most operations" (QuorumController.java:164). Every metadata mutation (topic create/delete, partition reassignment, config change, ACL change) is an event on one KafkaEventQueue processed serially (Part I · 11). The controller is a cluster-wide singleton; there is no per-tenant controller.

Interference path: tenant A churns topics (a CI pipeline creating and deleting thousands of short-lived topics, or a misconfigured client repeatedly altering dynamic configs) → the controller event queue backs up → every tenant's metadata operation (including B's legitimate topic create or partition reassignment) waits behind A's. Metric: EventQueueTimeMs and EventQueueProcessingTimeMs on the active controller (QuorumControllerMetrics, II · 08) rise. Mitigation: the only per-tenant control here is the controller-mutation quota (below), which throttles create/delete/alter rates per principal.

The coordinators, shared internal logs

Consumer groups and transactions are coordinated through two internal topics, __consumer_offsets and __transaction_state, that are shared by all tenants. A group is mapped to a coordinator partition by hashing its id: return Utils.abs(groupId.hashCode()) % numPartitions (GroupCoordinatorService.java:428, mapping to Topic.GROUP_METADATA_TOPIC_NAME at line 417). The broker that leads that partition is the group's coordinator, and it runs the group's rebalances and persists its offset commits to that shared log (Part I · 13, Part I · 14).

Interference path: tenant A's group hashes to coordinator partition 17; so does tenant B's. A's group enters a rebalance storm (members flapping, exceeding max.poll.interval.ms) or commits offsets at an extreme rate → the coordinator shard for partition 17 is loaded with A's rebalance/commit work → B's group on the same shard sees slower joins and commits. Metric: elevated LocalTimeMs on OffsetCommit/JoinGroup/SyncGroup requests routed to that broker, and group-level rebalance metrics. Because group→partition is a hash you do not control, two unrelated tenants can collide on one shard.
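To see which groups would collide, the mapping can be reproduced outside the broker. The sketch below reimplements Java's String.hashCode() in Python; it assumes 50 partitions (the usual default for the offsets topic) and that Utils.abs maps Integer.MIN_VALUE to 0, both of which should be checked against your cluster and source tree. The group ids are made up.

```python
from collections import defaultdict

def java_string_hash(s):
    """Reproduce Java's String.hashCode() as a signed 32-bit integer."""
    data = s.encode("utf-16-be")             # Java hashes UTF-16 code units
    h = 0
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], "big")
        h = (31 * h + unit) & 0xFFFFFFFF     # emulate 32-bit overflow
    return h - (1 << 32) if h >= (1 << 31) else h

def coordinator_partition(group_id, num_partitions=50):
    """Model Utils.abs(groupId.hashCode()) % numPartitions."""
    h = java_string_hash(group_id)
    non_negative = 0 if h == -(1 << 31) else abs(h)   # assumed Utils.abs behaviour
    return non_negative % num_partitions

# Hypothetical group ids from many tenants; 51 ids over 50 shards must collide.
groups = [f"tenant-{i}-orders" for i in range(51)]
shards = defaultdict(list)
for g in groups:
    shards[coordinator_partition(g)].append(g)
collisions = {p: ids for p, ids in shards.items() if len(ids) > 1}
print(collisions)
```

Running this against your real group ids shows which tenants share a coordinator shard before a rebalance storm makes the collision visible in latency.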

The JVM heap and garbage collection

One JVM, one heap, one garbage collector per broker, shared by all tenants' in-flight buffers, request objects, and metadata structures. A tenant that drives heavy allocation (very large requests, high connection churn, many partitions) raises heap pressure and GC frequency. A GC pause is a stop-the-world event for the whole broker: every thread freezes, regardless of tenant.

Interference path: tenant A's allocation pressure → a long GC pause → the broker stops sending fetch responses and heartbeats → followers miss fetches and the broker's own session/heartbeat timers fire → ISR drops and consumer session timeouts across all tenants on that broker. Empirically, a multi-second Full GC causes ISR drops and timeouts; even a ~500 ms pause can trigger rebalances (Conduktor/Confluent, empirical). Metric: GC pause time (JVM metrics), correlated with broker-wide IsrShrinksPerSec spikes and consumer records-lag-max jumps. Mitigation: small heap (~6 GB) + G1GC with a low pause target (-XX:MaxGCPauseMillis=20), leaving the rest of RAM to the page cache (empirical, see II · 05).

The log cleaner, connections, file descriptors and memory-maps

The log-cleaner threads (compaction) are a shared pool working across all compacted topics; one tenant's heavily-churned compacted topic competes for cleaner time with another's. Connections, file descriptors, and memory-maps are per-broker OS resources shared by everyone: every partition replica consumes file descriptors and (on Linux) memory-map areas, which is what produces the per-broker partition ceiling discussed in II · 02 (the vm.max_map_count bound, ~2 mmaps/partition, default 65,530 → ~32,765 partitions/broker unless raised, Instaclustr, empirical). One tenant creating many partitions consumes a shared budget that bounds everyone's partition headroom on that broker.
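The partition-ceiling arithmetic is simple enough to keep as a capacity check. This sketch takes the ~2 mmaps per partition figure above at face value; the function names and tenant partition counts are hypothetical.

```python
def partition_ceiling(vm_max_map_count=65_530, mmaps_per_partition=2):
    """Approximate per-broker partition ceiling implied by vm.max_map_count."""
    return vm_max_map_count // mmaps_per_partition

def remaining_headroom(ceiling, partitions_by_tenant):
    """Partitions left on a broker after every tenant's replicas are counted."""
    return ceiling - sum(partitions_by_tenant.values())

ceiling = partition_ceiling()
headroom = remaining_headroom(ceiling, {"tenant-a": 30_000, "tenant-b": 1_500})
print(ceiling, headroom)
```

Here one tenant holding 30,000 partition replicas leaves roughly 1,265 for everyone else on that broker, which is the shared-budget effect in numbers.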

| Resource | Shared or isolated? | Isolation mechanism (if any) | Interference symptom & metric (II · 08) | Mitigation |
|---|---|---|---|---|
| I/O handler pool (num.io.threads=8) | SHARED | request quota throttles rate after the fact | handlers busy → queue backs up; RequestHandlerAvgIdlePercent→0, RequestQueueTimeMs↑ for all | request_percentage quota; raise num.io.threads; isolate heavy tenant |
| Network Processors (num.network.threads=3) + RequestChannel queue (queued.max.requests=500) | SHARED | connection quotas bound fan-in; queue itself is global | queue full → Processors block on put → reads stall for all; RequestQueueSize≈500, NetworkProcessorAvgIdlePercent→0 | byte-rate + connection quotas; raise threads/queue; separate listeners |
| OS page cache | SHARED, no quota | none (tiered storage partly avoids cold local reads) | cold read evicts hot tail → others miss cache; disk 100%, RemoteTimeMs↑, idle%↓ (inferred) | tiered storage (KIP-405); fetch-from-follower; size RAM for working set; dedicated cluster |
| Disk I/O & NIC | SHARED | none for tenant fairness (replication throttle is for reassignment) | I/O saturation → broad latency rise; LogFlushRateAndTimeMs↑, OS disk/net util | byte-rate quotas; faster disks/NIC; dedicated brokers |
| Replica fetchers (num.replica.fetchers=1) | SHARED | none per-tenant | hot partition delays others' replication; UnderReplicatedPartitions↑, IsrShrinksPerSec | raise num.replica.fetchers; spread leaders; dedicated cluster |
| Controller (single event thread) | SHARED singleton | controller-mutation quota (per principal) | topic/config churn → metadata ops slow for all; EventQueueTimeMs | controller_mutation_rate quota; rate-limit admin clients |
| Coordinators (__consumer_offsets, __transaction_state) | SHARED logs (hashed shards) | none, group→partition is hash % N | rebalance/commit storm loads a shard others share; LocalTimeMs on Join/Commit↑ | fix client rebalance behaviour; cooperative assignor; dedicated cluster |
| JVM heap & GC | SHARED | none | GC pause freezes broker → ISR drops, session timeouts for all; GC pause time | small heap + G1GC low-pause; limit request sizes/connections |
| Log cleaner / FDs / mmaps | SHARED OS budget | none | per-broker partition ceiling; create failures, cleaner backlog | raise ulimits/vm.max_map_count; cap partitions per tenant (II · 02) |
| Partition data (logs) | ISOLATED | separate files per partition + ACLs | n/a, no data sharing | n/a |

Part 2: What IS isolated (or partially)

Now the smaller, but operationally vital, half. These are the mechanisms you actually configure to build a multitenant cluster.

(a) Data and storage, physically separate, ACL-gated

This is the one resource that is genuinely isolated. Each partition is its own log: its own directory of .log/.index/.timeindex segment files on disk, with its own append cursor, its own UnifiedLog object, its own ISR state (Part I · 03, Part I · 04). Reads and writes to different partitions do not share data structures or locks at the log level. And access is gated by ACLs (Part I · 18): an authorizer checks every request's principal against the topic's ACLs, so tenant A, even though its requests run on the same threads and its bytes may live in the same page cache, cannot read or write tenant B's topic without an explicit grant. Confidentiality and integrity of data are isolated; only performance leaks.

Invariant: ACLs isolate data, never capacity

An ACL is a yes/no on an operation against a resource. It says nothing about how much CPU, cache, or I/O the allowed operation consumes. A tenant with a valid ACL, comfortably under every quota, can still evict your cache and saturate your disk. Never reason about isolation as if ACLs implied resource isolation.

(b) Quotas, the primary multitenancy mechanism

Quotas are Kafka's main tool for taming noisy neighbours, implemented by ClientQuotaManager and its subclasses. There are four kinds:

| Quota | Config key (entity override) | What it limits | Default | Manager / mechanism (source) |
|---|---|---|---|---|
| Produce rate | producer_byte_rate | bytes/sec a client may produce | unlimited (Long.MAX_VALUE) | ClientQuotaManager (QuotaType.PRODUCE); QuotaConfig.java:87-89 |
| Fetch rate | consumer_byte_rate | bytes/sec a client may consume | unlimited | ClientQuotaManager (QuotaType.FETCH); QuotaConfig.java:90 |
| Request rate | request_percentage | % of (I/O + network) thread time | unlimited (Integer.MAX_VALUE as double) | ClientRequestQuotaManager; QuotaConfig.java:91,129-131 |
| Controller mutation | controller_mutation_rate | create/delete partition mutations/sec | unlimited | ControllerMutationQuotaManager (TokenBucket); QuotaConfig.java:92 |

Each is keyed by entity, with a precedence chain documented in source (ClientQuotaManager.java:209-219): the most specific match wins, in the order <user, client-id>, then <user>, then <client-id>, with <default> fallbacks at each level. There is also a per-IP connection-rate quota (connection_creation_rate, QuotaConfig.java:93). The full tour of the quota machinery is in Part I · 19; here we focus on what it does and does not isolate.
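The precedence chain can be pictured as an ordered lookup. The sketch below expands the three levels into the eight-step order given in the Kafka quota documentation as I understand it; the dictionary layout, the "<default>" marker, the function name, and the example quotas are all illustrative assumptions, not the broker's data structures.

```python
DEFAULT = "<default>"

def resolve_quota(quotas, user, client_id):
    """Return the most specific configured quota, or None if unlimited.

    quotas maps (user, client_id) -> limit; None in a slot means that
    dimension is not part of the entity (e.g. a user-only quota).
    """
    candidates = [
        (user, client_id),      # <user, client-id>
        (user, DEFAULT),        # <user, default client-id>
        (user, None),           # <user>
        (DEFAULT, client_id),   # <default user, client-id>
        (DEFAULT, DEFAULT),     # <default user, default client-id>
        (DEFAULT, None),        # <default user>
        (None, client_id),      # <client-id>
        (None, DEFAULT),        # <default client-id>
    ]
    for key in candidates:
        if key in quotas:
            return quotas[key]
    return None

quotas = {
    ("alice", "etl"): 10_000_000,   # bytes/sec, hypothetical values
    ("alice", None): 50_000_000,
    (DEFAULT, None): 5_000_000,
    (None, "etl"): 1_000_000,
}
print(resolve_quota(quotas, "alice", "etl"),
      resolve_quota(quotas, "alice", "web"),
      resolve_quota(quotas, "bob", "etl"))
```

Note the third lookup: under this ordering a default user quota is matched before a client-id quota, so "bob" with client-id "etl" gets the user default rather than the client-id limit.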

How enforcement works, and why timing matters. A byte-rate quota records the bytes, and if the moving-average rate exceeds the bound it computes a throttle time and delays the response while muting the channel so the client stops sending. Concretely: recordAndGetThrottleTimeMs records the value and on QuotaViolationException returns a throttle delay (ClientQuotaManager.java:333-346); throttle(...) then records the throttle, builds a ThrottledChannel, and adds it to a DelayQueue (ClientQuotaManager.java:392-409); a ThrottledChannelReaper thread polls the queue and calls notifyThrottlingDone() when the delay elapses (ClientQuotaManager.java:285-294), at which point the SocketServer unmutes the channel (the ChannelMuteEvent path at SocketServer.scala:948-951). The rate window is configurable: quota.window.size.seconds default 1 s over quota.window.num default 11 samples (QuotaConfig.java:42,52).
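The text above does not quote the delay formula. My understanding is that the broker scales the overshoot by the measurement window, roughly as sketched below; treat the formula, the function name, and the example rates as assumptions to verify against the source, and note that the window length assumes all 11 one-second samples are populated.

```python
def throttle_time_ms(observed_rate, quota_bound, window_ms):
    """Delay needed for the windowed average to fall back to the bound.

    Assumed form: (observed - bound) / bound * window. Returns 0 when the
    client is at or under its quota.
    """
    if observed_rate <= quota_bound:
        return 0
    return round((observed_rate - quota_bound) / quota_bound * window_ms)

WINDOW_MS = 11 * 1 * 1000   # quota.window.num x quota.window.size.seconds

over = throttle_time_ms(observed_rate=1_500_000, quota_bound=1_000_000, window_ms=WINDOW_MS)
under = throttle_time_ms(observed_rate=800_000, quota_bound=1_000_000, window_ms=WINDOW_MS)
print(over, under)
```

Under these assumptions a client running 50% over a 1 MB/s quota is delayed about 5.5 s. The point for isolation is that the delay is computed from bytes already handled: the throttle shapes the next request, not the current one.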

The crucial caveats: quotas are leaky isolation

(1) Quotas are rate limits applied after the work is done on shared threads. The byte-rate quota records bytes after the produce/fetch was handled; the request quota records thread-nanos after the request ran (ClientRequestQuotaManager.java:83 uses request.requestThreadTimeNanos(), past tense). The throttle delays the reply to slow the client's next request; the handler time for this one is already spent. A burst is absorbed by the shared pool first, then the client is slowed.
(2) Quotas do not isolate page-cache eviction. A consumer under its consumer_byte_rate can still read cold data and evict everyone's hot tail; the quota bounds throughput, not which bytes are touched.
(3) Quotas do not isolate head-of-line position in the shared queue. A throttled tenant's already-queued requests still sit in the same 500-slot queue ahead of others.
(4) A tenant comfortably under quota can still cause interference, via cache, disk seeks, GC, or a coordinator-shard collision, because those are not what the quota measures.

The controller-mutation quota deserves a note: it is the one strict quota. It uses a TokenBucket and, when exhausted, rejects the operation with THROTTLING_QUOTA_EXCEEDED rather than merely delaying it, "it does not accept any mutations once the quota is exhausted until it gets back to the defined rate" (ControllerMutationQuotaManager.java:86-117,160-172). This is what protects the single-threaded controller from a topic-churn tenant.
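The difference between a delaying quota and a rejecting one is easiest to see in a token bucket. The following is a generic token-bucket sketch, not Kafka's exact accounting: the class name, the admit-only-if-enough-tokens rule, and the rate and burst values are assumptions for illustration.

```python
class StrictTokenBucket:
    """Generic token bucket that rejects, rather than delays, when empty."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec     # tokens refilled per second
        self.capacity = burst        # maximum tokens that can accumulate
        self.tokens = burst
        self.last_ms = 0

    def try_mutate(self, cost, now_ms):
        """Admit a mutation of the given cost, or reject it outright."""
        elapsed_s = (now_ms - self.last_ms) / 1000.0
        self.tokens = min(self.capacity, self.tokens + elapsed_s * self.rate)
        self.last_ms = now_ms
        if self.tokens < cost:
            return "THROTTLING_QUOTA_EXCEEDED"
        self.tokens -= cost
        return "OK"

bucket = StrictTokenBucket(rate_per_sec=10, burst=20)
first = bucket.try_mutate(cost=20, now_ms=0)     # e.g. a topic with 20 partitions
second = bucket.try_mutate(cost=5, now_ms=0)     # bucket is empty: rejected
third = bucket.try_mutate(cost=5, now_ms=500)    # 0.5 s later: 5 tokens refilled
print(first, second, third)
```

The rejected call never becomes work for the controller's event thread, which is what makes this quota protective in a way the delay-the-reply quotas are not.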

Figure: throttle sequence. Participants: Tenant client · I/O handler · QuotaManager · Channel.
1. Produce (already over rate)
2. append to log, work DONE on shared thread
3. recordAndGetThrottleTimeMs(bytes)
4. throttleTimeMs > 0
5. ThrottledChannel → DelayQueue, mute
6. response + throttleTimeMs (delayed reply)
Caption: Throttling happens after the handler did the work; the quota delays the reply and mutes the channel to slow the next request.

(c) Listeners, separating request classes onto distinct ports

Kafka separates traffic onto named listeners, each with its own network Processor pool: the config doc notes "each listener (except for controller listener) creates its own thread pool" (SocketServerConfigs.java:153). Putting inter-broker replication on one listener, client traffic on another, and the KRaft controller's control plane on a dedicated controller listener means a flood on the client listener does not consume the inter-broker listener's Processors. This isolates some request classes at the network layer; notably, it protects replication and the control plane from data-plane saturation (Part I · 06). It does not isolate the shared I/O handler pool downstream, nor the page cache or disk.

(d) Connection quotas, bounding fan-in

You can cap connections to protect file descriptors and the accept path. max.connections (broker-wide) and max.connections.per.ip both default to Integer.MAX_VALUE (effectively unlimited, SocketServerConfigs.java:111,116), with per-IP overrides via max.connections.per.ip.overrides. Enforcement is in ConnectionQuotas.inc: it waits for a broker-wide slot, applies the per-IP rate throttle, increments counters, and throws TooManyConnectionsException when the per-IP cap is hit (SocketServer.scala:1295-1309). A notable design detail: the inter-broker listener is protected, "Connections on the inter-broker listener are permitted even if broker-wide limit is reached. The least recently used connection on another listener will be closed in this case" (SocketServerConfigs.java:122-123), so a client-connection flood cannot starve replication of connection slots.
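The per-IP cap with overrides amounts to a guarded counter. The sketch below models only that part of the accounting (not the broker-wide slot wait or the rate throttle); the class, exception, and IP addresses are hypothetical stand-ins, and the check-before-increment order is a simplification.

```python
class TooManyConnections(Exception):
    """Stand-in for the broker's TooManyConnectionsException."""

class PerIpConnectionCounter:
    """Track open connections per IP against a default cap and overrides."""

    def __init__(self, max_per_ip, overrides=None):
        self.max_per_ip = max_per_ip
        self.overrides = dict(overrides or {})   # models max.connections.per.ip.overrides
        self.counts = {}

    def inc(self, ip):
        limit = self.overrides.get(ip, self.max_per_ip)
        current = self.counts.get(ip, 0)
        if current >= limit:
            raise TooManyConnections(f"{ip} holds {current} connections (limit {limit})")
        self.counts[ip] = current + 1

    def dec(self, ip):
        if self.counts.get(ip, 0) > 0:
            self.counts[ip] -= 1

counter = PerIpConnectionCounter(max_per_ip=2, overrides={"10.0.0.9": 1})
counter.inc("10.0.0.5")
counter.inc("10.0.0.5")
try:
    counter.inc("10.0.0.5")          # third connection from the same IP
    third_rejected = False
except TooManyConnections:
    third_rejected = True
counter.inc("10.0.0.9")              # the override allows exactly one
print(third_rejected, counter.counts)
```

A cap like this bounds fan-in per source address; it says nothing about how much work each admitted connection then generates, which is why it complements rather than replaces the byte-rate and request quotas.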

(e) Dedicated clusters, the only full isolation

If one tenant's noise must be physically unable to reach another, the answer is separate brokers and separate controllers: a dedicated cluster. This is the only option that isolates the page cache, disk, NIC, JVM, controller, and coordinators, because none of them are shared anymore. It is also the most expensive: every cluster carries a fixed cost floor (a minimum of 3 brokers + a controller quorum for availability) plus the cross-AZ replication and storage costs covered in II · 10. For a 100 MB/s cluster, networking alone can run ~$24k/mo and storage ~$14.5k/mo (Confluent, empirical), so multiplying clusters multiplies that floor.

The isolation spectrum

Multitenancy is a spectrum from cheap-and-leaky to expensive-and-airtight. Pick the point that matches each tenant's blast-radius tolerance.

Figure: the isolation spectrum, in order of increasing isolation and cost: Shared cluster + quotas + ACLs (soft isolation, rate limits) → Shared cluster + careful placement / resource groups (placement-based) → Dedicated cluster per tenant (physical isolation).
Caption: Left = cheap, soft, noisy-neighbour risk. Right = full isolation, full cost.

| Model | Isolation strength | What it isolates | What still leaks | Cost |
|---|---|---|---|---|
| Shared + quotas + ACLs | Soft | data (ACLs); sustained rate (quotas) | page cache, queue head-of-line, GC, coordinator shards, burst latency | Lowest, one cluster |
| Shared + placement (pin tenants to broker subsets; separate listeners; per-tenant topic→broker mapping) | Medium | above + broker-level blast radius for some resources | controller (still one); cross-broker cache/disk if leaders co-locate | Low–medium, operational complexity |
| Dedicated cluster | Full | everything, separate brokers + controllers + coordinators | nothing within Kafka (only the cloud account / network fabric) | Highest, per-cluster floor × N |

Four noisy-neighbour runbooks

Each scenario below names the interference, the source mechanism that allows it, the revealing metric (cross-linked to II · 08), and the response.

Scenario 1: Cold-read cache eviction

Symptom: a stable cluster suddenly sees consumer fetch p99 spike for several tenants at once, disk utilization hits 100%, and producer latency creeps up, with no broker down and no obvious produce-rate change. Mechanism: a new or reset consumer (tenant A) is reading old offsets; the page cache (shared, FileRecords.writeTo zero-copy path, FileRecords.java:291-302) evicts the hot tail of other tenants' partitions; their fetches fall off sendfile onto disk reads. Reveal: correlate BytesOutPerSec showing a high-throughput reader on old offsets, falling RequestHandlerAvgIdlePercent, rising fetch RemoteTimeMs, and OS-level disk busy ~100%. Response: throttle the offender with a consumer_byte_rate quota to slow the eviction rate; longer term, enable tiered storage (Part I · 05) so cold reads come from object storage instead of evicting local cache, and size broker RAM for the true working set. If recurrent and SLA-critical, move backfill/analytics consumers to a dedicated cluster.

Scenario 2: Produce-flood queue saturation

Symptom: RequestQueueTimeMs rises for all request types on a broker; NetworkProcessorAvgIdlePercent and RequestHandlerAvgIdlePercent both fall; some clients see timeouts. Mechanism: tenant A floods produce; handlers can't drain the single 500-slot RequestChannel queue (RequestChannel.scala:90); network Processors block on the blocking put (RequestChannel.scala:117-119), stalling reads for every connection they multiplex. Reveal: RequestQueueSize pinned near 500; idle gauges toward 0; BytesInPerSec dominated by one client-id. Response: apply a producer_byte_rate and/or request_percentage quota to the offending principal so the broker delays its replies and mutes its channel (ClientQuotaManager.java:392-409); apply max.connections.per.ip if it is also a connection flood; raise num.io.threads/num.network.threads if the broker is simply under-provisioned (II · 05). Put high-volume tenants on a separate listener so their floods do not block other tenants' Processors.

Scenario 3: Topic-churn controller load

Symptom: metadata operations (topic create, partition reassignment, config change) become slow for everyone; EventQueueTimeMs on the active controller climbs. Mechanism: tenant A's pipeline creates/deletes thousands of topics or hammers dynamic configs; the single-threaded QuorumController (QuorumController.java:164) processes every mutation serially, so A's flood delays B's legitimate operations. Reveal: EventQueueTimeMs/EventQueueProcessingTimeMs rising (II · 08); high CreateTopics/DeleteTopics rate from one principal. Response: set a controller_mutation_rate quota on the offending principal. This is the strict, rejecting quota (ControllerMutationQuotaManager.java:86-117) that returns THROTTLING_QUOTA_EXCEEDED rather than queuing the work, directly protecting the controller. Fix the client to reuse topics instead of churning them.

Scenario 4: Rebalance-storm coordinator load

Symptom: consumer groups belonging to multiple tenants experience slow joins and offset commits, concentrated on one broker. Mechanism: tenant A's group enters a rebalance storm (members exceeding max.poll.interval.ms, flapping in and out); A's group and B's group both hash to the same __consumer_offsets partition (Utils.abs(groupId.hashCode()) % numPartitions, GroupCoordinatorService.java:428), so they share a coordinator shard on the same broker; A's churn loads it. Reveal: elevated LocalTimeMs on JoinGroup/SyncGroup/OffsetCommit routed to that broker; group-level rebalance-rate metrics for A spiking. Response: you cannot reassign the hash, so fix the storm at its source: adopt the cooperative-sticky assignor and static membership, size max.poll.interval.ms to the worst-case batch, and reduce max.poll.records (II · 07, mechanism in Part I · 13). For a tenant whose grouping behaviour is chronically pathological and whose shards are shared with SLA-sensitive neighbours, a dedicated cluster is the clean separation.

Recommendations by tenancy model

Choose isolation by blast-radius tolerance, not by default

Internal / cooperative tenants, latency-tolerant (dev, batch, analytics): shared cluster + ACLs + byte-rate and request quotas. Cheapest, and the soft isolation is acceptable because tenants are not adversarial and can tolerate the occasional cache/queue interference. Set conservative default quotas so a runaway client is capped before it hurts neighbours.

Mixed-criticality on one cluster (some latency-sensitive, some bursty): shared cluster + the full quota set (add controller_mutation_rate for any tenant that touches topics, connection_creation_rate/max.connections.per.ip for connection-heavy ones) + separate listeners for inter-broker and high-volume client traffic + careful leader placement so latency-sensitive tenants' partitions do not co-locate with known cache-evicting workloads. Watch the per-broker saturation gauges and the controller event-queue metrics as first-class alerts.

Adversarial, regulated, or hard-SLA tenants: dedicated cluster. This is the only model that isolates the page cache, disk, NIC, JVM, controller, and coordinators. Accept the per-cluster cost floor (≥3 brokers + controller quorum, plus the cross-AZ/storage costs of II · 10) as the price of a guarantee no quota can give.

Do not mistake quotas for capacity reservation

Quotas are upper bounds on rate, enforced after shared work, on a subset of resources. They are excellent at stopping a runaway client from monopolizing throughput; they are useless against cache eviction, GC pauses, coordinator-shard collisions, and burst head-of-line latency. If your isolation requirement is "tenant B's p99 must be unaffected by anything tenant A does," no combination of quotas on a shared cluster meets it; only a dedicated cluster does. Size the spectrum honestly against that test.

The throughline: Kafka's shared-everything broker is what makes it fast and cheap to run at scale, and the same property is why multitenancy is fundamentally soft unless you spend on physical separation. Know exactly which of your resources are pooled, instrument the saturation and event-queue gauges that reveal contention early (II · 08), apply quotas as guardrails rather than guarantees, and reserve dedicated clusters for the tenants whose blast-radius tolerance is genuinely zero.

krivaltsevich.com · Part of Apache Kafka Internals · derived from Apache Kafka 4.4 source · GitHub · MIT-licensed.

Apache Kafka® is a registered trademark of the Apache Software Foundation. This is an independent, unofficial guide, not affiliated with or endorsed by the ASF.