krivaltsevich.com · Kafka Internals 4.4

II · 01 · Configuration: The Tuning Surface

Source: Apache Kafka 4.4.0-SNAPSHOT (git 04bfe7d, 2026-06-15), KRaft mode. Operational guidance grounded in source code and cited benchmarks.

Configuration is the operator's control surface: the dials through which you steer durability, throughput, latency, and cost without recompiling a broker. But "configuration" in Kafka is not one flat namespace. It is four overlapping layers (static broker, dynamic cluster-wide, dynamic per-broker, and per-topic) plus the client side. The layers are resolved in a precise precedence order, and each is backed by a ConfigDef that hard-codes a type, a default, a validator, and an importance. This chapter covers three things. First, it teaches you to read a default straight from the source, so you never trust a stale blog. Second, it shows exactly which knobs apply live, which need a restart, and why (the dynamic-update path through the metadata log). Third, it walks the ~20 broker and topic knobs that actually move the needle, each with its default, whether it is dynamic, and the concrete mechanism it tunes. The cardinal rule throughout: every recommendation carries its WHY, rooted in a real mechanism in the code, not folklore.

The anatomy of a config: ConfigDef.define()

Every Kafka configuration (broker, topic, producer, consumer) is declared exactly once through a ConfigDef. A definition binds a key, a type, a default, an optional validator (range, enum, or custom), an importance, and documentation. This is the single source of truth: when this manual cites a default, it is reading the literal argument passed to .define(...).

Take the replication ConfigDef in server/src/main/java/org/apache/kafka/server/config/ReplicationConfigs.java:154. Every row below comes from the arguments of one .define() call:

| Position in .define() | Example: replica.lag.time.max.ms | Operational meaning |
|---|---|---|
| key | "replica.lag.time.max.ms" | The property name you set in server.properties or via AdminClient. |
| type | LONG | Parsed and coerced at load; a wrong type is a startup error. |
| default | 30000L (constant REPLICA_LAG_TIME_MAX_MS_DEFAULT) | Injected when the key is absent. |
| validator | none here (elsewhere e.g. atLeast(1), between(0,1)) | Rejects out-of-range values before they take effect. |
| importance | HIGH | Documentation tiering only; no runtime effect. |

Resolution at runtime is mechanically simple, and there is no hidden fallback chain at this layer. The AbstractConfig constructor calls definition.parse(originals) (clients/src/main/java/org/apache/kafka/common/config/AbstractConfig.java:118); parse() walks every defined key, and if the user supplied no value it injects key.defaultValue (clients/src/main/java/org/apache/kafka/common/config/ConfigDef.java:123). Thereafter AbstractConfig.get(key) simply returns whatever sits in the resolved values map and records the access for unused-config logging (AbstractConfig.java:174). The default you read in source is the default you get.

raw props (server.properties + CLI + dynamic)
→ preProcessParsedConfig: resolve ${provider:...} vars
→ key supplied?  yes: validate against validator + type  ·  no: inject defaultValue (ConfigDef.java:123)
→ resolved values map, read by get(key)
Figure: How one config value is resolved inside AbstractConfig. Validation runs on user-supplied values; absent keys fall to the compiled-in default. postProcessParsedConfig then fixes up "secondary defaults" before a second parse pass.
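The parse path above fits in a few lines of code. The class below is a hypothetical, stripped-down stand-in: MiniConfigDef, its define/parse methods, and the NO_DEFAULT sentinel are illustrative names, not Kafka's API. Supplied values are coerced and validated, absent keys receive the compiled-in default, and a required key with no default fails the whole parse. The sketch skips config providers and the postProcessParsedConfig second pass.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical, stripped-down stand-in for ConfigDef.define()/parse(); not Kafka's API.
final class MiniConfigDef {
    // Plays the role of ConfigDef.NO_DEFAULT_VALUE: "required, no default".
    static final Object NO_DEFAULT = new Object();

    private record Key(String name, Class<?> type, Object defaultValue, Predicate<Object> validator) {}

    private final Map<String, Key> keys = new LinkedHashMap<>();

    MiniConfigDef define(String name, Class<?> type, Object defaultValue, Predicate<Object> validator) {
        keys.put(name, new Key(name, type, defaultValue, validator));
        return this;
    }

    /** Walks every defined key: coerce and validate supplied values, inject defaults for absent ones. */
    Map<String, Object> parse(Map<String, String> originals) {
        Map<String, Object> values = new HashMap<>();
        for (Key k : keys.values()) {
            if (originals.containsKey(k.name())) {
                Object value = coerce(k.type(), originals.get(k.name()));
                if (!k.validator().test(value)) {
                    throw new IllegalArgumentException("Invalid value " + value + " for " + k.name());
                }
                values.put(k.name(), value);
            } else if (k.defaultValue() == NO_DEFAULT) {
                throw new IllegalArgumentException("Missing required configuration " + k.name());
            } else {
                values.put(k.name(), k.defaultValue()); // the literal passed to define(); not re-validated here
            }
        }
        return values;
    }

    private static Object coerce(Class<?> type, String raw) {
        // A malformed number throws NumberFormatException (an IllegalArgumentException): a type error.
        if (type == Long.class) return Long.parseLong(raw.trim());
        if (type == Integer.class) return Integer.parseInt(raw.trim());
        return raw;
    }
}
```

If a map omits replica.lag.time.max.ms, parsing it yields 30000L, which is exactly the literal passed to define(). If a map omits a NO_DEFAULT key, parsing throws.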
How to read a default from source (do this, don't trust the blog)

Find the *Configs.java class for the subsystem (ServerLogConfigs, ReplicationConfigs, ServerConfigs, SocketServerConfigs) and locate the _DEFAULT constant. Then confirm it is wired into a CONFIG_DEF via .define(KEY, TYPE, KEY_DEFAULT, ...). Topic defaults live in the static CONFIG block of storage/src/main/java/org/apache/kafka/storage/internals/log/LogConfig.java. Client defaults are inline literals in ProducerConfig / ConsumerConfig. Never quote a default you have not seen in a .define() argument: several "well-known" values are now stale. The famous case is linger.ms, which now defaults to 5, not 0; the producer ConfigDef sets it at ProducerConfig.java:418.

Five sentinel conventions recur when reading defaults, and misreading them causes real incidents:

  • NO_DEFAULT_VALUE (ConfigDef.java:91): a required config with no default; absence is a hard startup error (e.g. serializer classes).
  • Long.MAX_VALUE as default = "effectively disabled / unbounded." Examples: log.flush.interval.messages = Long.MAX_VALUE (never flush by count, ServerLogConfigs.java:101) and the byte-rate quotas (no quota until you set one).
  • -1 = "no limit / unbounded by this dimension." log.retention.bytes = -1 (no size cap, only time, ServerLogConfigs.java:81).
  • -2 = "derive from the parent config." local.retention.ms = -2 derives from retention.ms for tiered topics (LogConfig.java:146).
  • null default with a secondary key = a fallback synonym exists. log.retention.ms is null and falls back to log.retention.minutes then log.retention.hours (LogConfig.java:168–170); the synonym precedence is resolved in DynamicBrokerConfig.brokerConfigSynonyms (server/.../config/DynamicBrokerConfig.java:97).
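The sketch below shows how these sentinels read in practice. RetentionDefaults is a hypothetical helper that takes raw property values as a Map. It models only three conventions: the log.retention.* fallback chain, the -2 derivation for local.retention.ms, and -1 as "unbounded". Kafka's own resolution goes through LogConfig and the synonym lists cited above.

```java
import java.util.Map;

// Hypothetical helper illustrating the sentinel conventions; not Kafka's LogConfig logic.
final class RetentionDefaults {
    static final long UNBOUNDED = -1L;          // "-1 = no limit by this dimension"
    static final long DERIVE_FROM_PARENT = -2L; // "-2 = derive from the parent config"

    /** A null log.retention.ms falls back to log.retention.minutes, then log.retention.hours. */
    static long brokerRetentionMs(Map<String, Long> props) {
        Long ms = props.get("log.retention.ms");
        if (ms != null) return ms;
        Long minutes = props.get("log.retention.minutes");
        if (minutes != null) return minutes * 60_000L;
        long hours = props.getOrDefault("log.retention.hours", 168L); // 168 h = 7 days
        return hours * 3_600_000L;
    }

    /** local.retention.ms = -2 means "use retention.ms" (tiered topics). */
    static long localRetentionMs(long localRetentionMs, long retentionMs) {
        return localRetentionMs == DERIVE_FROM_PARENT ? retentionMs : localRetentionMs;
    }

    /** retention.bytes = -1 means no size cap: only time-based retention applies. */
    static boolean sizeCapped(long retentionBytes) {
        return retentionBytes != UNBOUNDED;
    }
}
```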

The four layers and their precedence

An effective config value is assembled from four layers in a strict order. For broker configs, the authoritative resolution is DynamicBrokerConfig.updateCurrentConfig() (core/src/main/scala/kafka/server/DynamicBrokerConfig.scala:429). It starts from staticBrokerConfigs, overlays cluster-wide dynamic defaults, then overlays per-broker dynamic overrides; later layers win. For log/topic configs, a fourth and most specific layer applies on top: the per-topic override carried in each partition's LogConfig.

Static: server.properties + CLI  [restart · file on disk]
Read once at process start. Changing it requires a restart. This is the base layer, staticBrokerConfigs.
↓ overridden by
Dynamic cluster-wide: ConfigResource(BROKER, "")  [no restart · metadata log]
Stored as ConfigRecords in the metadata log; applied to every broker live. Held in dynamicDefaultConfigs.
↓ overridden by
Dynamic per-broker: ConfigResource(BROKER, "<id>")  [no restart · metadata log]
Same record store, scoped to one broker id. Wins over the cluster-wide default. Held in dynamicBrokerConfigs.
↓ for log/topic configs, further overridden by
Per-topic: ConfigResource(TOPIC, name)  [no restart · per partition]
LogConfig override per partition; the broker log.* value is merely the inherited default. Highest specificity for a topic's storage behaviour.
Figure: Effective-config resolution order. Per-broker beats cluster-wide, which beats static. For storage configs, a topic override beats the broker's log.* inheritance for that topic only.

Two subtleties that bite operators:

  • The layering order flips with the scope you are writing. When you set a per-broker value, the resolver lays down the cluster defaults first, then your per-broker override (per-broker wins). When you set a cluster-wide default, the resolver lays the new defaults under any existing per-broker dynamic values (per-broker still wins); see validatedKafkaProps at DynamicBrokerConfig.scala:353–360. Net invariant: a per-broker dynamic override always shadows the cluster-wide default. If a cluster-wide change "didn't take" on one broker, suspect a leftover per-broker override.
  • Topic configs are synonyms of broker log.* configs: cleanup.policy (topic) ↔ log.cleanup.policy (broker); retention.ms ↔ log.retention.ms; segment.bytes ↔ log.segment.bytes. min.insync.replicas, by contrast, carries the same name in both layers. The mapping is ServerTopicConfigSynonyms.TOPIC_CONFIG_SYNONYMS, surfaced through the "Server Default Property" header in LogConfig.java:117. The broker value is the cluster default; a topic that does not override it inherits it.
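The precedence rules and the topic-to-broker synonym lookup can be modelled as map overlays. This is a hypothetical sketch, not how DynamicBrokerConfig is implemented: EffectiveConfig and its methods are illustrative, and the synonym table holds only the four pairs named above. Later layers simply overwrite earlier ones, and a topic key falls back to its broker synonym only when the topic has no override.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the layering rules; not DynamicBrokerConfig's implementation.
final class EffectiveConfig {
    // Only the synonym pairs named in the text; Kafka's real table is much longer.
    static final Map<String, String> TOPIC_TO_BROKER = Map.of(
            "cleanup.policy", "log.cleanup.policy",
            "retention.ms", "log.retention.ms",
            "segment.bytes", "log.segment.bytes",
            "min.insync.replicas", "min.insync.replicas");

    /** Static first, then cluster-wide dynamic, then per-broker dynamic: later layers win. */
    static Map<String, String> brokerView(Map<String, String> staticProps,
                                          Map<String, String> clusterDefaults,
                                          Map<String, String> perBroker) {
        Map<String, String> merged = new HashMap<>(staticProps);
        for (Map<String, String> layer : List.of(clusterDefaults, perBroker)) {
            merged.putAll(layer);
        }
        return merged;
    }

    /** A topic override wins for that topic only; otherwise the broker synonym is inherited. */
    static String topicValue(String topicKey, Map<String, String> topicOverrides,
                             Map<String, String> brokerView) {
        String override = topicOverrides.get(topicKey);
        if (override != null) return override;
        String brokerKey = TOPIC_TO_BROKER.get(topicKey);
        return brokerKey == null ? null : brokerView.get(brokerKey);
    }
}
```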
KRaft changed where dynamic config lives

In KRaft (the only mode in 4.x, since ZooKeeper was removed in 4.0), dynamic broker, topic, and group configs are ConfigRecord entries in the metadata log. Client quotas use the analogous ClientQuotaRecord. These records are applied through the controller and replicated to every node; there is no ZooKeeper znode. On startup, a broker even replays dynamic config straight from the latest metadata snapshot (readDynamicBrokerConfigsFromSnapshot, DynamicBrokerConfig.scala:92). BrokerConfigHandler.processConfigChanges then routes each record to updateDefaultConfig or updateBrokerConfig (core/src/main/scala/kafka/server/ConfigHandler.scala:149–153). See Part I 11 · KRaft Controller and 12 · Metadata Propagation for the replication path.

HARD vs SOFT vs EMERGENT limits

This manual distinguishes three kinds of bound, and configuration sits across all three. Knowing which kind you face tells you whether a number is a dial, a wall, or a consequence.

| Kind | What it is | How you change it | Config-world example |
|---|---|---|---|
| HARD | A compiled-in constant; no config touches it. | Only by patching and rebuilding Kafka. | MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION_FOR_IDEMPOTENCE = 5 (ProducerConfig.java:277) bounds max.in.flight.requests.per.connection whenever idempotence is on; no setting lets an idempotent producer exceed it. |
| SOFT | A config key with a default and a validator. | Set the key (statically or, if dynamic, live). | min.insync.replicas default 1; log.retention.ms default 7 days; num.io.threads default 8. |
| EMERGENT | A ceiling that arises from resources or architecture, not a single key. | Add hardware, change topology, or accept the bound. | The effective min.insync.replicas is capped by the replica count, so min.insync.replicas=2 on an RF=1 topic silently behaves as 1 (see the gotcha below). |
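A validator makes the difference between a HARD and a SOFT bound concrete. The sketch below is hypothetical and simpler than the producer's own checks. IdempotenceCheck is an illustrative name. Its constant mirrors MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION_FOR_IDEMPOTENCE, and the atLeast(1)-style floor stands in for the key's SOFT validator.

```java
// Hypothetical validator contrasting a HARD bound with a SOFT one; simplified relative to
// the producer's own checks. The constant mirrors MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION_FOR_IDEMPOTENCE.
final class IdempotenceCheck {
    static final int MAX_IN_FLIGHT_FOR_IDEMPOTENCE = 5; // HARD: compiled in, no config moves it

    /** Returns null when an idempotent producer could run with these settings, else the reason. */
    static String problem(String acks, int maxInFlight) {
        if (!acks.equals("all") && !acks.equals("-1")) {
            return "idempotence requires acks=all";
        }
        if (maxInFlight < 1) { // SOFT: the key's own validator floor (atLeast(1)-style)
            return "max.in.flight.requests.per.connection must be at least 1";
        }
        if (maxInFlight > MAX_IN_FLIGHT_FOR_IDEMPOTENCE) {
            return "max.in.flight.requests.per.connection must be at most " + MAX_IN_FLIGHT_FOR_IDEMPOTENCE;
        }
        return null;
    }
}
```

The SOFT floor is something an operator could in principle see relaxed by a config change. The ceiling of 5 moves only with a rebuild.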

Dynamic vs restart-required: the update path

Whether a config is alterable live is not a matter of opinion; it is membership in a compiled set. DynamicBrokerConfig.ALL_DYNAMIC_CONFIGS (server/.../config/DynamicBrokerConfig.java:70) is the union of every subsystem's reconfigurable list. Anything outside it is rejected for dynamic update by validateConfigs with "Cannot update these configs dynamically" (DynamicBrokerConfig.java:130–139). A second set, PER_BROKER_CONFIGS (line 62), records which of those must be scoped to a broker id. dynamicConfigUpdateModes() (line 195) maps each dynamic key to "per-broker" or "cluster-wide", which is exactly what kafka-configs.sh reports.

The mechanism that makes live reconfiguration safe is the two-phase apply in processReconfiguration (DynamicBrokerConfig.scala:447). First a validate pass builds a candidate KafkaConfig and calls validateReconfiguration on every affected reconfigurable; if any throws, the whole update is aborted and nothing changes. Only then does the apply pass swap in the new config and call reconfigure(oldConfig, newConfig). This is why a bad dynamic value is refused atomically rather than half-applied.

Participants: kafka-configs.sh · Controller · Metadata log · Broker(s)
1. kafka-configs.sh → Controller: IncrementalAlterConfigs
2. Controller → Metadata log: append ConfigRecord
3. Metadata log → Broker(s): replicate record
4. Broker(s): validateReconfiguration(), abort on throw
5. Broker(s): apply, reconfigure(old, new)
6. Response to kafka-configs.sh (OK / ConfigException)
Figure: Dynamic-config update in KRaft. The controller persists a ConfigRecord; brokers validate the candidate config first (atomic abort on failure) and only then apply it. No process restart, no ZooKeeper.
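The reject, validate, and apply sequence can be sketched as follows. Reconfigurable here is a local stand-in modelled on Kafka's org.apache.kafka.common.Reconfigurable interface, and DynamicUpdater is hypothetical. The sketch shows only the ordering. A key outside the dynamic set is refused up front. Every affected subsystem can then veto the candidate config, and nothing changes unless all of them accept.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Local stand-in modelled on Kafka's org.apache.kafka.common.Reconfigurable interface.
interface Reconfigurable {
    Set<String> reconfigurableConfigs();
    void validateReconfiguration(Map<String, String> candidate); // throw to veto the update
    void reconfigure(Map<String, String> candidate);
}

// Hypothetical updater: membership check, then validate-all, then apply-all.
final class DynamicUpdater {
    private final Set<String> allDynamicConfigs; // plays the role of ALL_DYNAMIC_CONFIGS
    private final List<Reconfigurable> subsystems = new ArrayList<>();

    DynamicUpdater(Set<String> allDynamicConfigs) {
        this.allDynamicConfigs = allDynamicConfigs;
    }

    void register(Reconfigurable subsystem) {
        subsystems.add(subsystem);
    }

    void update(Map<String, String> current, Map<String, String> changes) {
        for (String key : changes.keySet()) {
            if (!allDynamicConfigs.contains(key)) {
                throw new IllegalArgumentException("Cannot update these configs dynamically: " + key);
            }
        }
        Map<String, String> candidate = new HashMap<>(current);
        candidate.putAll(changes);
        List<Reconfigurable> affected = new ArrayList<>();
        for (Reconfigurable r : subsystems) {
            if (changes.keySet().stream().anyMatch(r.reconfigurableConfigs()::contains)) affected.add(r);
        }
        // Phase 1: any veto aborts here, before anything has changed.
        for (Reconfigurable r : affected) r.validateReconfiguration(candidate);
        // Phase 2: commit the new values, then let each subsystem reconfigure itself.
        current.putAll(changes);
        for (Reconfigurable r : affected) r.reconfigure(candidate);
    }
}
```

A veto in phase 1 leaves both the stored values and every subsystem untouched. That is the atomicity property the text attributes to processReconfiguration.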

The reconfigurable subsystems and the configs they own:

| Subsystem (source) | Dynamic configs it owns | Scope |
|---|---|---|
| DynamicLogConfig (DynamicBrokerConfig.java:203) | Every broker-level synonym of a topic config (retention, segment, cleanup policy, flush, compression, min.insync.replicas, timestamp bounds), plus cordoned.log.dirs. | cluster-wide (cordon is per-broker) |
| DynamicThreadPool (server/.../DynamicThreadPool.java:29) | num.io.threads, num.replica.fetchers, num.recovery.threads.per.data.dir, background.threads. | cluster-wide |
| DynamicListenerConfig + SocketServer (DynamicBrokerConfig.java:216) | SSL/SASL keystore, truststore, and mechanisms; listeners; num.network.threads; max.connections; max.connection.creation.rate. | per-broker (some listener limits cluster-wide) |
| DynamicRemoteLogConfig (DynamicBrokerConfig.java:269) | Tiered-storage thread pools, copy/fetch byte-rate caps, remote index cache size, fetch max-wait. | cluster-wide |
| DynamicQuotaConfig / LogCleaner / metrics reporters | Broker quota knobs (QuotaConfig.BROKER_QUOTA_CONFIGS), log-cleaner threads/IO, metric.reporters. | cluster-wide |
Why these are dynamic and others are not

A config is dynamic if and only if the subsystem that consumes it can swap to a new value without reconstructing immutable state. Thread-pool sizes are dynamic because the pools resize in place. All topic/log configs are dynamic because each partition rereads its LogConfig. SSL stores are dynamic because the channel builder can rebuild an SSL context. But broker.id, process.roles, log.dirs (the directory set), controller.quorum.voters, and inter.broker.listener.name are not dynamic. They wire identity, the storage layout, and the consensus quorum, all of which must be fixed before the broker can participate at all. Any attempt to alter one of these dynamically is rejected by nonDynamicConfigs (DynamicBrokerConfig.java:178).

Static and dynamic of the same key coexist, and dynamic wins

You can have num.io.threads=8 in server.properties and a dynamic per-broker num.io.threads=16. The dynamic value wins (it is layered on top), and it survives restarts because it lives in the metadata log, not the file. After an incident where you bumped a thread pool live, reconcile server.properties with the dynamic value or a future operator reading only the file will be misled. Use kafka-configs.sh --describe --all to see the effective value and its source synonyms.

The ~20 knobs that actually matter

Most of Kafka's hundreds of configs are leave-them-alone. The following are the ones an operator genuinely tunes. Every default below is read from the source cited; every "tunes" column cross-links the Part I chapter that explains the mechanism. D? = dynamically alterable.

Durability & replication

| Config | Default | D? | What it does & the mechanism it tunes |
|---|---|---|---|
| default.replication.factor (broker) / topic RF | 1 (ReplicationConfigs.java:42) | create-time | Number of copies of each partition. RF is set per topic at create time; changing it later takes a partition reassignment. The broker key applies only where RF is not given explicitly: auto-created topics and create requests that leave RF unset. Set RF=3 in production so the cluster survives one broker loss with a quorum remaining. Mechanism: 08 · Replication (ISR). |
| min.insync.replicas | 1 (ServerLogConfigs.java:155) | yes (cluster/topic) | With acks=all, the minimum number of ISR members that must persist a write, or the producer gets NotEnoughReplicas. Set it to 2 with RF=3: an acked write then survives one in-sync replica failing after the ack. A visibility rule also lives here: records are invisible to consumers until the ISR meets this. See 08 · Replication, II · 06 · Durability. |
| unclean.leader.election.enable | false (LogConfig.java:139) | yes (cluster/topic) | When false, a partition with no in-sync replica stays offline rather than promoting a stale out-of-ISR replica that would truncate committed data. Keep it false: flipping it buys availability at the cost of losing whatever committed records the stale replica lacks. In KRaft, enabling it dynamically waits up to unclean.leader.election.interval.ms (5 min default, ReplicationConfigs.java:123) unless you force it with kafka-leader-election.sh. Mechanism: 08 · Replication, 11 · KRaft Controller. |
| replica.lag.time.max.ms | 30000 (ReplicationConfigs.java:55) | no | A follower that hasn't fetched up to the leader's LEO within this window is dropped from the ISR. Lower → ISR shrinks faster (more sensitive to slow followers); higher → tolerates lag but weakens the durability promise. Keep replica.fetch.wait.max.ms (500) well below it. See 08 · Replication. |
| num.replica.fetchers | 1 (ReplicationConfigs.java:96) | yes (cluster) | Fetcher threads per source broker pulling replica data, so a broker runs up to this × (number of brokers it replicates from) fetcher threads. Raise to 4–8 when followers lag a high-throughput leader; it parallelises follower I/O. Mechanism: 08 · Replication, 09 · Fetch Path. |
The durability triad, and its mechanism

replication.factor=3 + min.insync.replicas=2 + producer acks=all + unclean.leader.election.enable=false is the production baseline. Why each term matters:
  • RF=3 gives three copies.
  • acks=all blocks the ack until every current ISR member has the record.
  • min.insync.replicas=2 refuses writes once the ISR has collapsed to one, so a single replica is never the sole holder of an acked write (no false durability).
  • unclean=false forbids promoting a replica that never had the data.
With all four in place, a single failure cannot lose acked data; loss requires ≥2 simultaneous failures. This is the durability triad cited across the operations literature (Conduktor; Datadog's pre-0.11 unclean-election data-loss post-mortem). See II · 06 · Durability.

The silent min.insync.replicas trap

min.insync.replicas is capped by the actual replica count. The effective value is minInSyncReplicas.min(remoteReplicasMap.size + 1) in core/src/main/scala/kafka/cluster/Partition.scala:246–247. So on an RF=1 topic, min.insync.replicas=2 silently behaves as 1 and writes are accepted with zero redundancy. This is an EMERGENT limit, not a SOFT one: the config does not error; the topology overrides it. Always verify min.insync.replicas ≤ replication.factor per topic. The gap is invisible until the day you lose the only replica.
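A worked check makes the cap easy to reason about. This hypothetical sketch (MinIsrCheck is illustrative) assumes the replica set equals the topic's RF. It reproduces the min(configured, replicas) expression and the acks=all admission rule, and it computes how many broker losses a partition can take while still accepting writes.

```java
// Hypothetical sketch of the effective-min-ISR cap and the acks=all admission rule.
// Assumes the partition's replica set size equals the topic's replication factor.
final class MinIsrCheck {
    /** Mirrors minInSyncReplicas.min(remoteReplicas + 1): remote replicas plus the leader. */
    static int effectiveMinIsr(int configuredMinIsr, int replicationFactor) {
        return Math.min(configuredMinIsr, replicationFactor);
    }

    /** With acks=all, the append is rejected (NotEnoughReplicas) when the ISR is smaller than this. */
    static boolean acceptsAcksAll(int isrSize, int configuredMinIsr, int replicationFactor) {
        return isrSize >= effectiveMinIsr(configuredMinIsr, replicationFactor);
    }

    /** Broker losses a partition can absorb while still accepting acks=all writes. */
    static int writableFailureBudget(int configuredMinIsr, int replicationFactor) {
        return replicationFactor - effectiveMinIsr(configuredMinIsr, replicationFactor);
    }
}
```

With RF=1 and min.insync.replicas=2, acceptsAcksAll(1, 2, 1) is true, because the single-replica ISR satisfies an effective minimum of 1. With RF=3, an ISR of 1 is rejected, and the partition keeps accepting writes through exactly one broker loss.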

Storage, retention & cleanup

| Config (topic ↔ broker) | Default | D? | What it does & the mechanism it tunes |
|---|---|---|---|
| retention.ms ↔ log.retention.ms | 7 days (604800000), LogConfig.java:134 | yes | Time a record is kept under the delete policy; -1 = keep forever. The broker key is null and falls back to log.retention.minutes → log.retention.hours (168). This is your primary disk-sizing dial and an SLA on how long consumers may lag. See 04 · Storage Management, II · 04 · Capacity Planning. |
| retention.bytes ↔ log.retention.bytes | -1 (no cap), ServerLogConfigs.java:81 | yes | Per-partition size cap; multiply by partition count for the topic total. Combined with retention.ms, whichever triggers first wins. Use it as a hard ceiling so a traffic spike cannot fill the disk. See 04 · Storage Management. |
| segment.bytes ↔ log.segment.bytes | 1 GiB (1073741824), LogConfig.java:131 | yes | Max size of one log segment file (floor 1 MiB, atLeast(1024*1024)). Retention and cleaning operate on whole segments, so smaller segments mean finer retention granularity but more files (FDs, mmaps). Lower it on low-volume topics where 1 GiB would never fill within the retention window. See 03 · Storage / Log Engine. |
| segment.ms ↔ log.roll.ms / log.roll.hours | 7 days, LogConfig.java:132 | yes | Forces a roll even if segment.bytes isn't reached, so retention can act on time-bounded data. The active segment is never deleted; if it never rolls, nothing is reclaimed. See 03 · Storage / Log Engine. |
| cleanup.policy ↔ log.cleanup.policy | delete, ServerLogConfigs.java:89 | yes | delete drops old segments by time/size; compact keeps the latest value per key (log compaction); delete,compact does both; an empty list means infinite retention. Compaction is what keeps changelog and __consumer_offsets-style topics bounded. See 03 · Storage / Log Engine, 13 · Group Coordination. |
| min.cleanable.dirty.ratio | 0.5, LogConfig.java:138 | yes | For compacted topics, the dirty fraction that makes a log eligible for cleaning. Lower → cleans more often (less wasted space, more CPU); higher → fewer, larger cleanings. Bounds worst-case duplicate space. See 03 · Storage / Log Engine. |
| num.recovery.threads.per.data.dir | 2, ServerLogConfigs.java:147 | yes | Threads per data dir for log recovery at startup and flush at shutdown. Raise it to shorten recovery after an unclean shutdown on a broker with many partitions; this directly cuts time-to-rejoin after a crash. See 04 · Storage Management, II · 07 · Failure Modes. |
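Retention works on whole segments, so a quick capacity ceiling needs one extra segment per partition. This minimal sketch assumes size-based retention deletes only complete segments, which lets a partition sit up to one segment.bytes above retention.bytes before the next deletion. It also ignores index files. DiskBudget is a hypothetical helper.

```java
// Hypothetical capacity helper. Assumes size-based retention deletes only whole segments, so a
// partition can sit up to one segment above retention.bytes; index files are ignored.
final class DiskBudget {
    static long perPartitionCeiling(long retentionBytes, long segmentBytes) {
        if (retentionBytes < 0) {
            throw new IllegalArgumentException("retention.bytes = -1: no size ceiling, size by time instead");
        }
        return retentionBytes + segmentBytes;
    }

    /** Cluster-wide bytes for one topic: every replica holds its own copy. */
    static long topicCeiling(long retentionBytes, long segmentBytes, int partitions, int replicationFactor) {
        return perPartitionCeiling(retentionBytes, segmentBytes) * partitions * replicationFactor;
    }
}
```

For 12 partitions with retention.bytes = 10 GiB, segment.bytes = 1 GiB, and RF=3, that gives 11 GiB × 12 × 3 = 396 GiB of disk across the cluster for the topic.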
Defaults are not magic numbers: segment and retention interact

A common foot-gun: take a low-traffic topic with retention.ms=1h but the default segment.bytes=1GiB and segment.ms=7d. It keeps data for roughly a week, because retention only deletes closed segments and the active one won't roll for a week. If a topic must honour a tight retention SLA, lower segment.ms to roughly the retention window. The active segment is never eligible for deletion; see 04 · Storage Management.
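A rough worst-case estimate quantifies this foot-gun. It rests on three assumptions, stated up front. First, the topic has a steady trickle of writes, so the segment rolls on segment.ms rather than segment.bytes. Second, a closed segment becomes deletable once its newest record is older than retention.ms. Third, deletion happens on the next run of the retention checker, whose interval is log.retention.check.interval.ms (5 minutes by default). RetentionWorstCase is a hypothetical helper.

```java
import java.time.Duration;

// Hypothetical worst-case estimate; assumes steady low traffic (rolls happen on segment.ms)
// and deletion on the next retention check once a closed segment's newest record is too old.
final class RetentionWorstCase {
    static Duration oldestRecordAge(Duration segmentMs, Duration retentionMs, Duration checkInterval) {
        // First record of a segment: wait for the roll, then for the segment's newest record
        // to age past retention.ms, then for the retention checker to notice.
        return segmentMs.plus(retentionMs).plus(checkInterval);
    }
}
```

With segment.ms=7d and retention.ms=1h, the oldest record can linger about 7 days, 1 hour, and 5 minutes. Lowering segment.ms to 1h brings that down to roughly 2 hours and 5 minutes.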

Message size & timestamps

| Config | Default | D? | What it does & the mechanism it tunes |
|---|---|---|---|
| max.message.bytes ↔ message.max.bytes | 1 MiB + 12 = 1048588, ServerLogConfigs.java:177 | yes | Largest record batch (after compression) the broker accepts. The + LOG_OVERHEAD (12 B) covers the batch's base-offset and length fields (8 + 4 bytes). Raising it must be matched on producers (max.request.size) and on the replication path. replica.fetch.max.bytes (1 MiB, ReplicationConfigs.java:68) is "not absolute" and will still return one oversized batch, but undersizing it wastes fetch round-trips. See 01 · Record Format, II · 02 · Limits. |
| message.timestamp.type | CreateTime, ServerLogConfigs.java:127 | yes | CreateTime trusts the producer's timestamp; LogAppendTime stamps at the broker. Time-based retention and time-index lookups use this value, so a skewed producer clock under CreateTime can retain data wrongly. See 01 · Record Format. |
| message.timestamp.before.max.ms | Long.MAX_VALUE (off), ServerLogConfigs.java:132 | yes | Under CreateTime, rejects a record whose timestamp is more than this far in the past. Off by default; set it to stop ancient or backfilled records from poisoning time-based retention. |
| message.timestamp.after.max.ms | 3600000 (1 h), ServerLogConfigs.java:139 | yes | Under CreateTime, rejects a record timestamped more than 1 hour in the future. This guards against far-future timestamps that would pin retention open indefinitely. Note the asymmetry with the "before" default. |
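The two CreateTime bounds reduce to a comparison against the broker clock. This is a hypothetical sketch of the rule as the table describes it. TimestampBounds is illustrative, and accepting a value exactly at the limit is an assumption. Under the defaults, the past-timestamp check is effectively a no-op.

```java
// Hypothetical check for the CreateTime bounds; with LogAppendTime the broker overwrites the
// timestamp, so these bounds do not apply. Accepting values exactly at the limit is an assumption.
final class TimestampBounds {
    static boolean accept(long recordTsMs, long brokerNowMs, long beforeMaxMs, long afterMaxMs) {
        if (recordTsMs <= brokerNowMs) {
            return brokerNowMs - recordTsMs <= beforeMaxMs; // too far in the past?
        }
        return recordTsMs - brokerNowMs <= afterMaxMs;      // too far in the future?
    }
}
```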

Throughput, threads & sockets

| Config | Default | D? | What it does & the mechanism it tunes |
|---|---|---|---|
| num.io.threads | 8, ServerConfigs.java:51 | yes | Request-handler threads that do the work (including disk I/O) after a network thread parses a request. The classic rule of thumb is ~8 per data disk (Confluent/Strimzi, empirical). Watch RequestHandlerAvgIdlePercent and keep it above 0.3 (30% idle). See 07 · Request Processing, 06 · Network & Threading. |
| num.network.threads | 3, SocketServerConfigs.java:152 | yes | Network processor threads (a pool per listener) that read requests and write responses. Raise to 8–12 on high-connection or high-fan-out brokers. Watch NetworkProcessorAvgIdlePercent; below 0.3 (30% idle) means saturated. See 06 · Network & Threading. |
| queued.max.requests | 500, SocketServerConfigs.java:144 | no | Depth of the queue between network and I/O threads; the back-pressure point when I/O threads can't keep up. See 07 · Request Processing. |
| socket.request.max.bytes | 100 MiB (104857600), SocketServerConfigs.java:96 | no | Hard ceiling on a single request's size, a memory-safety bound. Must exceed your largest produce request. See II · 02 · Limits. |
| socket.send.buffer.bytes / socket.receive.buffer.bytes | 100 KiB each, SocketServerConfigs.java:88,92 | no | SO_SNDBUF/SO_RCVBUF. On high bandwidth-delay-product links, raise toward 1 MiB so a single connection can fill the pipe. See 06 · Network & Threading. |
| compression.type (same name at broker and topic level) | producer (broker), ServerLogConfigs.java:178 | yes | The broker default producer means "store the codec the producer chose", with no recompression. Set a concrete codec (lz4/zstd) to normalise across producers; uncompressed forces none. Producer-side compression.type defaults to none (ProducerConfig.java:409), so the real choice happens on the producer. See 16 · Producer Client, II · 05 · Performance Tuning. |
| num.partitions | 1, ServerLogConfigs.java:36 | n/a (create-time) | Default partition count for auto-created topics and create requests that pass -1. Partition count is the unit of parallelism and is practically immutable for keyed topics. Set it deliberately per topic; do not lean on this default. See II · 03 · Partitioning, 13 · Group Coordination. |
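The queued.max.requests back-pressure point is a bounded hand-off queue between two thread pools. The model below is hypothetical and is not Kafka's RequestChannel. RequestQueueModel is illustrative and uses a timed offer so the example terminates. Once the queue is full, the network side cannot hand over more work until an I/O thread drains an entry.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the network-thread -> I/O-thread hand-off; capacity ~ queued.max.requests.
final class RequestQueueModel {
    private final BlockingQueue<String> queue;

    RequestQueueModel(int queuedMaxRequests) {
        this.queue = new ArrayBlockingQueue<>(queuedMaxRequests);
    }

    /** Network-thread side: false means the queue stayed full for waitMs (back-pressure). */
    boolean enqueue(String request, long waitMs) {
        try {
            return queue.offer(request, waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    /** I/O-thread side: next parsed request, or null if none arrived within waitMs. */
    String dequeue(long waitMs) {
        try {
            return queue.poll(waitMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    int depth() {
        return queue.size();
    }
}
```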

The flush configs, and why you almost never set them

Operators new to Kafka reach for log.flush.interval.messages / log.flush.interval.ms expecting they control durability. They do not, by default, and that is deliberate.

| Config (topic ↔ broker) | Default | D? | Meaning |
|---|---|---|---|
| flush.messages ↔ log.flush.interval.messages | Long.MAX_VALUE, ServerLogConfigs.java:101 | yes | fsync after N messages. MAX_VALUE = never force fsync by count. |
| flush.ms ↔ log.flush.interval.ms | null → falls back to log.flush.scheduler.interval.ms = Long.MAX_VALUE, ServerLogConfigs.java:109 | yes | fsync after a time interval. Effectively disabled by default. |
Why Kafka leaves fsync to the OS, and what acks=all really promises

By default Kafka does not fsync on the write path. It writes into the page cache and lets the OS flush lazily, which is why it sustains hundreds of MB/s on commodity disks (the zero-copy / page-cache architecture, Part I 03/09). Durability comes from replication, not from forcing each write to disk; the source doc for flush.messages says exactly this (TopicConfig.java:54: "use replication for durability ... it is more efficient"). The crucial corollary: acks=all means "in the page cache of all ISR members," not "fsynced to all disks." A correlated power loss across all replicas can still lose acked-but-unflushed data. Forcing per-message fsync (flush.messages=1) restores disk-level durability but degrades tail latency under load. That trade-off is the central point in the Redpanda/AutoMQ benchmark critiques. If you need fsync-per-write semantics, set it knowingly and budget the latency; otherwise lean on RF + ISR. See II · 06 · Durability.
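The two flush triggers amount to a simple policy check. This is a hypothetical, simplified model. FlushPolicy is illustrative, and it evaluates both triggers on append, whereas the table above attributes the time-based fallback to a scheduler interval. With both limits at Long.MAX_VALUE, neither trigger can fire, and the page cache is left to the OS.

```java
// Hypothetical model of the two flush triggers; Long.MAX_VALUE makes each one unreachable.
final class FlushPolicy {
    private final long flushMessages; // flush.messages / log.flush.interval.messages
    private final long flushMs;       // flush.ms / log.flush.interval.ms
    private long unflushedMessages;
    private long lastFlushMs;

    FlushPolicy(long flushMessages, long flushMs, long nowMs) {
        this.flushMessages = flushMessages;
        this.flushMs = flushMs;
        this.lastFlushMs = nowMs;
    }

    /** Called per appended batch; true means "fsync now", otherwise the OS flushes lazily. */
    boolean onAppend(int messages, long nowMs) {
        unflushedMessages += messages;
        boolean byCount = unflushedMessages >= flushMessages;
        boolean byTime = nowMs - lastFlushMs >= flushMs;
        if (byCount || byTime) {
            unflushedMessages = 0;
            lastFlushMs = nowMs;
            return true;
        }
        return false;
    }
}
```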

Quotas: the throttle dials

Quotas are dynamic, client-entity-scoped configs stored like any other dynamic config. The four override keys (server-common/.../config/QuotaConfig.java:89–92):

  • producer_byte_rate: bytes/sec a (user, client-id) may produce; default unlimited (Long.MAX_VALUE). Over-quota requests are delayed, not failed.
  • consumer_byte_rate: bytes/sec a client may fetch; same delay mechanism.
  • request_percentage: cap on the fraction of broker request-handler and network time a client may consume; protects threads from a single noisy client.
  • controller_mutation_rate: rate of topic create/delete/partition-add mutations a client may issue; guards the controller from metadata storms.

The quota window is shaped by quota.window.num (11 samples) × quota.window.size.seconds (1 s) (QuotaConfig.java:42,52). Mechanism and the throttling math: 19 · Quotas; operational use: II · 10 · Cost and II · 11 · Scaling Scenarios.
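The throttling math is covered in 19 · Quotas. As a rough worked example, here is a hypothetical sketch (QuotaThrottle is illustrative) modelled on the proportional formula used for client quotas: delay = (observed - quota) / quota × window. It makes the simplifying assumption that the window is the full quota.window.num × quota.window.size.seconds.

```java
// Hypothetical throttle-time estimate: delay = (observed - quota) / quota * window.
// Using the full quota.window.num * quota.window.size.seconds as the window is a simplification.
final class QuotaThrottle {
    static long throttleMs(double observedBytesPerSec, double quotaBytesPerSec,
                           int windowSamples, int windowSizeSeconds) {
        if (observedBytesPerSec <= quotaBytesPerSec) {
            return 0L; // within quota: no delay
        }
        double windowMs = windowSamples * windowSizeSeconds * 1000.0;
        return Math.round((observedBytesPerSec - quotaBytesPerSec) / quotaBytesPerSec * windowMs);
    }
}
```

Consider a client producing 1.5 MB/s against a 1 MB/s quota with the default 11 × 1 s window. It would be delayed by about 0.5 × 11,000 ms = 5,500 ms.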

Client-side configs that change broker behaviour

A handful of client-side defaults are worth pinning, for two reasons: operators misquote them, and they interact with broker-side guarantees. All client values below are read from the client ConfigDefs.

| Config (side) | Default | Why it matters operationally |
|---|---|---|
| acks (producer) | all, ProducerConfig.java:405 | The default is now all (not 1). Combined with idempotence on by default, this gives safe-by-default delivery, at the cost of latency bounded by the slowest ISR member. 0/1 trade durability for latency. See 16 · Producer Client, II · 06 · Durability. |
| enable.idempotence (producer) | true, ProducerConfig.java:543 | On by default; requires acks=all and max.in.flight ≤ 5 (the HARD constant). Prevents duplicates on retry and preserves per-partition order even with 5 requests in flight. See 16 · Producer Client, 14 · Transactions / EOS. |
| linger.ms / batch.size (producer) | 5 / 16384, ProducerConfig.java:418,413 | Batching dials. linger.ms is 5 (a common misquote is 0). Larger batches amortise per-request overhead and improve compression ratio; the cost is up to linger.ms of added latency. Raising linger only pays off if enough records arrive within the longer window to build meaningfully larger batches (up to batch.size). See II · 05 · Performance Tuning. |
| fetch.min.bytes (consumer) | 1, ConsumerConfig.java:187 | The default favours latency over cost: the broker answers a fetch as soon as any data is available instead of accumulating larger reads. New Relic cut whole-cluster broker CPU 15% by raising this on consumers of low-throughput topics (empirical, New Relic). See 17 · Consumer Client, II · 05 · Performance Tuning. |
| transaction.timeout.ms (producer) vs transaction.max.timeout.ms (broker) | 60 s vs 15 min, TransactionStateManagerConfig.java:33 | The producer value may not exceed the broker max; a larger value is rejected when the producer initialises transactions. A misconfigured transactional producer can pin the LSO and freeze read_committed consumers on a partition for up to 15 minutes. See 14 · Transactions / EOS, II · 07 · Failure Modes. |
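The linger.ms / batch.size trade-off is easiest to see as a readiness rule plus a fill-time estimate. This minimal sketch assumes a steady per-partition byte rate. It ignores the sender's other triggers (memory pressure, flush(), in-flight limits). BatchTiming is a hypothetical helper.

```java
// Hypothetical sketch: a batch is sendable when full or when it has waited linger.ms.
// Real senders also react to memory pressure, flush() and in-flight limits; ignored here.
final class BatchTiming {
    static boolean ready(int batchBytes, int batchSize, long waitedMs, long lingerMs) {
        return batchBytes >= batchSize || waitedMs >= lingerMs;
    }

    /** Milliseconds for one partition's traffic to fill batch.size at a steady byte rate. */
    static double fillMs(int batchSize, double partitionBytesPerSec) {
        return batchSize / partitionBytesPerSec * 1000.0;
    }
}
```

At 1 MB/s into one partition, a 16384-byte batch takes about 16 ms to fill. With linger.ms=5, batches therefore leave on the timer at roughly 5 KB each, and raising linger toward 16 ms would let them fill.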

Empirical tuning baselines (mark these as benchmarks, not guarantees)

The source gives you defaults and mechanisms; it cannot give you "good" values for your hardware. Below are the most-cited empirical starting points from the reference material. They are directional, version- and workload-dependent, and never SLAs:

  • JVM heap ~6 GB, rest to page cache. Kafka does not benefit from a large heap. On a 64 GB box, a 6 GB heap leaves the remaining ~58 GB, minus OS overhead, for the page cache that actually serves reads and writes (Confluent). Larger heaps risk G1 pauses of several hundred milliseconds that drop ISRs (Conduktor).
  • File descriptors ≥ 100,000 per broker; production brokers commonly hold >30k open handles. FDs ≈ (partitions × partition-size ÷ segment.bytes) + connections (Jun Rao / Confluent). Lowering segment.bytes raises FD and mmap pressure.
  • vm.max_map_count ≥ 262144; the Linux default (~65,530) caps a KRaft broker near ~32k partitions because each partition uses ~2 mmap areas (Instaclustr, empirical, version-dependent).
  • vm.swappiness=1, not 0; 0 forbids swap and removes the OOM safety net (Cloudera/Confluent).
  • num.io.threads ≈ 8 × data disks, bounded by cores and disk bandwidth; num.network.threads 8–12 on busy brokers; num.replica.fetchers 4–8 when followers lag (Strimzi/Confluent).
  • Throughput recipe (Intel/Confluent, illustrative): batch.size 32 KB–1 MB, linger.ms 5–100, acks=all, min.insync.replicas=2, compression.type=lz4 (or zstd for storage density). Reference baseline: ~605 MB/s peak; p99 5 ms at 200 MB/s on i3en, fsync off (Confluent OpenMessaging). This is a specific test, not a promise.
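The two OS-limit rules of thumb above are plain arithmetic. This sketch encodes the cited approximations and nothing more. File descriptors are estimated as segments per partition plus connections, and vm.max_map_count capacity as about two mmap areas per partition. OsLimitEstimates is a hypothetical helper and no substitute for measuring a real broker.

```java
// Hypothetical estimates encoding the cited rules of thumb; not exact accounting.
final class OsLimitEstimates {
    /** FDs ~ partitions * ceil(partition size / segment.bytes) + client connections. */
    static long fileDescriptors(long partitions, long avgPartitionBytes, long segmentBytes, long connections) {
        long segmentsPerPartition = Math.max(1L, (avgPartitionBytes + segmentBytes - 1) / segmentBytes);
        return partitions * segmentsPerPartition + connections;
    }

    /** Partitions that fit under vm.max_map_count at a given number of mmap areas per partition. */
    static long partitionsForMapCount(long maxMapCount, int mmapsPerPartition) {
        return maxMapCount / mmapsPerPartition;
    }
}
```

Take 4,000 partitions averaging 10 GiB with 1 GiB segments, plus 5,000 connections: that gives about 45,000 FDs. The Linux default max_map_count of 65,530 at two maps per partition gives 32,765 partitions, which is the "~32k" ceiling quoted above.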
Version- and vendor-dependent numbers: handle with care

Treat every benchmark above as directional. The 605 MB/s and "p99 5 ms at 200 MB/s" figures come from one i3en test with fsync off and reads served from cache. "2 million writes/sec" (Kreps/LinkedIn, 2014) was three producers with async replication. A single producer with acks=-1 dropped to ~422K rec/s, because sync replication roughly halved throughput. Redpanda's "10×" claims used a crippled Kafka (per-batch fsync, Java 11); on equal hardware, Kafka matched or beat it (Vanlightly). Finally, the per-broker partition ceiling is an EMERGENT bound of memory + FDs + vm.max_map_count + fetcher overhead, not a config; see II · 02 · Limits and II · 03 · Partitioning.

Operating the configuration surface

A short decision tree for "I need to change a config in production":

need to change a config → is it in ALL_DYNAMIC_CONFIGS?
  • yes → per-broker or cluster-wide?
      cluster-wide: kafka-configs.sh --entity-type brokers --entity-default --alter
      per-broker: --entity-name <id> instead of --entity-default (survives restart, shadows the default)
      then verify: --describe --all shows the new effective value
  • no → edit server.properties → rolling restart → wait for URP → 0 between brokers (08)
Figure: Change-a-config runbook. Dynamic path = no downtime, atomic validate-then-apply, persisted in the metadata log. Non-dynamic path = rolling restart, gated on under-replicated partitions returning to zero between brokers.
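The non-dynamic branch of the runbook is a loop with a health gate. Here is a hypothetical orchestration sketch. RollingRestart and its Ops interface are placeholders for whatever restart tooling and URP metric source you use, and the polling budget is an assumption. The loop never restarts the next broker while URP is above zero. If URP does not recover, it stops with an error instead of moving on.

```java
import java.util.List;

// Hypothetical sketch; Ops stands in for your own restart tooling and URP metric source.
final class RollingRestart {
    interface Ops {
        void restartBroker(int brokerId);   // placeholder for your restart mechanism
        int underReplicatedPartitions();    // placeholder, e.g. summed UnderReplicatedPartitions
    }

    static void run(List<Integer> brokerIds, Ops ops, int maxPolls, long pollSleepMs) {
        for (int brokerId : brokerIds) {
            ops.restartBroker(brokerId);
            int polls = 0;
            // Gate: do not touch the next broker until every partition is fully replicated again.
            while (ops.underReplicatedPartitions() > 0) {
                if (++polls > maxPolls) {
                    throw new IllegalStateException("URP did not return to 0 after restarting broker " + brokerId);
                }
                try {
                    Thread.sleep(pollSleepMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for URP to drain", e);
                }
            }
        }
    }
}
```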
Operator rules of thumb for configuration

(1) Read the default in source before you override it; the .define() argument is the truth. (2) Prefer topic-level overrides to broker-wide changes; they are surgical and dynamic. (3) After any live change, reconcile server.properties so the file and the metadata log agree. (4) Treat RF and partition count as create-time decisions, near-immutable for keyed topics. Get them right up front, and lean on kafka-reassign-partitions only for placement, not count. (5) A dynamic update either fully applies or fully aborts (processReconfiguration). A rejected value therefore never leaves you half-configured, but it will surface as a ConfigException you must read. (6) When a cluster-wide change doesn't land on one broker, look for a stale per-broker override shadowing it.

From here, II · 02 · Limits takes the HARD/SOFT/EMERGENT framing into the concrete ceilings (partition counts, request sizes, connection limits); II · 03 · Partitioning handles the one config you can't easily undo; and II · 06 · Durability turns the durability triad above into a full guarantee model. Everything in this manual ultimately routes back through this tuning surface.

krivaltsevich.com · Part of Apache Kafka Internals · derived from Apache Kafka 4.4 source · GitHub · MIT-licensed.

Apache Kafka® is a registered trademark of the Apache Software Foundation. This is an independent, unofficial guide, not affiliated with or endorsed by the ASF.