II · 10 · Cost Engineering

Source: Apache Kafka 4.4.0-SNAPSHOT (git 04bfe7d, 2026-06-15), KRaft mode. Operational guidance grounded in source code and cited benchmarks.

In the cloud, a Kafka bill is rarely what an engineer expects. The instinct is "more brokers = more money," and compute is real, but on a well-utilised cluster it is usually the smallest of the three line items. The two giants are storage (ingress × retention × replication factor, sitting on local disk that the RF multiplies) and, almost always the largest, cross-AZ network transfer. Kafka's replication protocol copies every byte RF−1 times between brokers. In a multi-AZ deployment those copies cross zone boundaries, and so do producer writes to a leader in another zone and consumer fetches from a leader in another zone; the cloud provider meters all of it per gigabyte in each direction. Confluent's own teardown of a 100 MBps cluster splits it roughly $2.3k compute / $14.5k storage / $24.2k networking per month, networking "likely over 50%," rising to ~90% once tiered storage shrinks storage (Confluent, empirical). This chapter roots every cost driver in a concrete mechanism you have already met in Part I, gives each lever its dial and its tradeoff, and ends with a worked cost model and an effort/impact-ranked lever table. The cardinal rule here is unusually literal: each dollar has a why, and the why is a byte that some piece of source code decided to copy, store, or compress.

The three cost drivers, each rooted in a mechanism

Kafka's architecture fixes the shape of the bill before you tune anything. Three mechanisms (log persistence, ISR replication, and broker-owned compute) each generate one of the three line items. Internalise the mechanism and the cost becomes predictable arithmetic rather than a surprise.

NETWORK, usually the largest in the cloud

cross-AZ replication (ingress × (RF−1)) + cross-AZ produce (~⅔ of writes) + cross-AZ consumer fetch (~⅔ × #groups). Metered per GB, each direction. The replication term dominates at RF=3.

Key numbers: $0.01/GB each way (AWS) · RF=3 ⇒ 2× ingress copied · mechanism: ISR, Part I 08

STORAGE, local disk, multiplied by RF

ingress × retention × RF on broker-attached block storage (EBS/SSD). Every produced byte is written to disk and the log keeps it for the retention window; RF copies it on RF brokers.

Key numbers: EBS ~$0.08/GB-mo · mechanism: log engine, Part I 03 · tiered storage, Part I 05

COMPUTE, brokers (CPU, RAM, NIC)

CPU for compression/decompression, TLS handshakes and record encryption, request handling, and replication fetch. RAM is split between the JVM heap (~6 GB) and the OS page cache. Usually the smallest line item on a well-utilised cluster.

Key numbers: heap ~6 GB · page cache = read accelerator · mechanism: net/threads, Part I 06

The bill decomposes by mechanism. The order shown, network ≥ storage ≥ compute, is the typical cloud ordering for a multi-AZ RF=3 cluster at meaningful throughput.


Driver 1, Storage: ingress × retention × RF

Every byte a producer sends is appended to the leader's log and, because Kafka persists to disk by design (Part I Storage & the Log Engine), written to broker-attached block storage. The log retains it for the configured window, and the replication factor copies the whole log onto RF brokers. The storage bill is therefore the cleanest formula in the chapter:

Storage (GB on disk): ingress (MB/s) × 86,400 (s/day) × retention (days) × 0.001 (GB/MB) × RF. The two literals are pure unit conversions: 86,400 s/day = 60×60×24 turns a per-second rate into a per-day volume, and 0.001 GB/MB converts MB to GB. Units cancel to GB: (MB/s)·(s/day)·(days)·(GB/MB) = GB.
Storage cost/mo: (GB on disk) × $/GB-month, at the assumed $0.08/GB-mo EBS gp3 rate (Confluent, empirical; an illustrative cloud price, check your bill).
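The two storage formulas translate directly into code. This is a minimal sketch with hypothetical helper names, assuming decimal units exactly as in the formula and the illustrative $0.08/GB-month EBS rate (not a price quote).

```python
"""Storage line item: ingress x retention x RF (the Driver 1 formula).

Assumptions: decimal units (MB, GB) as in the text, and the illustrative
$0.08/GB-month EBS gp3 rate. Function names are hypothetical.
"""

SECONDS_PER_DAY = 86_400  # 60 * 60 * 24
GB_PER_MB = 0.001  # decimal MB -> GB


def storage_gb(ingress_mb_s: float, retention_days: float, rf: int) -> float:
    """GB resident on broker disks, summed over all RF replicas."""
    return ingress_mb_s * SECONDS_PER_DAY * retention_days * GB_PER_MB * rf


def storage_cost_per_month(gb_on_disk: float, usd_per_gb_month: float = 0.08) -> float:
    """Monthly block-storage bill for gb_on_disk."""
    return gb_on_disk * usd_per_gb_month


if __name__ == "__main__":
    gb = storage_gb(ingress_mb_s=100, retention_days=7, rf=3)
    print(f"{gb:,.0f} GB on disk -> ${storage_cost_per_month(gb):,.0f}/month")
```

With the 7-day source-default retention, 100 MB/s at RF=3 comes to 181,440 GB and about $14,515/month, which lines up with the storage figure from the Confluent teardown quoted at the top of the chapter.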

The three multipliers are all operator-controlled, and each maps to a config with a source-verified default:

Multiplier	Config (default)	Source	Cost effect
Retention (time)	`retention.ms` = 604800000 (7 days)	`storage/.../LogConfig.java:134,212` (`DEFAULT_RETENTION_MS = 24 * 7 * 60 * 60 * 1000L`)	Linear: halve retention → halve storage.
Retention (size)	`retention.bytes` = −1 (unbounded)	`server-common/.../ServerLogConfigs.java:81` (`LOG_RETENTION_BYTES_DEFAULT = -1`)	A hard per-partition cap; when both limits are set, whichever is breached first triggers deletion.
Replication factor	`default.replication.factor` = 1	`server/.../ReplicationConfigs.java:42,153` (`REPLICATION_FACTOR_DEFAULT = 1`)	Direct multiplier: RF=3 stores 3× the bytes of RF=1.

The default RF is 1, and that is a cost trap in disguise

default.replication.factor=1 is the source default, but it is not a production setting: RF=1 means a single broker loss permanently loses data, and (see Part I Replication & the ISR) it silently breaks min.insync.replicas=2 because the effective in-sync requirement is capped at the replica count. Every durable deployment overrides RF to 3, which triples the storage bill and adds a cross-AZ replication stream of 2 × ingress where RF=1 had none. The cost of durability is not a footnote; it is a large multiplier on your two largest line items, and it is why RF is the first dial to revisit on non-critical topics (RF 3→2 cuts storage by ⅓ and the cross-AZ replication term by half). Set durability deliberately in Durability Engineering; do not let it be an accident of the default.

Driver 2, Network: replication amplification + cross-AZ transfer

This is the one that surprises people, and it is structural. Two distinct things make up the network bill, and both come straight from how Kafka moves bytes.

(a) Replication amplification. When a producer writes 1 GB to a partition leader, the ISR mechanism (Part I Replication & the ISR) copies that GB to each of the RF−1 followers via their replica-fetcher threads. So RF−1 GB of inter-broker traffic is generated per GB ingested: at RF=3, 2 GB of replication per 1 GB written. This shows up on the brokers as kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesInPerSec / ReplicationBytesOutPerSec (storage/.../BrokerTopicMetrics.java:37–38); watch those to see the amplification directly.

(b) Cross-AZ billing. Inside a single AZ, inter-broker traffic is free. The cloud meters traffic that crosses AZ boundaries, and Kafka's placement guarantees a lot of it crosses:

Clients (producers and consumers) talk only to the partition leader. With brokers spread across 3 AZs, a leader is in a different AZ from the client ~⅔ of the time. So ~⅔ of produce bytes and ~⅔ of consumer-fetch bytes cross a zone boundary.
All replication crosses zones when RF spans AZs (the whole point of multi-AZ RF is that the replicas are in different failure domains). That is the ingress × (RF−1) term, and it dominates.

Cross-AZ throughput (3 AZs): (ingress × ⅔) + (egress × ⅔) + (ingress × (RF−1)), where egress is total consumer reads (fanout × ingress); this is the Confluent/AutoMQ formula (empirical). The ⅔ is a probability, not a tunable: with brokers evenly spread over 3 AZs, a randomly placed client shares the leader's AZ 1 of 3 times, so 1 − ⅓ = ⅔ of produce and of fetch bytes cross a zone boundary. The (RF−1) replication term carries no ⅔ because every follower copy is placed in a different AZ on purpose (that is the point of multi-AZ RF).
Effective AWS rate: $0.01/GB in EACH direction → ~$0.02/GB effective (AutoMQ, 2minutestreaming, empirical; illustrative, verify against your cloud bill). GCP ≈ $0.01/GB once; Azure inter-AZ historically free.
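The formula can be sketched with the ⅔ derived by enumerating (client AZ, leader AZ) pairs instead of being hard-coded. Assumptions, labelled in the code too: leaders and clients spread evenly over the AZs, every follower in a different AZ from its leader, and egress = fanout × ingress. The function names are hypothetical.

```python
"""Cross-AZ throughput for a multi-AZ cluster (the Driver 2 formula).

Assumptions: leaders and clients are spread evenly over `azs` zones, every
follower sits in another AZ than its leader, egress = fanout x ingress.
"""
from fractions import Fraction


def cross_az_probability(azs: int) -> Fraction:
    """Chance that a client and the partition leader are in different AZs."""
    # Every (client AZ, leader AZ) pair is equally likely under even spread.
    pairs = [(client, leader) for client in range(azs) for leader in range(azs)]
    crossing = sum(1 for client, leader in pairs if client != leader)
    return Fraction(crossing, len(pairs))


def cross_az_rates(ingress: float, fanout: int, rf: int, azs: int = 3) -> dict:
    """Per-term cross-AZ rates, in the same units as `ingress` (e.g. MB/s)."""
    p = float(cross_az_probability(azs))
    egress = fanout * ingress
    terms = {
        "produce": ingress * p,
        "consumer_fetch": egress * p,
        "replication": ingress * (rf - 1),  # no p: followers are cross-AZ by design
    }
    terms["total"] = sum(terms.values())
    return terms
```

For one consumer group at RF=3, cross_az_rates(100, fanout=1, rf=3) totals about 333 MB/s, i.e. ((RF−1) + 4/3) × ingress. With n AZs the crossing probability generalises to (n−1)/n.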

One 1 GB write, naive placement, RF=3, one consumer group ($ = a metered cross-AZ hop):

Producer (AZ-a) → Leader (AZ-b): produce 1 GB, crosses AZ ($)
Leader (AZ-b) → Follower (AZ-c): replicate 1 GB, crosses AZ ($), repeated ×(RF−1)
ack (ISR), a control message rather than a data transfer
Leader (AZ-b) → Consumer (AZ-a): consumer fetch of 1 GB from the leader, crosses AZ ($)

Total: ~1 GB produce (cross-AZ ~⅔ of the time) + 2 GB replication (cross-AZ) + ~1 GB fetch (cross-AZ ~⅔). The replication term is the structural floor.

Why network beats compute and storage

Compute scales with broker count, which scales with peak throughput, and modern instances are cheap per MB/s. Storage scales with bytes at rest. Cross-AZ network, however, scales with bytes in motion × an amplification factor that, for one consumer group, is roughly (RF−1) + ⅔ + ⅔ = (RF−1) + 4/3 (the RF−1 replication copies, plus ⅔ of produce and ⅔ of fetch crossing zones), and every gigabyte is charged twice (once in each direction) at a non-trivial rate. AutoMQ's worked example: a 3-node, 100 MiB/s, 3-consumer-group cluster moves ~173 TB/mo of cross-AZ producer writes (~$3,460) but ~$10,360 of replication and ~$10,360 of consumer reads, landing at $14k–$24k/mo in cross-AZ alone, vs a VM bill an order of magnitude smaller (AutoMQ, empirical). Even with fetch-from-follower eliminating the consumer term, a produce+replication floor of ~$13.8k/mo remains. This is why the networking levers rank high below.
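The AutoMQ dollar figures follow from the same three terms. This sketch assumes 100 MB/s in decimal units, a 30-day month, 3 AZs, and ~$0.02/GB effective; these inputs are assumptions (not AutoMQ's published sheet), chosen because they reproduce the rounded numbers quoted: about $3,456 produce, $10,368 replication, $10,368 consumer reads, and a $13,824 floor. The function name is hypothetical.

```python
"""Monthly cross-AZ dollars for an AutoMQ-style example (Driver 2).

Assumptions: decimal units, a 30-day month, 3 evenly spread AZs,
~$0.02/GB effective (AWS $0.01 in each direction).
"""

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 s


def monthly_cross_az_usd(ingress_mb_s, fanout, rf, usd_per_gb=0.02, p_cross=2 / 3):
    gb_month = ingress_mb_s * SECONDS_PER_MONTH / 1000  # decimal MB -> GB
    produce = gb_month * p_cross * usd_per_gb
    replication = gb_month * (rf - 1) * usd_per_gb
    consumer = gb_month * fanout * p_cross * usd_per_gb
    return {
        "produce": produce,
        "replication": replication,
        "consumer": consumer,
        "total": produce + replication + consumer,
        # What remains once fetch-from-follower zeroes the consumer term.
        "floor_with_ffr": produce + replication,
    }


if __name__ == "__main__":
    for term, usd in monthly_cross_az_usd(100, fanout=3, rf=3).items():
        print(f"{term:15s} ${usd:,.0f}")
```

Note that 259,200 GB × ⅔ ≈ 172,800 GB, which is where the "~173 TB/mo of cross-AZ producer writes" comes from.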

Driver 3, Compute: brokers

Compute is CPU, RAM, and NIC on the broker fleet. The CPU-heavy paths are compression/decompression (Part I The Record & Batch Format), TLS handshakes and per-record encryption (Part I Security), request handling across the network/IO thread pools (Part I Network & Threading), and replication fetch. RAM is split: keep the JVM heap small, ~6 GB, and leave the rest of the box to the OS page cache, which is the real read/write accelerator via zero-copy sendfile (Confluent, empirical; the page-cache fetch path is Part I The Fetch Path). On a well-utilised, well-batched cluster, compute is usually the smallest of the three line items; it becomes material only when TLS + high message rates + heavy compression saturate cores, or when over-provisioned brokers sit idle. Right-sizing compute is the subject of Capacity Planning; here we note only that throwing brokers at a throughput problem is cheap, while throwing them at a network-cost problem does nothing: cross-AZ transfer is billed per GB, not per broker.

The levers, each with its mechanism, dial, and tradeoff

Five levers, in roughly increasing order of effort and architectural disruption. The first is nearly free; the last restructures the cost model. Each is grounded in a specific Kafka feature you can point to in source.

Lever 1, Compression: `compression.type` (cuts storage AND network at CPU cost)

Compression is the highest-leverage, lowest-effort move, because Kafka compresses per record batch on the producer and then stores and replicates the batch still compressed: the broker does not recompress on the happy path, and zero-copy fetch ships the compressed bytes to consumers untouched. So a single producer config simultaneously shrinks storage, replication bytes, cross-AZ transfer, and consumer-fetch bytes. The cost is CPU on the producer (and on any consumer that decompresses).

The source enumerates exactly five codecs and their level ranges in clients/.../record/internal/CompressionType.java:

Codec (`id`)	Level range / default	Source line	Ratio (HTTP data)	Speed
`none` (0)	n/a	`CompressionType.java:30`	1.0×	n/a
`gzip` (1)	1–9, default `-1` (`Deflater.DEFAULT_COMPRESSION`)	`CompressionType.java:33–36`	3.58×	slowest, usually avoided
`snappy` (2)	n/a (no levels)	`CompressionType.java:71`	2.35×	fast
`lz4` (3)	1–17, default 9	`CompressionType.java:72,75–77`	1.81×	fastest decompress (2,428 MB/s)
`zstd` (4)	−131072–22, default 3	`CompressionType.java:99,105–109`	4.5× (lvl 6)	moderate (409/844 MB/s)

Ratios and speeds above are Cloudflare's production measurements on ~1 MB / 600-record HTTP-request batches (empirical). They are data-dependent, not guarantees: text/JSON compresses ~10–12×, pre-compressed or binary payloads barely at all. The operational defaults that fall out:

compression.type=lz4 when CPU/latency matter most: the fastest codec, ~1.8–2× shrink, the smallest CPU tax. It is the throughput recipe's default.
compression.type=zstd when bytes (and therefore cross-AZ and storage cost) matter most: a markedly better ratio for moderately more CPU. Cloudflare chose zstd and saved "hundreds of gigabits of internal traffic and terabytes of flash storage," cancelling a hardware expansion (empirical); Trendyol reported ~70% message-size reduction at zstd level 3 (empirical). Tune the level via compression.zstd.level (added with fine-grained level control, KIP-390/KIP-780).

Compression is per batch: tiny batches barely compress

Because the unit of compression is the record batch, a producer that flushes one record at a time gets almost no ratio regardless of codec. Compression therefore composes with batching: raise batch.size (default 16384) and set a non-zero linger.ms so batches fill before they ship; only then does "zstd gives 4.5×" materialise. See Performance Tuning for the batching dials. Also: the producer default is compression.type=none (ProducerConfig.java:245,409; the doc literally says "The default is none"), and the broker default is compression.type=producer, meaning the broker keeps whatever the producer sent (server-common/.../ServerLogConfigs.java:178 → BrokerCompressionType.PRODUCER, BrokerCompressionType.java:33). So if no one sets it, nothing is compressed. "zstd is the default" is a common and costly myth.
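As a sketch of the "compression plus batching" pairing, here is a hypothetical helper that renders Java-client producer properties for a latency-first or bytes-first profile. The property names are the real producer configs named above; the batch.size and linger.ms values are illustrative starting points (assumptions), not recommendations from the Kafka source.

```python
"""Producer settings for Lever 1: a codec plus the batching it depends on.

Hypothetical helper. Property names are Java-client producer configs;
the batch.size / linger.ms values are assumed starting points to tune.
"""


def producer_cost_config(prefer: str = "bytes") -> dict:
    """Return producer properties for a latency-first or bytes-first codec."""
    if prefer == "latency":
        codec = {"compression.type": "lz4"}
    elif prefer == "bytes":
        # zstd level 3 is the source default; raise it to trade CPU for ratio.
        codec = {"compression.type": "zstd", "compression.zstd.level": "3"}
    else:
        raise ValueError("prefer must be 'latency' or 'bytes'")
    batching = {
        "batch.size": str(64 * 1024),  # assumed; the source default is 16384
        "linger.ms": "10",  # assumed; any non-zero value lets batches fill
    }
    return {**codec, **batching}


if __name__ == "__main__":
    for key, value in producer_cost_config("bytes").items():
        print(f"{key}={value}")
```

The printed key=value lines are in the shape of a producer .properties file; measure the achieved ratio on your own payloads before relying on any of the codec figures above.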

Lever 2, Fetch-from-follower: rack-aware replica selection KIP-392 (kills consumer cross-AZ)

By default, a consumer always fetches from the partition leader, which is in another AZ ~⅔ of the time, generating the cross-AZ consumer-read term. KIP-392 lets a consumer instead read from a same-AZ follower replica, eliminating that term entirely. It is a configuration change, not an architectural one, and on consumer-read-heavy clusters it is the single biggest networking win after compression.

Three configs turn it on; all are source-verified:

Side	Config	Value	Source
Broker	`broker.rack`	this broker's AZ id (e.g. `us-east-1a`)	`server-common/.../ServerConfigs.java:92`
Broker	`replica.selector.class`	`org.apache.kafka.common.replica.RackAwareReplicaSelector`	`server/.../ReplicationConfigs.java:140,173` (default `null` = "returns the leader")
Consumer	`client.rack`	the consumer's AZ id (must match a `broker.rack`)	`clients/.../CommonClientConfigs.java:77–79` (default `""`)
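A minimal sketch of rendering those three settings per AZ. The AZ id used in the usage is an assumed example; the only hard requirement, per the table, is that a consumer's client.rack string matches the broker.rack of the brokers in its zone. Function names are hypothetical.

```python
"""Per-AZ settings that switch on KIP-392 fetch-from-follower.

Hypothetical helpers; the config names and selector class come from the
table above. The AZ id must be the identical string on brokers and clients.
"""

SELECTOR = "org.apache.kafka.common.replica.RackAwareReplicaSelector"


def broker_rack_props(az_id: str) -> list:
    """Lines for one broker's server.properties (set on every broker)."""
    return [f"broker.rack={az_id}", f"replica.selector.class={SELECTOR}"]


def consumer_rack_props(az_id: str) -> list:
    """Lines for a consumer running in the same AZ."""
    return [f"client.rack={az_id}"]


if __name__ == "__main__":
    print("\n".join(broker_rack_props("us-east-1a")))
    print("\n".join(consumer_rack_props("us-east-1a")))
```

A consumer left at the default empty client.rack, or with an id that matches no broker.rack, keeps fetching from the leader.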

The mechanism is worth seeing in source, because its constraints are the tradeoff. The broker resolves the preferred replica in ReplicaManager.findPreferredReadReplica (core/.../ReplicaManager.scala:1964–2013), which builds the candidate set and hands it to the selector. Two guards in that code define the behaviour:

A follower is a candidate only if it is in the ISR and its logEndOffset ≥ fetchOffset ≥ logStartOffset (ReplicaManager.scala:1985–1987). The comment is explicit: excluding out-of-sync replicas prevents the leader from "continuously pick[ing] the lagging follower … indefinitely."
RackAwareReplicaSelector then filters to replicas whose endpoint().rack() equals the client's rackId; if the leader is in-rack it returns the leader, otherwise the most caught-up in-rack replica; if none are in-rack it falls back to the leader (clients/.../replica/RackAwareReplicaSelector.java:35–52).
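The two guards can be modelled in a few lines. This is a simplified Python sketch of the behaviour described above, not a port of the Java/Scala source: the Replica type and function name are hypothetical, the leader is always a candidate, and ties among equally caught-up in-rack replicas are broken arbitrarily here (the real selector has its own tie-break).

```python
"""Simplified model of preferred-read-replica selection (KIP-392).

Hypothetical sketch, not the Kafka source: followers qualify only if in the
ISR and logEndOffset >= fetchOffset >= logStartOffset; then prefer an
in-rack leader, else the most caught-up in-rack replica, else the leader.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    broker_id: int
    rack: str
    log_start_offset: int
    log_end_offset: int
    in_isr: bool


def preferred_read_replica(leader, followers, client_rack, fetch_offset):
    if not client_rack:
        return leader  # consumer did not set client.rack
    # Guard 1: only in-sync followers that actually hold fetch_offset qualify.
    candidates = [leader] + [
        r for r in followers
        if r.in_isr and r.log_end_offset >= fetch_offset >= r.log_start_offset
    ]
    # Guard 2: rack-aware choice among the candidates.
    in_rack = [r for r in candidates if r.rack == client_rack]
    if not in_rack or leader in in_rack:
        return leader
    return max(in_rack, key=lambda r: r.log_end_offset)
```

In use: with a leader in az-b and an in-sync follower in az-a whose log ends at offset 90, a consumer in az-a fetching offset 80 is sent to the follower, but one fetching offset 95 stays on the leader because the follower does not yet hold that offset.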

The latency tradeoff is the high-watermark, and it is real

A follower can only serve records up to the high-watermark (the offset replicated to the full ISR; see Part I The Fetch Path and Replication & the ISR); it cannot hand out data the leader has but the ISR has not yet confirmed. Consumers that switch to a follower therefore see data slightly later than the leader's log end. Grab measured "up to 500 ms" of added consumer latency from this, and observed broker load skew because fetch traffic now follows replica placement rather than leader placement (Grab/InfoQ, empirical). What it does not touch: produce and replication bytes still cross AZ. Grab's result (reconfigured-consumer cross-AZ cost driven to zero, at +500 ms) is the canonical outcome (empirical).

Lever 3, RF and retention as cost dials

Before reaching for new architecture, the two oldest dials give linear savings with no latency penalty, only a durability/availability tradeoff:

Replication factor. RF multiplies storage, and RF−1 multiplies cross-AZ replication. RF=3 is the durable standard (tolerates one broker loss with min.insync.replicas=2, two losses before unavailability). RF=2 cuts storage by ⅓ and cross-AZ replication by half, but leaves zero headroom: a single failure drops you to one replica, and min.insync.replicas=2 then blocks all writes to that partition (Part I Replication & the ISR). Reserve RF=2 for explicitly non-critical, reproducible topics. Set it via default.replication.factor (default 1) or per topic at creation time.
Retention. retention.ms (default 7 days, LogConfig.java:134) cuts storage linearly: 3-day retention is half the disk of 6-day. Use retention.bytes (default −1) as a hard per-partition backstop on size. Shorter retention is free until someone needs to replay further back than the window allows, which is exactly the gap that Lever 4 fills.
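The scaling rules for these two dials fit in a short sketch, under the chapter's model: storage is proportional to RF × retention, while cross-AZ replication is proportional to RF−1 and independent of retention. Function names are hypothetical.

```python
"""Lever 3 arithmetic: how RF and retention scale the byte-driven costs.

Model assumptions from the chapter: storage ~ RF x retention;
cross-AZ replication ~ (RF - 1), independent of retention.
"""

MS_PER_DAY = 24 * 60 * 60 * 1000


def rf_retention_factors(rf_old: int, rf_new: int, days_old: float, days_new: float) -> dict:
    """Multipliers to apply to the current storage and replication lines."""
    if rf_old < 2 or rf_new < 1:
        raise ValueError("rf_old must be >= 2 (RF=1 has no replication term)")
    storage = (rf_new / rf_old) * (days_new / days_old)
    replication = (rf_new - 1) / (rf_old - 1)
    return {"storage": storage, "replication_cross_az": replication}


def retention_ms(days: float) -> int:
    """Value to set in the per-topic retention.ms config."""
    return int(days * MS_PER_DAY)
```

For example, rf_retention_factors(3, 2, 7, 7) gives a storage multiplier of ⅔ and a replication multiplier of ½, and retention_ms(7) returns 604800000, the source default.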

Lever 4, Tiered storage KIP-405 (cheap object store for cold data)

Tiered storage (Part I Tiered Storage) lets a topic keep only a small local tail on broker disk while offloading older segments to object storage (S3/GCS/ABS) via the RemoteLogManager (storage/.../RemoteLogManager.java). Because offloaded segments are stored once in the cheap object store rather than RF times on broker disks, this attacks the storage line item hard: object storage runs ~$0.02/GiB-mo vs ~$0.08–0.10/GiB-mo for EBS (~4–5× cheaper), and it decouples retention from broker disk, so you can keep weeks or months without growing the fleet. Enable it cluster-wide with remote.log.storage.system.enable=true (default false, RemoteLogManagerConfig.java:55,58) and per-topic with remote.storage.enable=true (default false, LogConfig.java:142,253); set the local tail with local.retention.ms / local.retention.bytes (default −2 = "derive from retention.ms/retention.bytes", LogConfig.java:145,146,255,257).
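A back-of-envelope sketch of the tiered storage bill: an RF-multiplied local tail on EBS plus one copy of the retention window in object storage. Assumptions, labelled in the code: decimal units, the remote tier eventually holds roughly the whole retention window once, and the illustrative $0.08 and $0.02 per GB-month rates. Function names and the example topic values are hypothetical; a real deployment also needs a RemoteStorageManager plugin configured on the brokers.

```python
"""Lever 4 arithmetic: local tail x RF on EBS + one remote copy.

Assumptions: decimal units; every segment is eventually copied to the
remote tier, so it holds ~the whole retention window once; rates are the
illustrative $0.08 (EBS) and $0.02 (object store) per GB-month.
"""

SECONDS_PER_DAY = 86_400


def tiered_storage_usd(ingress_mb_s, retention_days, local_days, rf,
                       ebs_usd=0.08, object_usd=0.02):
    gb_per_day = ingress_mb_s * SECONDS_PER_DAY / 1000
    local = gb_per_day * local_days * rf * ebs_usd  # RF copies, short tail
    remote = gb_per_day * retention_days * object_usd  # one copy, full window
    return local + remote


def all_local_usd(ingress_mb_s, retention_days, rf, ebs_usd=0.08):
    return ingress_mb_s * SECONDS_PER_DAY / 1000 * retention_days * rf * ebs_usd


# Per-topic configs named in the text; the values are illustrative.
# The broker side also needs remote.log.storage.system.enable=true.
TIERED_TOPIC_CONFIG = {
    "remote.storage.enable": "true",
    "retention.ms": str(30 * 86_400_000),
    "local.retention.ms": str(1 * 86_400_000),
}
```

For 100 MB/s, 30-day retention, a 1-day local tail and RF=3, this gives about $7,258/month versus about $62,208 if all 30 days stayed on EBS at RF=3. None of it changes the cross-AZ lines.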

Tiered storage does NOT reduce cross-AZ networking

This is the most common and most expensive misconception about KIP-405. It moves bytes at rest to a cheaper tier; it does nothing to bytes in motion. Replication still copies RF−1 times across AZs, and producers and consumers still cross zones. In fact, by shrinking storage it raises networking's share of the bill. Aiven's figure: with tiered storage, "Networking is 83%+ of cost ($882k/yr out of $1.05M/yr)" (empirical). If networking is your problem, the levers are 2 (fetch-from-follower) and 5 (diskless), not 4. Watch the offload working via kafka.server:type=BrokerTopicMetrics,name=RemoteCopyBytesPerSec / RemoteFetchBytesPerSec and the backlog gauge RemoteCopyLagBytes (storage/api/.../RemoteStorageMetrics.java:35–36,48); remote reads are served by a pool of remote.log.reader.threads (default 10, RemoteLogManagerConfig.java:150,152).

Lever 5, Diskless / object-store-native designs KIP-1150 (restructure the model)

The newest direction attacks the networking floor directly. Diskless designs (WarpStream and AutoMQ commercially; KIP-1150 "Diskless Topics" in Apache Kafka) write produce batches directly to object storage instead of replicating between brokers. With no inter-broker replication and no local-disk RF, the dominant ingress × (RF−1) cross-AZ term drops to ~0, and durability comes from the object store's own cross-AZ replication (which the provider does not bill back to you as inter-AZ transfer). KIP-1150 was accepted ~March 2026, but acceptance is not a production-ready OSS implementation: treat it as a roadmap item, not a deployable feature, in mid-2026.

Diskless trades latency for cost; it is not a drop-in

Writing to S3 means waiting on a commit interval (~250 ms or an 8 MiB batch) plus an S3 PUT (~200–400 ms p99 for 2–8 MB), so produce latency moves from sub-100 ms to ~200–400 ms typical, up to ~2.4 s end-to-end (WarpStream/Aiven, empirical). S3 Express One Zone narrows produce p99 to ~169 ms. The cost case is dramatic where it fits: WarpStream's own TCO benchmark put 3-AZ OSS Kafka at $20,252/mo (inter-zone networking alone $14,765) vs WarpStream at $2,961/mo for the same 268 MiB/s workload (vendor-reported, directional). KIP-1150 is explicitly designed to coexist with classic sub-100 ms topics in one cluster, not to replace them: use diskless for high-throughput, latency-tolerant streams and classic topics for the rest. Vendor % savings (80–90%) assume high fanout, retail pricing, and RF=3; the mechanism is real, the percentage is workload-dependent.
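A back-of-envelope sketch of why diskless produce latency depends on throughput. It assumes (this is not any vendor's algorithm) that a writer flushes a batch to object storage when either the ~250 ms commit interval or the ~8 MiB size threshold from the text is reached, whichever comes first, and then pays an assumed PUT latency; throughput_mib_s is the rate into a single writer, not the whole cluster. Names are hypothetical.

```python
"""Diskless produce latency: timer-or-size batch flush plus a PUT.

Assumptions: flush on ~250 ms or ~8 MiB, whichever is first (figures from
the text); put_s is an assumed object-store PUT latency; throughput is
the rate into one writer.
"""

COMMIT_INTERVAL_S = 0.250
BATCH_BYTES = 8 * 1024 * 1024  # 8 MiB


def batch_close_s(throughput_mib_s: float) -> float:
    """Seconds until the current batch flushes: size threshold or timer."""
    fill_time = BATCH_BYTES / (throughput_mib_s * 1024 * 1024)
    return min(COMMIT_INTERVAL_S, fill_time)


def produce_latency_s(throughput_mib_s: float, put_s: float = 0.3) -> float:
    """Worst-case wait for a record that just missed the previous flush."""
    return batch_close_s(throughput_mib_s) + put_s
```

At 268 MiB/s into one writer the size threshold fires after ~30 ms, so the PUT dominates; at 10 MiB/s the 250 ms timer fires first and the two waits add up to ~550 ms with the assumed 300 ms PUT.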

A worked cost model

Put it together on one concrete workload so the arithmetic, and the dominance of networking, is unmistakable. Every number below is introduced first as a labeled assumption, then used. The rates are the cited cloud figures from the empirical reference (AWS, RF=3, 3 AZs); treat them as illustrative and version-dependent, not a quote, and substitute your own measured ingress and your own cloud bill's transfer rates.

Assumptions. Each constant carries its value with units, a one-line why/source, and its kind (workload / config / illustrative cloud rate). Nothing in the derivation uses a number that is not listed here.

Symbol	Value (with units)	Why / source	Kind
ingress	100 MiB/s sustained	The workload we are pricing; pick the peak sustained produce rate your cluster measures (`BytesInPerSec`).	workload (illustrative)
fanout	3 consumer groups (3× read)	Each group re-reads the full stream once, so total consumer egress = 3 × ingress.	workload (illustrative)
RF	3 replicas	The durable standard; set via `default.replication.factor` (source default is 1, see Driver 1). RF=3 ⇒ `RF−1 = 2` follower copies.	config
AZs	3 availability zones	Brokers spread evenly ⇒ a client shares the leader's AZ ⅓ of the time, so `⅔` of produce & fetch crosses zones (Driver 2).	workload / topology
retention	3 days	How long the log is kept on disk; `retention.ms` (source default 7 days, `LogConfig.java:134`). 3 days chosen to keep the example small.	config (illustrative)
compression	none (1.0×) for the baseline	We price the untuned wire volume first, then apply compression as Lever 1. Producer & broker defaults compress nothing (Driver 1 gotcha).	config (baseline)
$_storage	$0.08 / GiB-month	AWS EBS gp3 (Confluent, empirical), illustrative; check your bill.	cloud rate (illustrative)
$_xAZ	$0.02 / GiB effective	AWS cross-AZ = $0.01/GiB each direction → ~$0.02 effective (AutoMQ / 2minutestreaming, empirical). GCP ≈ $0.01 once; Azure historically free.	cloud rate (illustrative)
$_compute	≈ $2,300 / month for the fleet	Broker instances sized to absorb `ingress × RF` writes (produce plus replication) and the fan-out reads; matches Confluent's published 100 MBps teardown compute line (empirical). Right-sizing is Capacity Planning; here it is a single illustrative line, not a per-broker derivation.	cloud rate (illustrative)
sec/month	2,592,000 s = 60×60×24×30	Unit conversion: seconds in a 30-day month (turns a per-second rate into bytes/month).	constant (exact)

Derived: monthly ingress volume. This single quantity feeds every line below, so derive it once:

Monthly ingress (derived from the assumptions)

ingress × sec/month ÷ 1024 = 100 MiB/s × 2,592,000 s ÷ 1024 MiB/GiB = 259,200,000 MiB ÷ 1024 ≈ 253,125 GiB ≈ 247 TiB/month. (Units cancel: (MiB/s)·s = MiB, then MiB ÷ (MiB/GiB) = GiB.) Call this I = 253,125 GiB/mo in the table below.

Derivation. Each row takes the formula from its Driver section, substitutes the assumptions above (so every literal traces to the Assumptions table), and multiplies by the relevant rate. Storage is a stock (bytes resident on disk), so it uses ingress-per-day × retention × RF; the cross-AZ rows are flows over the month, so they scale the monthly volume I. Units are shown so they cancel to dollars.

Line item	Derivation (formula → substitute assumptions → with units)	Bytes	× Rate	≈ Cost/mo
Storage (on disk)	`ingress/day × retention × RF`. ingress/day = `100 MiB/s × 86,400 s/day ÷ 1024 = 8,438 GiB/day`; resident = `8,438 GiB/day × 3 days × 3 (RF) = 75,938 GiB` (GiB/day × days = GiB; RF is dimensionless)	~74 TiB resident	`× $0.08/GiB-mo`	`75,938 × $0.08` = ~$6,100
Cross-AZ: replication	`I × (RF−1)` = `253,125 GiB × 2` (each of the 2 followers is in another AZ ⇒ all of it crosses)	~506,250 GiB (~494 TiB)	`× $0.02/GiB`	`506,250 × $0.02` = ~$10,100
Cross-AZ: produce	`I × ⅔` = `253,125 GiB × 0.67` (⅔ of writes hit a leader in another AZ, see Driver 2)	~169,594 GiB (~166 TiB)	`× $0.02/GiB`	`169,594 × $0.02` = ~$3,400
Cross-AZ: consumer fetch	`I × fanout × ⅔` = `253,125 GiB × 3 × 0.67` (3 groups, each reading from a leader that is cross-AZ ⅔ of the time)	~508,781 GiB (~497 TiB)	`× $0.02/GiB`	`508,781 × $0.02` = ~$10,200
Compute (brokers)	fleet sized to absorb `ingress × RF` ingress+replication and serve `fanout × ingress` reads, taken directly as the `$_compute` assumption (not re-derived here; see Capacity Planning)	,	fleet	~$2,300
Sum of the rows ≈				~$32,100/mo
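The same derivation as a small, hypothetical function, so you can swap in your own inputs. It mirrors the Assumptions table (binary units, 30-day month, illustrative AWS rates); p_cross defaults to the exact ⅔, and passing 0.67 reproduces the table's rounded rows.

```python
"""The worked cost model as code (inputs mirror the Assumptions table).

Assumptions: binary units (MiB, GiB), a 30-day month, illustrative AWS
rates; compute is taken as a flat fleet figure, not derived.
"""

SECONDS_PER_MONTH = 60 * 60 * 24 * 30  # 2,592,000 s
SECONDS_PER_DAY = 86_400


def monthly_bill(ingress_mib_s=100, fanout=3, rf=3, retention_days=3,
                 storage_usd=0.08, xaz_usd=0.02, compute_usd=2_300,
                 p_cross=2 / 3):
    monthly_gib = ingress_mib_s * SECONDS_PER_MONTH / 1024  # I in the text
    resident_gib = ingress_mib_s * SECONDS_PER_DAY / 1024 * retention_days * rf
    rows = {
        "storage": resident_gib * storage_usd,
        "xaz_replication": monthly_gib * (rf - 1) * xaz_usd,
        "xaz_produce": monthly_gib * p_cross * xaz_usd,
        "xaz_consumer": monthly_gib * fanout * p_cross * xaz_usd,
        "compute": compute_usd,
    }
    rows["total"] = sum(rows.values())
    network = rows["xaz_replication"] + rows["xaz_produce"] + rows["xaz_consumer"]
    rows["network_share"] = network / rows["total"]
    return rows


if __name__ == "__main__":
    for name, value in monthly_bill(p_cross=0.67).items():
        print(f"{name:16s} {value:,.2f}")
```

With the exact ⅔ the rows come to $6,075 / $10,125 / $3,375 / $10,125 / $2,300 (total $32,000); with p_cross=0.67 the total is $32,067.50, the table's ~$32,100, and networking is about 74% either way.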

Adding the three cross-AZ rows: $10,100 + $3,400 + $10,200 ≈ $23,700 of networking. So networking is $23,700 ÷ $32,100 ≈ 74% of the bill; compute is $2,300 ÷ $32,100 ≈ 7%; storage is the remaining ~19%. Why networking dominates: the cross-AZ multiplier (RF−1) + ⅔ + (fanout × ⅔) = 2 + 0.67 + 2.01 ≈ 4.7 applies to the whole monthly volume at $0.02/GiB, whereas storage applies a 3× RF only to the resident tail (3 days, not the whole month) at $0.08/GiB-mo. Bytes in motion × ~4.7 therefore beat bytes at rest × 3 even though the per-GiB storage rate is 4× higher. These figures align with Confluent's published 100 MBps teardown (~$24.2k networking / ~$14.5k storage / ~$2.3k compute); our storage line is lower mainly because this example assumes 3-day retention (at the 7-day default, 100 MB/s × RF=3 at $0.08/GB-mo comes to ≈ $14.5k). Small differences from the round empirical figures, e.g. ~$10,100 vs the often-quoted ~$10,360 for replication, come from units rather than the model: 100 MiB/s over a month is 253,125 GiB, while 100 MB/s is 259,200 GB, and the table additionally rounds ⅔ to 0.67. Now apply the levers in order and watch the bill collapse:

Baseline ≈ $32.1k/mo (networking ~74%): storage $6.1k + cross-AZ $23.7k + compute $2.3k

Lever 1, compression (3×*): divide every byte-volume item by the 3× ratio (compute is unchanged): storage $6.1k÷3 ≈ $2.0k, cross-AZ $23.7k÷3 ≈ $7.9k, plus compute $2.3k → total ≈ $12.2k

Lever 2, fetch-from-follower: the consumer cross-AZ term → ~$0 (it was $10.2k÷3 ≈ $3.4k after Lever 1): $12.2k − $3.4k → total ≈ $8.8k

Lever 3, RF 3→2 on non-critical topics: on those topics storage scales with RF (×⅔) and cross-AZ replication with RF−1 (2→1, ×½)

Lever 4, tiered storage: the at-rest tier drops from $0.08 to ~$0.02/GiB-mo (~4–5× cheaper); networking's share of what remains rises (Aiven's example: 83%+)

Lever 5, diskless (roadmap; KIP-1150 is not yet GA): replication cross-AZ I×(RF−1) → ~0; the new floor is produce + metadata

Levers compound. Compression multiplies everything down first (so apply it before computing the others), fetch-from-follower removes the consumer term, RF/retention trim linearly, tiered storage cheapens the at-rest bytes, and diskless removes the replication floor at a latency cost. *The 3× ratio is an illustrative assumption for JSON/text, between Cloudflare's measured zstd-6 4.5× and lz4 1.8× on HTTP data (empirical) and Trendyol's ~70% reduction (≈3.3×); it is data-dependent, so substitute your codec's measured ratio on your payloads. Only the byte-volume rows scale by it; the compute line does not.
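The first levers can be compounded in a few lines, starting from the worked example's rounded rows. This is a sketch with hypothetical helper names: the 3× compression ratio is the illustrative assumption from the footnote, fetch-from-follower is modelled as zeroing the consumer row, and the RF step applies the ×⅔ storage and ×½ replication multipliers to a chosen share of the bytes.

```python
"""Compounding levers on the worked example's rounded rows ($/month).

Assumptions: a 3x compression ratio (illustrative), fetch-from-follower
zeroes the consumer cross-AZ row, RF 3->2 scales storage x2/3 and
replication x1/2 on the affected share of bytes. Compute is untouched.
"""

BASELINE = {
    "storage": 6_100,
    "xaz_replication": 10_100,
    "xaz_produce": 3_400,
    "xaz_consumer": 10_200,
    "compute": 2_300,
}
BYTE_ROWS = {"storage", "xaz_replication", "xaz_produce", "xaz_consumer"}


def apply_compression(rows, ratio):
    return {k: v / ratio if k in BYTE_ROWS else v for k, v in rows.items()}


def apply_fetch_from_follower(rows):
    return {**rows, "xaz_consumer": 0.0}


def apply_rf_3_to_2(rows, share_of_bytes):
    out = dict(rows)
    out["storage"] *= 1 - share_of_bytes / 3  # RF 3 -> 2
    out["xaz_replication"] *= 1 - share_of_bytes / 2  # RF-1: 2 -> 1
    return out


if __name__ == "__main__":
    step1 = apply_compression(BASELINE, ratio=3)
    step2 = apply_fetch_from_follower(step1)
    print(round(sum(step1.values())), round(sum(step2.values())))
```

It prints 12233 8833, the ~$12.2k and ~$8.8k steps above. Because the consumer row is removed after dividing by the ratio, the saving credited to fetch-from-follower is the compressed $3.4k, not the uncompressed $10.2k.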


Apply compression first, then recompute

Compression shrinks the byte volume that every other lever and formula operates on. If you compute cross-AZ cost on uncompressed ingress and then "save 50% with fetch-from-follower," you have double-counted. The correct order is: (1) measure compressed ingress (or model it with your codec's real ratio on your data), (2) compute the cross-AZ terms from that, (3) then subtract the consumer term that fetch-from-follower removes. The single biggest modelling error is pricing the bill on uncompressed bytes that compression has already removed from the wire and the disk.

The lever table, ranked by effort vs impact

Decision rule for picking the next lever: identify which line item dominates your bill (measure BytesInPerSec, ReplicationBytesInPerSec, and your cloud cross-AZ transfer report), then pick the lowest-effort lever that targets it.

#	Lever	Targets	Effort	Impact	Tradeoff / dial	Feature
1	`compression.type` = lz4/zstd	storage + ALL network + fetch	Low (one producer config)	High (data-dependent: ~1.8–4.5× on Cloudflare's HTTP batches, up to ~10× on text/JSON)	Producer/consumer CPU; per-batch (needs batching). Dial: codec + `compression.zstd.level`.	Part I 01; KIP-390
2	Fetch-from-follower (rack-aware)	consumer cross-AZ (the ⅔×fanout term)	Low–Med (3 configs + correct rack ids)	High on read-heavy/multi-group clusters (Grab → $0)	+up to ~500 ms consumer latency (HW-bounded); broker load skew. Dial: on/off.	Part I 09; KIP-392
3	RF 3→2 / shorter retention	storage + replication cross-AZ (linear)	Low (per-topic; lowering RF on an existing topic needs a partition reassignment)	Med (RF 3→2: −⅓ storage, −½ replication cross-AZ; linear in retention)	Durability/availability ↓ (RF=2 = no failure headroom); replay window ↓. Dials: RF at topic creation (or `default.replication.factor`), `retention.ms`.	Part I 08
4	Tiered storage	storage only (cold bytes → object store)	Med (cluster + object-store setup)	Med–High on storage (~4–5× cheaper tier; 30–90% storage cut)	Remote-read latency for cold reads; object-store ops. No network effect. Dials: `remote.storage.enable`, `local.retention.ms`.	Part I 05; KIP-405
5	Diskless / object-store-native	replication + produce cross-AZ (the floor)	High (new topic type / vendor; not GA in OSS)	Very High on network (replication cross-AZ → ~0)	+200–400 ms produce latency, up to ~2.4 s e2e. Coexists with classic topics. Dial: per-topic.	KIP-1150 (accepted ~Mar 2026)
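The decision rule above can be sketched as a lookup: find the dominant line item, then return the lowest-effort lever in the table that targets it and has not been applied yet. The lever list is transcribed from the table in a coarse form (storage vs network); the function name is hypothetical and the cost figures you pass in should come from your own bill.

```python
"""The lever-picking decision rule as code.

Hypothetical sketch: levers and targets are transcribed coarsely from the
table (ordered by increasing effort); costs should come from your bill.
"""
from typing import Optional

LEVERS = [
    ("compression.type", {"storage", "network"}),
    ("fetch-from-follower", {"network"}),
    ("RF 3->2 / shorter retention", {"storage", "network"}),
    ("tiered storage", {"storage"}),
    ("diskless", {"network"}),
]


def next_lever(costs: dict, applied: set) -> Optional[str]:
    """Lowest-effort unapplied lever that targets the dominant line item."""
    dominant = max(costs, key=costs.get)
    for lever, targets in LEVERS:
        if dominant in targets and lever not in applied:
            return lever
    return None  # e.g. compute dominates: see Capacity Planning instead
```

With the worked example's rows (network $23.7k, storage $6.1k, compute $2.3k) it suggests compression.type first and fetch-from-follower once compression is applied; if compute dominates, no lever in this table applies.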

The one-paragraph cost playbook

Turn on compression everywhere (lz4 if latency-sensitive, zstd if bytes-sensitive) with sane batching; it is nearly free and cuts every byte-volume line item. Then, if cross-AZ dominates (it usually does), enable fetch-from-follower to kill the consumer term and revisit RF on non-critical topics. If storage is large, add tiered storage, but do not expect it to touch networking. Only reach for diskless when the produce+replication cross-AZ floor is the dominant cost and the workload tolerates 200–400 ms latency. Measure first: ReplicationBytesInPerSec tells you the amplification, your cloud transfer bill tells you the cross-AZ reality, and RemoteCopyBytesPerSec tells you tiering is working. Every number in this chapter is directional and version/region-dependent; re-check cloud transfer rates and vendor TCO claims against live pricing before you commit a budget.

Where this connects in Part II

This chapter is the economics layer over the rest of the operations manual. Capacity Planning sizes the fleet whose compute you are pricing here; Partitioning governs the per-partition counts that drive replica placement and therefore cross-AZ flows; Performance Tuning owns the batching/compression dials Lever 1 depends on; Durability Engineering owns the RF/min.insync.replicas/acks choices behind Lever 3; Topologies decides the AZ/cluster layout that sets the cross-AZ baseline; and Metrics & Signals gives the byte-rate gauges that turn this model from estimate into measurement. Cost is not a separate concern; it is the shadow every architectural choice casts on the invoice.