Maximize Kafka Broker I/O Throughput
Maximize broker I/O capacity with thread tuning, larger buffers, and replication parallelism.
Recommended starting points curated by Conduktor. Always benchmark with your workload. Some broker configs are not available on managed services (AWS MSK, Confluent Cloud) — check your provider's documentation.
**Threading**

| Config | Change | Why |
|---|---|---|
| `num.network.threads`<br>Kafka 0.8.0+ | 3 → 8 | Each network thread handles socket reads/writes for incoming client and inter-broker requests. At high connection counts (thousands of producers/consumers), 3 threads become a serialization bottleneck; 8 threads map to roughly half the physical cores on a typical 16-core broker and keep network queues from stacking up. • More threads consume CPU and memory; on brokers with few cores, or on lightly loaded clusters, the extra context-switching overhead outweighs the gain. |
| `num.io.threads`<br>Kafka 0.8.0+ | 8 → 16 | IO threads perform the actual disk reads/writes for produce and fetch requests. When using HDDs, or when the broker hosts many partitions, IO operations queue up; doubling to 16 keeps disk utilization high without stalling the network layer. • On SSDs with low seek latency, 8 threads is already sufficient and the extra threads add scheduler overhead. Profile with iostat before increasing. |
| `background.threads`<br>Kafka 0.8.1+ | 10 → 20 | Background threads handle log segment deletion, index truncation, and leader election callbacks. Under high write rates with frequent segment rolls, the deletion queue can back up with 10 threads, causing disk usage to spike while segments await deletion. • Each additional background thread competes for CPU during peak produce windows. Monitor thread-pool utilization before increasing. |
| `num.recovery.threads.per.data.dir`<br>Kafka 0.8+ | 2 → 4 | Doubles log recovery threads from 2 to 4 per data directory, parallelizing startup recovery and shutdown flushing. With thousands of partitions per data dir, 2 threads can bottleneck broker restart time. • More threads compete for disk I/O during recovery. On SSDs the gain is significant; on spinning disks with a single data dir, there are diminishing returns beyond 2 threads. |
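The threading rows above translate into a `server.properties` fragment like this (values are the table's suggested starting points, not defaults to apply blindly — benchmark against your workload first):

```properties
# Threading -- suggested starting points, not universal defaults.
num.network.threads=8                 # socket I/O for client and inter-broker requests
num.io.threads=16                     # disk I/O for produce/fetch; check iostat before raising
background.threads=20                 # segment deletion, truncation, other background work
num.recovery.threads.per.data.dir=4   # parallel log recovery on startup/shutdown
```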
**Network & Buffers**

| Config | Change | Why |
|---|---|---|
| `queued.max.requests`<br>Kafka 0.8.0+ | 500 → 1000 | The request queue depth between network and IO threads. Under burst traffic the default 500 can fill in milliseconds, at which point the network threads stop reading from sockets and clients stall. Doubling it absorbs micro-bursts without back-pressuring clients. • A larger queue means more in-memory request objects; at 1000 entries with large payloads, broker heap pressure increases. Watch GC metrics. |
| `socket.send.buffer.bytes` (caution)<br>Kafka 0.8.0+ | 100KB → 1MB | Increasing the OS TCP send buffer to 1MB lets the kernel buffer more outgoing fetch-response bytes, which is critical on high-latency links or when consumers are remote. The bandwidth-delay product of a 1Gbps link at 1ms RTT is already 125KB — larger buffers avoid TCP stalls. • Kernel memory consumption scales with active connections × buffer size; on high fan-out clusters with thousands of consumers this can consume several GB of kernel memory. |
| `socket.receive.buffer.bytes` (caution)<br>Kafka 0.8.0+ | 100KB → 1MB | A larger receive buffer lets the kernel buffer more incoming ProduceRequest bytes before the broker reads them, preventing TCP flow control from back-pressuring producers during brief IO thread stalls. • Same kernel memory scaling concern as socket.send.buffer.bytes: on a cluster with 10k producer connections and 1MB buffers, the kernel reserves 10GB. |
| `message.max.bytes` (caution)<br>Kafka 0.8.0+ | 1MB → 5MB | Raising the max message size to 5MB lets producers send large batches in a single request, reducing request overhead per MB of data. Combined with producer-side lz4 compression, the effective payload per request can reach 20-40MB. • Affects memory allocation on every broker handling the partition; also requires replica.fetch.max.bytes >= this value, or replication stalls permanently on any message exceeding the old limit. |
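As a `server.properties` fragment, with the table's sizes converted to the exact byte values the broker expects:

```properties
# Network & buffers -- sizes are in bytes.
queued.max.requests=1000
socket.send.buffer.bytes=1048576      # 1 MB; kernel memory scales with connection count
socket.receive.buffer.bytes=1048576   # 1 MB; same scaling concern
message.max.bytes=5242880             # 5 MB; keep replica.fetch.max.bytes >= this value
```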
**Replication**

| Config | Change | Why |
|---|---|---|
| `num.replica.fetchers`<br>Kafka 0.8.0+ | 1 → 4 | Each replica fetcher thread replicates data from the leader for a subset of partitions. With a single thread and hundreds of partitions, replication lag accumulates, causing followers to fall out of the ISR and degrading write availability. 4 threads parallelize replication across partitions. • More fetcher threads mean more concurrent TCP connections to leader brokers and more network bandwidth used for replication; on bandwidth-constrained clusters this competes with client traffic. |
| `replica.fetch.max.bytes` (caution)<br>Kafka 0.8.0+ | 1MB → 10MB | Increasing the per-fetch-request byte limit to 10MB lets follower fetchers pull more data per round-trip, reducing replication lag at high produce rates. The default 1MB cap causes replication round-trips to dominate at >100MB/s ingestion. • Must be >= message.max.bytes, otherwise a single oversized message permanently stalls replication. Also increases per-request memory allocation on the follower. |
| `replica.socket.receive.buffer.bytes`<br>Kafka 0.8+ | 64KB → 4MB | The replication socket receive buffer (64KB default) is *separate* from the client socket buffers. On 10GbE networks the TCP bandwidth-delay product exceeds 64KB — this silently caps replication throughput and causes under-replicated partitions. • Each replica connection consumes 4MB of kernel memory. On clusters with many broker-to-broker connections this adds up. |
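The replication rows as a `server.properties` fragment (byte values spelled out; note the ordering constraint against message.max.bytes):

```properties
# Replication -- replica.fetch.max.bytes must be >= message.max.bytes.
num.replica.fetchers=4
replica.fetch.max.bytes=10485760              # 10 MB per fetch request
replica.socket.receive.buffer.bytes=4194304   # 4 MB; separate from client socket buffers
```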
**Log Segments & Compaction**

| Config | Change | Why |
|---|---|---|
| `log.segment.bytes`<br>Kafka 0.8.0+ | 1GB → 512MB | With 512MB segments, segments roll more frequently, enabling faster log compaction and retention enforcement without accumulating 1GB of data per partition before any cleanup can happen. At very high write rates, smaller segments also reduce the window of data at risk if a single segment is corrupted. • More segment files mean more file handles and more index files; on brokers with thousands of partitions, halving the segment size roughly doubles the file descriptor count. |
| `log.cleaner.io.max.bytes.per.second` (caution)<br>Kafka 0.9+ | unlimited → 50MB/s | Throttles log compaction I/O to 50MB/s. The default is uncapped (Double.MAX_VALUE), so compaction can fully saturate broker I/O, causing produce and fetch latency spikes for every other topic on the same broker — a common source of mysterious performance degradation. • Compaction runs slower, so compacted topics take longer to reclaim space. Monitor log-cleaner metrics for compaction backlog. |
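In `server.properties` form (the compaction throttle takes bytes per second):

```properties
# Log segments & compaction
log.segment.bytes=536870912                    # 512 MB segments
log.cleaner.io.max.bytes.per.second=52428800   # throttle compaction to 50 MB/s
```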
**Partitioning**

| Config | Change | Why |
|---|---|---|
| `num.partitions` (caution)<br>Kafka 0.7+ | 1 → 12 | Auto-created topics default to 1 partition, which serializes all writes through a single leader. Setting 12 partitions enables parallel produce and consume across 12 independent partition logs, directly multiplying throughput. 12 divides evenly by 1, 2, 3, 4, 6, and 12 — a good fit for most consumer group sizes. • Each partition adds metadata overhead (~1KB/partition in controller memory) and a minimum of 2 file handles; 10,000 topics × 12 partitions = 120,000 partitions, which strains the controller and broker JVM heap. |
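In `server.properties` form; note this only affects topics auto-created after the change, not the partition count of existing topics:

```properties
# Default partition count for auto-created topics only.
num.partitions=12
```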
**Timeouts & Sessions**

| Config | Change | Why |
|---|---|---|
| `leader.imbalance.check.interval.seconds`<br>Kafka 0.9+ | 300s → 60s | After rolling restarts or broker failures, partition leadership clusters on the surviving brokers. The default 5-minute rebalance check window means throughput stays unevenly distributed for up to 5 minutes; 60 seconds cuts that recovery time 5x. • More frequent checks add minor controller CPU overhead. On very large clusters (10k+ partitions) the leader rebalance operation itself can cause brief latency spikes. |
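In `server.properties` form:

```properties
# Check leader balance every minute instead of every five.
leader.imbalance.check.interval.seconds=60
```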
**Fetching**

| Config | Change | Why |
|---|---|---|
| `max.incremental.fetch.session.cache.slots`<br>Kafka 2.0+ | 1000 → 2000 | Incremental fetch sessions (KIP-227) let consumers send O(partition changes) per FetchRequest instead of O(all partitions). When the cache is full (>1000 active sessions), sessions are evicted and those consumers fall back to full partition enumeration, wasting network bandwidth. • Each cache slot uses a small amount of broker memory; the impact is negligible at 2000 slots. |
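In `server.properties` form:

```properties
# Fetch session cache (KIP-227); size for the number of active consumers.
max.incremental.fetch.session.cache.slots=2000
```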