Minimize Kafka Broker Request Latency
Reduce broker-side queueing and replication lag for lower request latency.
Recommended starting points curated by Conduktor. Always benchmark with your workload. Some broker configs are not available on managed services (AWS MSK, Confluent Cloud) — check your provider's documentation.
| Config | Change | Why |
|---|---|---|
| **Replication** | | |
| `replica.socket.receive.buffer.bytes` (Kafka 0.8+) | 64KB → 4MB | The replication socket receive buffer (64KB default) is *separate* from the client socket buffers. On 10GbE networks the TCP bandwidth-delay product exceeds 64KB, which silently caps replication throughput and causes under-replicated partitions. • Each replica connection consumes 4MB of kernel memory; on clusters with many broker-to-broker connections this adds up. |
| `replica.lag.time.max.ms` (Kafka 0.9+) ⚠ caution | 30s → 10s | Detects slow followers in 10s instead of 30s, triggering ISR shrink sooner so acks=all requests are not blocked waiting on a lagging replica. • Aggressive shrinking can cause unnecessary ISR flapping during GC pauses or minor network blips, surfacing as under-replicated partitions. |
| `num.replica.fetchers` (Kafka 0.8+) | 1 → 4 | Parallelizes replication fetch across 4 threads, reducing replication lag, which directly affects acks=all produce latency. • More fetcher threads increase network and CPU usage on both leader and follower brokers. |
| **Timeouts & Sessions** | | |
| `leader.imbalance.check.interval.seconds` (Kafka 0.9+) | 300s → 60s | After rolling restarts or broker failures, partition leadership concentrates on the surviving brokers. With the default 5-minute check interval, throughput stays unevenly distributed for up to 5 minutes; 60 seconds cuts recovery time 5x. • More frequent checks add minor controller CPU overhead, and on very large clusters (10k+ partitions) the leader rebalance itself can cause brief latency spikes. |
| **Threading** | | |
| `num.network.threads` (Kafka 0.8+) | 3 → 8 | More network threads reduce request queueing time. With 3 threads, high-concurrency workloads queue behind each other, adding p99 tail latency. • Each thread consumes CPU; diminishing returns beyond the core count. |
| `num.io.threads` (Kafka 0.8+) | 8 → 16 | More I/O threads parallelize disk reads for fetch requests, reducing time spent waiting on disk I/O, especially on spinning disks or under heavy consumer load. • More threads compete for disk bandwidth; on low-latency SSDs, 8 threads may already saturate the device. |
| **Network & Buffers** | | |
| `queued.max.requests` (Kafka 0.8+) ⚠ caution | 500 → 200 | Limits request queue depth to bound worst-case queueing latency. With 500 queued requests, tail latency spikes when brokers are overloaded. • When the queue fills, network threads stop reading new requests, so clients hit backpressure and request timeouts sooner under load; may reduce throughput during bursts. |
| `socket.send.buffer.bytes` (Kafka 0.8+) | 100KB → 64KB | A smaller send buffer reduces buffering delay for small responses; the default 100KB buffer can hold small fetch responses in the kernel longer before they reach the wire. • Reduces throughput for large fetches, where bigger buffers amortize syscall overhead. |
| `socket.receive.buffer.bytes` (Kafka 0.8+) | 100KB → 64KB | A smaller receive buffer hands requests to the I/O thread pool sooner instead of letting them accumulate in the kernel buffer. • May increase syscall frequency for large produce requests. |
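Taken together, the starting points above can be expressed as a `server.properties` fragment. This is a sketch of the table's recommendations, not a drop-in config; benchmark each change against your workload:

```properties
# Replication
replica.socket.receive.buffer.bytes=4194304   # 64KB -> 4MB
replica.lag.time.max.ms=10000                 # 30s -> 10s (caution: ISR flapping risk)
num.replica.fetchers=4                        # 1 -> 4

# Leadership rebalancing
leader.imbalance.check.interval.seconds=60    # 300s -> 60s

# Threading
num.network.threads=8                         # 3 -> 8
num.io.threads=16                             # 8 -> 16

# Network & buffers
queued.max.requests=200                       # 500 -> 200 (caution: earlier backpressure)
socket.send.buffer.bytes=65536                # 100KB -> 64KB
socket.receive.buffer.bytes=65536             # 100KB -> 64KB
```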
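The thread and fetcher counts are dynamic broker configs in Kafka 1.1+, so they can be applied without a rolling restart via `kafka-configs.sh`. A sketch, assuming a broker reachable at `localhost:9092`:

```shell
# Apply as a cluster-wide dynamic default (no broker restart needed)
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-default --alter \
  --add-config num.network.threads=8,num.io.threads=16,num.replica.fetchers=4

# Verify what is now in effect
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-default --describe
```

The socket buffer settings are not dynamically updatable and still require a restart.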
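The bandwidth-delay-product claim behind the `replica.socket.receive.buffer.bytes` change can be sanity-checked with simple arithmetic. A minimal sketch; the 10 Gb/s link speed and 1 ms inter-broker RTT are illustrative assumptions:

```python
def bdp_bytes(bandwidth_bits_per_sec: float, rtt_sec: float) -> int:
    """TCP bandwidth-delay product: bytes in flight needed to keep the pipe full."""
    return int(bandwidth_bits_per_sec / 8 * rtt_sec)

# 10GbE link with a 1 ms inter-broker RTT (illustrative values)
bdp = bdp_bytes(10e9, 0.001)
print(bdp)                # 1250000 bytes, about 1.2 MB
print(bdp > 64 * 1024)    # True: the 64KB default caps replication throughput
```

With a required window of ~1.2 MB per connection, the 64KB default leaves the link mostly idle, while 4MB leaves headroom even for higher RTTs.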