Minimize Kafka Broker Request Latency
Reduce broker-side queueing and replication lag for lower request latency.
Recommended starting points curated by Conduktor. Always benchmark with your workload. Some broker configs are not available on managed services (AWS MSK, Confluent Cloud) — check your provider's documentation.
| Config | Change | Why |
|---|---|---|
| **Replication** | | |
| `replica.socket.receive.buffer.bytes` (Kafka 0.8+) | 64KB → 4MB | The replication socket receive buffer (64KB default) is *separate* from the client socket buffers. On 10GbE networks the TCP bandwidth-delay product exceeds 64KB, which silently caps replication throughput and causes under-replicated partitions. • Each replica connection consumes 4MB of kernel memory; on clusters with many broker-to-broker connections this adds up. |
| `replica.lag.time.max.ms` (Kafka 0.9+) ⚠ caution | 30s → 10s | Detects slow followers in 10s instead of 30s, triggering ISR shrink sooner so acks=all requests are not blocked waiting on a lagging replica. • Aggressive shrinking can cause unnecessary ISR flapping during GC pauses or minor network blips, surfacing as under-replicated partitions. |
| `num.replica.fetchers` (Kafka 0.8+) | 1 → 4 | Parallelizes replication fetch across 4 threads, reducing replication lag, which directly affects acks=all produce latency. • More fetcher threads increase network and CPU usage on both leader and follower brokers. |
| **Timeouts & Sessions** | | |
| `leader.imbalance.check.interval.seconds` (Kafka 0.9+) | 300s → 60s | After rolling restarts or broker failures, partition leadership concentrates on the surviving brokers. With the default 5-minute check interval, throughput stays unevenly distributed for up to 5 minutes; 60 seconds cuts recovery time 5x. • More frequent checks add minor controller CPU overhead, and on very large clusters (10k+ partitions) the leader rebalance itself can cause brief latency spikes. |
| **Threading** | | |
| `num.network.threads` (Kafka 0.8+) | 3 → 8 | More network threads reduce request queueing time. With 3 threads, high-concurrency workloads queue behind each other, adding p99 tail latency. • Each thread consumes CPU; diminishing returns beyond the core count. |
| `num.io.threads` (Kafka 0.8+) | 8 → 16 | More I/O threads parallelize disk reads for fetch requests, reducing time spent waiting on disk I/O, especially on spinning disks or under heavy consumer load. • More threads compete for disk bandwidth; on low-latency SSDs, 8 threads may already saturate the device. |
| **Network & Buffers** | | |
| `queued.max.requests` (Kafka 0.8+) ⚠ caution | 500 → 200 | Limits request queue depth to bound worst-case queueing latency. With 500 queued requests, tail latency spikes when brokers are overloaded. • When the queue fills, network threads stop reading new requests, so clients hit backpressure and request timeouts sooner under load; may reduce throughput during bursts. |
| `socket.send.buffer.bytes` (Kafka 0.8+) | 100KB → 64KB | A smaller send buffer reduces buffering delay for small responses; the default 100KB buffer can hold small fetch responses in the kernel longer before they reach the wire. • Reduces throughput for large fetches, where bigger buffers amortize syscall overhead. |
| `socket.receive.buffer.bytes` (Kafka 0.8+) | 100KB → 64KB | A smaller receive buffer hands requests to the I/O thread pool sooner instead of letting them accumulate in the kernel buffer. • May increase syscall frequency for large produce requests. |
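Taken together, the starting points above can be expressed as a `server.properties` fragment. This is a sketch of the table's recommendations, not a drop-in config; benchmark each change against your workload:

```properties
# Replication
replica.socket.receive.buffer.bytes=4194304   # 64KB -> 4MB
replica.lag.time.max.ms=10000                 # 30s -> 10s (caution: ISR flapping risk)
num.replica.fetchers=4                        # 1 -> 4

# Leadership rebalancing
leader.imbalance.check.interval.seconds=60    # 300s -> 60s

# Threading
num.network.threads=8                         # 3 -> 8
num.io.threads=16                             # 8 -> 16

# Network & buffers
queued.max.requests=200                       # 500 -> 200 (caution: earlier backpressure)
socket.send.buffer.bytes=65536                # 100KB -> 64KB
socket.receive.buffer.bytes=65536             # 100KB -> 64KB
```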
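The thread and fetcher counts are dynamic broker configs in Kafka 1.1+, so they can be applied without a rolling restart via `kafka-configs.sh`. A sketch, assuming a broker reachable at `localhost:9092`:

```shell
# Apply as a cluster-wide dynamic default (no broker restart needed)
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-default --alter \
  --add-config num.network.threads=8,num.io.threads=16,num.replica.fetchers=4

# Verify what is now in effect
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type brokers --entity-default --describe
```

The socket buffer settings are not dynamically updatable and still require a restart.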
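The bandwidth-delay-product claim behind the `replica.socket.receive.buffer.bytes` change can be sanity-checked with simple arithmetic. A minimal sketch; the 10 Gb/s link speed and 1 ms inter-broker RTT are illustrative assumptions:

```python
def bdp_bytes(bandwidth_bits_per_sec: float, rtt_sec: float) -> int:
    """TCP bandwidth-delay product: bytes in flight needed to keep the pipe full."""
    return int(bandwidth_bits_per_sec / 8 * rtt_sec)

# 10GbE link with a 1 ms inter-broker RTT (illustrative values)
bdp = bdp_bytes(10e9, 0.001)
print(bdp)                # 1250000 bytes, about 1.2 MB
print(bdp > 64 * 1024)    # True: the 64KB default caps replication throughput
```

With a required window of ~1.2 MB per connection, the 64KB default leaves the link mostly idle, while 4MB leaves headroom even for higher RTTs.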