
Minimize Kafka Broker Request Latency

Reduce broker-side queueing and replication lag for lower request latency.

Recommended starting points curated by Conduktor. Always benchmark with your workload. Some broker configs are not available on managed services (AWS MSK, Confluent Cloud) — check your provider's documentation.

Broker configs

Replication
replica.socket.receive.buffer.bytes (Kafka 0.8+): 64KB → 4MB
The replication socket receive buffer (64KB default) is separate from the client socket buffers. On 10GbE networks, the TCP bandwidth-delay product exceeds 64KB, so the default silently caps replication throughput and causes under-replicated partitions.
• Each replica connection consumes up to 4MB of kernel memory; on clusters with many broker-to-broker connections this adds up.
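To see why the 64KB default is the bottleneck, here is a quick bandwidth-delay sketch. The 10GbE link speed and 1 ms broker-to-broker RTT are illustrative assumptions; plug in your own measurements:

```python
# Why a 64 KB socket buffer caps replication throughput.
# Assumes a 10 GbE link and a 1 ms broker-to-broker RTT (illustrative values).

link_bps = 10_000_000_000   # 10 Gb/s
rtt_s = 0.001               # 1 ms round trip

# Bandwidth-delay product: bytes that must be "in flight" to keep the pipe full.
bdp_bytes = link_bps / 8 * rtt_s
print(f"BDP: {bdp_bytes / 1024:.0f} KB")          # ~1221 KB, far above 64 KB

# With a fixed 64 KB receive window, throughput is capped at window / RTT.
window_bytes = 64 * 1024
capped_bps = window_bytes / rtt_s * 8
print(f"Capped at: {capped_bps / 1e6:.0f} Mb/s")  # ~524 Mb/s on a 10 Gb/s link
```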
replica.lag.time.max.ms (caution) (Kafka 0.9+): 30s → 10s
Detects slow followers in 10s instead of 30s, triggering an ISR shrink sooner so acks=all requests are not blocked waiting on a lagging replica.
• Aggressive shrinking can cause unnecessary ISR flapping during GC pauses or minor network blips, showing up as under-replicated partitions.
num.replica.fetchers (Kafka 0.8+): 1 → 4
Parallelizes replication fetching across 4 threads per source broker, reducing replication lag, which directly impacts acks=all produce latency.
• More fetcher threads increase network and CPU usage on both leader and follower brokers.
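Applied together, the replication changes above look like this in server.properties. These are the starting points from this table, not definitive values; benchmark before rolling out:

```properties
# Replication tuning -- starting points, validate against your workload
replica.socket.receive.buffer.bytes=4194304   # 4MB, up from the 64KB default
replica.lag.time.max.ms=10000                 # 10s ISR lag threshold (caution: flapping risk)
num.replica.fetchers=4                        # fetcher threads per source broker
```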
Timeouts & Sessions

leader.imbalance.check.interval.seconds (Kafka 0.9+): 300s → 60s
After rolling restarts or broker failures, partition leadership piles up on the surviving brokers. With the default 5-minute check interval, throughput stays unevenly distributed for up to 5 minutes; 60 seconds cuts that worst-case wait 5x. (The check only runs when auto.leader.rebalance.enable=true, which is the default.)
• More frequent checks add minor controller CPU overhead. On very large clusters (10k+ partitions) the leader rebalance operation itself can cause brief latency spikes.
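As a server.properties sketch. The second line is shown at its default purely for context: it is the imbalance threshold that decides whether a check actually triggers a rebalance:

```properties
# Faster post-restart leader rebalancing (requires auto.leader.rebalance.enable=true, the default)
leader.imbalance.check.interval.seconds=60
leader.imbalance.per.broker.percentage=10   # default threshold that triggers a rebalance
```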
Threading

num.network.threads (Kafka 0.8+): 3 → 8
More network threads reduce request queueing time. With only 3 threads, high-concurrency workloads queue behind one another, inflating p99 tail latency.
• Each thread consumes CPU; diminishing returns beyond the machine's core count.
num.io.threads (Kafka 0.8+): 8 → 16
More I/O threads parallelize disk reads for fetch requests, reducing time spent waiting on disk I/O, especially on spinning disks or under heavy consumer load.
• More threads compete for disk bandwidth; on low-latency SSDs, 8 threads may already saturate the device.
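A server.properties sketch for the thread pools, sized as starting points relative to available cores:

```properties
# Request-path threading -- size relative to available cores, then benchmark
num.network.threads=8    # socket read/write; roughly one per core is a common ceiling
num.io.threads=16        # request handling incl. disk I/O; can exceed core count on HDDs
```

Since Kafka 1.1 (KIP-226), both settings are also dynamically updatable via kafka-configs.sh, so they can be tuned on a running broker without a rolling restart.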
Network & Buffers

queued.max.requests (caution) (Kafka 0.8+): 500 → 200
Limits request queue depth to bound worst-case queueing latency. With 500 queued requests, tail latency spikes whenever the broker is overloaded.
• When the queue fills, network threads stop reading from client sockets, so clients feel backpressure (slower responses, eventually request timeouts) sooner under load. May reduce throughput during bursts.
socket.send.buffer.bytes (Kafka 0.8+): 100KB → 64KB
A smaller send buffer reduces buffering delay for small responses; the 100KB default can hold small fetch responses in the kernel longer than necessary.
• Reduces throughput for large fetches, where bigger buffers amortize syscall overhead.
socket.receive.buffer.bytes (Kafka 0.8+): 100KB → 64KB
A smaller receive buffer hands requests to the I/O thread pool sooner instead of letting them accumulate in the kernel buffer.
• May increase syscall frequency for large produce requests.
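The Network & Buffers changes as a server.properties sketch. Note that -1 is the documented sentinel for "use the OS default" on both socket buffer settings, if you would rather defer to kernel autotuning:

```properties
# Bounded queueing and smaller client-facing socket buffers
queued.max.requests=200            # caps request queue depth; backpressure kicks in sooner
socket.send.buffer.bytes=65536     # 64KB; set to -1 to use the OS default instead
socket.receive.buffer.bytes=65536  # 64KB; set to -1 to use the OS default instead
```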