Maximize Kafka Producer Write Throughput
Maximize producer write throughput with batching, compression, and network tuning.
Recommended starting points curated by Conduktor. Always benchmark with your workload. Some broker configs are not available on managed services (AWS MSK, Confluent Cloud) — check your provider's documentation.
Batching & Compression

| Config | Change | Why |
|---|---|---|
| batch.size (Kafka 0.8.1+) | 16KB → 128KB | Increases batch capacity from 16KB to 128KB, allowing the producer to pack 8x more data per ProduceRequest and amortize per-request overhead across more records. • Each batch occupies 128KB of buffer.memory; if messages are small and low-rate, batches may never fill, gaining nothing while wasting memory. |
| linger.ms (Kafka 0.8.1+) | 5ms → 20ms | Allows the accumulator to wait up to 20ms for more records before sending, giving batches time to reach batch.size and reducing the number of ProduceRequests by up to 10x under moderate load. • Adds up to 20ms of end-to-end latency on every message path, which is unacceptable for latency-sensitive workloads like request-reply patterns. |
| compression.type (Kafka 0.8.1+) | none → lz4 | LZ4 compresses text/JSON payloads by 3-5x at near-memory-bandwidth speed (~500MB/s on modern CPUs), dramatically reducing network bytes and broker write amplification with minimal CPU overhead. • CPU usage on the producer increases; for already-compressed binary payloads (images, Avro with DEFLATE) compression may increase size and waste cycles. |
| compression.lz4.level (Kafka 3.8+) | 9 → 1 | The default level 9 compresses harder but is 6-8x slower than level 1; level 1 still achieves 70-80% of the compression ratio at >400MB/s throughput, optimal for high-throughput pipelines. • Slightly larger compressed batches (10-20% bigger than level 9) mean more network bytes and broker storage per record. |
| buffer.memory (caution) (Kafka 0.8.1+) | 32MB → 128MB | Expanding the accumulator buffer from 32MB to 128MB prevents send() from blocking during traffic bursts, sustaining throughput without back-pressure stalls when brokers are temporarily slower. • Consumes 128MB of JVM heap per producer instance; with many concurrent producers or constrained containers this can trigger GC pressure or OOMs. |
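Applied together, the batching and compression settings above look like this in a Java producer configuration. This is a sketch: the class and method names are illustrative, and the values are the table's starting points, not universal defaults.

```java
import java.util.Properties;

public class BatchingTuning {
    // Throughput-oriented batching settings from the table above.
    // Starting points only -- benchmark against your own workload.
    static Properties batchingProps() {
        Properties props = new Properties();
        props.put("batch.size", "131072");        // 128KB batch capacity per partition
        props.put("linger.ms", "20");             // wait up to 20ms to fill a batch
        props.put("compression.type", "lz4");     // fast compression for text/JSON payloads
        props.put("compression.lz4.level", "1");  // Kafka 3.8+: favor speed over ratio
        props.put("buffer.memory", "134217728");  // 128MB accumulator; watch JVM heap
        return props;
    }

    public static void main(String[] args) {
        System.out.println(batchingProps().getProperty("batch.size")); // prints 131072
    }
}
```

In practice these Properties would be merged with bootstrap.servers and serializer settings before constructing the KafkaProducer.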
Network & Buffers

| Config | Change | Why |
|---|---|---|
| max.request.size (caution) (Kafka 0.8.1+) | 1MB → 5MB | Raising the max request size to 5MB allows larger compressed batches to be sent in a single ProduceRequest, reducing round-trips and fully exploiting batch.size=128KB at high message rates. • Must match or be lower than the broker's message.max.bytes and the topic's max.message.bytes, otherwise requests are rejected with RecordTooLargeException. |
| send.buffer.bytes (caution) (Kafka 0.8.0+) | 128KB → 1MB | Increasing the TCP send buffer to 1MB allows the OS to buffer more in-flight bytes without blocking the producer, which is especially critical on high-latency links (>5ms RTT) where the bandwidth-delay product exceeds 128KB. • Each broker connection consumes 1MB of kernel socket buffer; with many partitions and connections this multiplies to gigabytes of kernel memory. |
| receive.buffer.bytes (Kafka 0.8.1+) | 32KB → 256KB | A larger 256KB receive buffer lets ProduceResponse and metadata responses from brokers be read in fewer syscalls, reducing latency on the acknowledgment path and freeing the sender thread faster. • Additional kernel memory per connection; a modest cost, but it multiplies with large broker fan-out. |
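The network-side settings combine the same way, and the bandwidth-delay product reasoning behind send.buffer.bytes can be checked with quick arithmetic. A sketch, with illustrative class and method names; the 1 Gbit/s link with 5ms RTT is an assumed example, not a value from the table:

```java
import java.util.Properties;

public class NetworkTuning {
    // Network and socket-buffer settings from the table above.
    static Properties networkProps() {
        Properties props = new Properties();
        props.put("max.request.size", "5242880");    // 5MB; must not exceed broker/topic limits
        props.put("send.buffer.bytes", "1048576");   // 1MB TCP send buffer per connection
        props.put("receive.buffer.bytes", "262144"); // 256KB TCP receive buffer
        return props;
    }

    // Bandwidth-delay product in bytes: how much data can be in flight on the link.
    static long bdpBytes(long linkBitsPerSecond, double rttSeconds) {
        return (long) (linkBitsPerSecond / 8 * rttSeconds);
    }

    public static void main(String[] args) {
        // Assumed link: 1 Gbit/s with 5ms RTT.
        long bdp = bdpBytes(1_000_000_000L, 0.005);  // 625,000 bytes can be in flight
        long defaultSendBuffer = 128 * 1024;         // 131,072-byte default
        System.out.println(bdp > defaultSendBuffer); // prints true: the default buffer caps throughput
    }
}
```

When the bandwidth-delay product exceeds the socket buffer, the producer stalls waiting for TCP acknowledgments rather than for the broker, which is exactly the case the 1MB setting targets.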
Delivery Guarantees

| Config | Change | Why |
|---|---|---|
| delivery.timeout.ms (caution) (Kafka 2.1+) | 2min → 5min | Extending the delivery timeout to 5 minutes gives more headroom for retries during rolling restarts or leader elections without requiring operator intervention, preventing spurious TimeoutExceptions at scale. • Failed or stalled records tie up buffer.memory for up to 5 minutes before being surfaced as errors, delaying backpressure signals to the application. |
| retry.backoff.max.ms (Kafka 3.7+) | 1s → 5s | Capping exponential backoff at 5s instead of 1s prevents retry storms during extended broker outages while still retrying frequently enough to recover quickly once the broker returns. • Each failed batch waits up to 5s between retries; in a 300s delivery.timeout.ms window this allows ~60 retry attempts, acceptable for most durability requirements. |
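The ~60-attempt estimate in the last row follows from simple division once backoff is pinned at its cap. A sketch combining the delivery settings (illustrative names; the figure is a rough upper bound, since each real attempt also spends time inside request.timeout.ms):

```java
import java.util.Properties;

public class DeliveryTuning {
    // Delivery-guarantee settings from the table above.
    static Properties deliveryProps() {
        Properties props = new Properties();
        props.put("delivery.timeout.ms", "300000"); // 5min total budget for send + retries
        props.put("retry.backoff.max.ms", "5000");  // cap exponential backoff at 5s
        return props;
    }

    // Rough upper bound on retry attempts once backoff sits at its cap.
    // Real counts are lower because each attempt also waits on the request timeout.
    static long maxRetriesAtCap(long deliveryTimeoutMs, long backoffCapMs) {
        return deliveryTimeoutMs / backoffCapMs;
    }

    public static void main(String[] args) {
        System.out.println(maxRetriesAtCap(300_000L, 5_000L)); // prints 60
    }
}
```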