# Ensure Kafka Producer Message Durability

Zero message loss with exactly-once semantics, idempotence, and transactions.

Recommended starting points curated by Conduktor. Always benchmark with your workload. Some broker configs are not available on managed services (AWS MSK, Confluent Cloud); check your provider's documentation.

## Producer
| Config | Change | Why |
|---|---|---|
| **Delivery Guarantees** | | |
| `transactional.id` (caution)<br>Kafka 0.11.0+ | `null` → stable unique ID | Setting a stable `transactional.id` enables the transactional API (`beginTransaction`/`commitTransaction`/`abortTransaction`), which provides atomic writes across multiple partitions and topics; consumers with `isolation.level=read_committed` see only committed records. The ID must be unique per producer instance and stable across restarts to allow epoch fencing of zombie producers. • Transaction coordinator overhead adds ~5-10 ms per commit. A crashed producer holding an open transaction blocks consumer progress on affected partitions until `transaction.timeout.ms` expires. Each unique `transactional.id` consumes a slot in the `__transaction_state` topic. |
| `delivery.timeout.ms` (caution)<br>Kafka 2.1+ | `2min → 5min` | Extending the delivery window to 5 minutes accommodates extended broker unavailability (rolling restarts, leader elections, controller failover) without failing records; in a well-operated cluster the limit is rarely reached. • Failed records hold `buffer.memory` for up to 5 minutes before surfacing as exceptions, which delays backpressure to the application. Must always be >= `linger.ms` + `request.timeout.ms`. |
| `retry.backoff.ms`<br>Kafka 0.8.0+ | `100ms → 200ms` | Doubling the initial retry backoff to 200 ms reduces the rate of retry storms during leader elections on high-partition-count topics, giving the controller time to elect a new leader before the producer hammers the cluster with retries. • Each retry attempt waits 200 ms longer before re-attempting; with exponential backoff capped by `retry.backoff.max.ms=5000` this has minimal effect on total recovery time but measurably reduces broker CPU during failover. |
| `retry.backoff.max.ms`<br>Kafka 3.7+ | `1s → 5s` | Capping exponential backoff at 5 s instead of 1 s prevents retry storms during extended broker outages while still allowing fast recovery (first retry at 200 ms, then exponential growth) once the broker returns. Within a 300 s delivery window this yields ~40 retry attempts. • Records experiencing a failure that resolves in 3-4 s will wait longer between retries than necessary; the 5 s cap is a conservative choice optimized for extended outages over fast recovery. |
| **Batching & Compression** | | |
| `buffer.memory` (caution)<br>Kafka 0.8.1+ | `32MB → 64MB` | Doubling the accumulator buffer to 64 MB provides headroom for records queued during a retry window (broker unavailability lasting 10-30 s at typical rates); without this, `buffer.memory` exhausts and `max.block.ms` triggers, causing `send()` to throw before the retry window expires. • Consumes 64 MB of heap per producer instance; in containerized environments with strict memory limits this may require adjusting JVM heap or pod resource limits. |
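Taken together, the rows above might look like the following producer configuration. This is a sketch, not a drop-in file: `acks=all` and `enable.idempotence=true` are prerequisites for the transactional API that the table assumes rather than lists, and the `transactional.id` value is a placeholder you must make unique per producer instance and stable across restarts.

```properties
# Durability-focused producer settings (starting points from the table above)
acks=all                           # prerequisite for idempotence/transactions
enable.idempotence=true            # prerequisite for transactional.id
transactional.id=orders-service-1  # placeholder: unique and stable per instance
delivery.timeout.ms=300000         # 5 min (default 2 min)
retry.backoff.ms=200               # default 100 ms
retry.backoff.max.ms=5000          # Kafka 3.7+ (default 1 s)
buffer.memory=67108864             # 64 MB (default 32 MB)
```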
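The "~40 retry attempts within a 300 s delivery window" figure can be sanity-checked with a quick simulation of the backoff schedule. Jitter is omitted for clarity, and the 2.5 s per-attempt cost is an assumed stand-in for request latency during an outage, not a Kafka parameter.

```python
# Sketch: approximate the producer retry schedule under exponential
# backoff with the values recommended above. ATTEMPT_COST_MS is an
# assumption standing in for per-request latency during an outage.

RETRY_BACKOFF_MS = 200          # retry.backoff.ms
RETRY_BACKOFF_MAX_MS = 5_000    # retry.backoff.max.ms
DELIVERY_TIMEOUT_MS = 300_000   # delivery.timeout.ms (5 min)
ATTEMPT_COST_MS = 2_500         # assumed time each failed attempt consumes


def retry_schedule():
    """Yield the backoff before each retry: 200, 400, 800, ... capped at 5000."""
    backoff = RETRY_BACKOFF_MS
    while True:
        yield backoff
        backoff = min(backoff * 2, RETRY_BACKOFF_MAX_MS)


elapsed_ms, attempts = 0, 0
for backoff in retry_schedule():
    if elapsed_ms + backoff + ATTEMPT_COST_MS > DELIVERY_TIMEOUT_MS:
        break
    elapsed_ms += backoff + ATTEMPT_COST_MS
    attempts += 1

print(attempts)  # roughly 40 retries fit in the 5-minute delivery window
```

The steady state dominates: after five retries the backoff is pinned at 5 s, so each further attempt costs backoff plus request time, and the 5-minute window absorbs about forty of them.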
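The 64 MB `buffer.memory` recommendation can likewise be sanity-checked against the outage the table describes. The produce rate and record size below are assumed example values, not part of the recommendation.

```python
# Sketch: estimate how much accumulator memory queues up while the
# broker is unavailable. Rate and record size are assumed examples.

RECORDS_PER_SEC = 2_000        # assumed produce rate
AVG_RECORD_BYTES = 1_024       # assumed serialized record size
OUTAGE_SEC = 30                # broker unavailability to absorb

needed_bytes = RECORDS_PER_SEC * AVG_RECORD_BYTES * OUTAGE_SEC
needed_mb = needed_bytes / (1024 * 1024)

print(f"{needed_mb:.0f} MB queued during a {OUTAGE_SEC}s outage")
# ~59 MB: over the default 32 MB buffer (send() blocks, then throws
# after max.block.ms), but within the recommended 64 MB.
```

At this assumed rate, a 30 s outage already overflows the 32 MB default, which is exactly the failure mode the table warns about.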