
Ensure Kafka Producer Message Durability

Zero message loss with exactly-once semantics, idempotence, and transactions.

Recommended starting points curated by Conduktor. Always benchmark with your workload. Some broker configs are not available on managed services (AWS MSK, Confluent Cloud) — check your provider's documentation.

Producer configs

Delivery Guarantees
transactional.id (caution)
Kafka 0.11.0+
null → "-" Setting a stable transactional.id enables the transactional API (beginTransaction/commitTransaction/abortTransaction), which provides atomic writes across multiple partitions and topics; consumers with isolation.level=read_committed see only committed records. The ID must be unique per producer instance and stable across restarts to allow epoch fencing of zombie producers.
• Transaction coordinator overhead adds ~5-10ms per commit. A crashed producer holding an open transaction blocks consumer progress on affected partitions until the transaction.timeout.ms expires. Each unique transactional.id consumes a slot in the __transaction_state topic.
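A producer configured along these lines might look like the following fragment. The ID value is a hypothetical example; transactions also require idempotence and acks=all, which recent clients enforce automatically when a transactional.id is set:

```properties
# Hypothetical ID: unique per producer instance, stable across restarts
transactional.id=orders-service-producer-0
# Implied by transactions; shown explicitly for clarity
enable.idempotence=true
acks=all
# Bounds how long a crashed producer's open transaction can block
# read_committed consumers (broker-side default: 60s)
transaction.timeout.ms=60000
```

Consumers must set isolation.level=read_committed to observe only committed records; the default read_uncommitted also returns records from aborted transactions.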
delivery.timeout.ms (caution)
Kafka 2.1+
2min → 5min Extending the delivery window to 5 minutes accommodates extended broker unavailability (rolling restarts, leader elections, controller failover) without failing records; in a well-operated cluster the full window should rarely be reached.
• Failed records hold buffer.memory for up to 5 minutes before surfacing as exceptions, which delays backpressure to the application. Must always be >= linger.ms + request.timeout.ms.
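As a sketch, the relationship between these timeouts can be expressed as follows; linger.ms, request.timeout.ms, and max.block.ms are shown at their client defaults:

```properties
# Must be >= linger.ms + request.timeout.ms
delivery.timeout.ms=300000   # 5 min total delivery window
linger.ms=0                  # client default
request.timeout.ms=30000     # client default
# send() blocks up to this long when buffer.memory is exhausted
max.block.ms=60000           # client default
```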
retry.backoff.ms
Kafka 0.8.0+
100ms → 200ms Doubling the initial retry backoff to 200ms reduces the rate of retry storms during leader elections on high-partition-count topics, giving the controller time to elect a new leader before the producer hammers the cluster with retries.
• The first retry now waits 200ms instead of 100ms before re-attempting; with exponential backoff capped by retry.backoff.max.ms=5000 this has minimal effect on total recovery time but measurably reduces broker CPU during failover.
retry.backoff.max.ms
Kafka 3.7+
1s → 5s Capping exponential backoff at 5s instead of 1s prevents retry storms during extended broker outages while allowing fast recovery (first retry at 200ms, then exponential growth) once the broker returns. Within a 300s delivery window this yields ~40 retry attempts.
• Records experiencing a failure that resolves in 3-4s will wait longer between retries than necessary; the 5s cap is a conservative choice optimized for extended outages over fast recovery.
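To sanity-check the retry budget, the stdlib-only sketch below estimates how many send attempts fit inside the 300s delivery window under these backoff settings. The 2.5s per-attempt request latency is a hypothetical assumption, and Kafka's real backoff adds random jitter that this ignores:

```java
public class RetryBudget {
    // Count send attempts that fit in deliveryTimeoutMs, assuming each attempt
    // takes requestLatencyMs and is followed by an exponentially growing backoff
    // (doubling from initialBackoffMs, capped at maxBackoffMs; jitter ignored).
    static int attemptsWithin(long deliveryTimeoutMs, long initialBackoffMs,
                              long maxBackoffMs, long requestLatencyMs) {
        long elapsed = 0, backoff = initialBackoffMs;
        int attempts = 0;
        while (elapsed + requestLatencyMs <= deliveryTimeoutMs) {
            attempts++;
            elapsed += requestLatencyMs + backoff; // attempt, then wait
            backoff = Math.min(backoff * 2, maxBackoffMs);
        }
        return attempts;
    }

    public static void main(String[] args) {
        // retry.backoff.ms=200, retry.backoff.max.ms=5000, delivery.timeout.ms=300000
        System.out.println(attemptsWithin(300_000, 200, 5_000, 2_500));
    }
}
```

Under this assumed latency the model yields roughly forty attempts; faster requests raise the count, slower ones lower it.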
Batching & Compression
buffer.memory (caution)
Kafka 0.8.1+
32MB → 64MB Doubling the accumulator buffer to 64MB provides headroom for records queued during a retry window (broker unavailability lasting 10-30s at typical rates); without this, buffer.memory exhausts and max.block.ms triggers, causing send() to throw before the retry window expires.
• Consumes 64MB of heap per producer instance; in containerized environments with strict memory limits this may require adjusting JVM heap or pod resource limits.
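A sketch of the buffering-related settings with the 64MB recommendation applied; the values are illustrative, and buffer.memory should ultimately be sized against your throughput multiplied by the outage duration you want to absorb:

```properties
# Accumulator shared by all batches of this producer instance
buffer.memory=67108864   # 64 MB (default: 33554432 = 32 MB)
# When the buffer is full, send() blocks up to this long before throwing
max.block.ms=60000       # client default
# Per-partition batch size, allocated from buffer.memory
batch.size=16384         # client default
```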