conduktor.io ↗

KIP-501: Avoid out-of-sync or offline partitions when follower fetch requests are not processed in time

Discussion Broker

When a leader is slow to process requests (for example during a GC pause or a disk stall), it may not service a follower's fetch request within `replica.lag.time.max.ms` even though the follower is actively fetching. The leader then shrinks the ISR, and partitions can end up out of sync or offline through no fault of the follower. KIP-501 proposes tracking the last time a follower sent a fetch request separately from whether it has caught up in bytes, so that a follower whose fetch requests are arriving but still waiting on the leader is not evicted from the ISR solely because the leader failed to process them within that window.
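The distinction in the summary above can be illustrated with a toy model. This is a minimal sketch, not Kafka's implementation: the class `FollowerState`, its two timestamp fields, and both check functions are hypothetical names invented for illustration, and the exact in-sync condition is defined by the KIP text itself rather than by this code. It assumes the leader keeps one timestamp for when a follower last caught up and a separate one for when a fetch request from that follower last arrived.

```python
from dataclasses import dataclass

# 30 s is the default for replica.lag.time.max.ms in recent Kafka releases.
REPLICA_LAG_TIME_MAX_MS = 30_000


@dataclass
class FollowerState:
    """Hypothetical per-follower bookkeeping on the leader (illustration only)."""
    last_caught_up_ms: int       # last time a processed fetch reached the leader's log end
    last_fetch_received_ms: int  # last time a fetch request arrived, processed or not


def out_of_sync_baseline(f: FollowerState, now_ms: int,
                         max_lag_ms: int = REPLICA_LAG_TIME_MAX_MS) -> bool:
    """Evict when the follower has not been caught up within the window.

    A slow leader that never gets around to processing fetches makes every
    follower look stale under this rule.
    """
    return now_ms - f.last_caught_up_ms > max_lag_ms


def out_of_sync_with_receipt_tracking(f: FollowerState, now_ms: int,
                                      max_lag_ms: int = REPLICA_LAG_TIME_MAX_MS) -> bool:
    """Assumed simplification of the proposal: evict only when the follower is
    stale AND it has also stopped sending fetch requests within the window."""
    caught_up_stale = now_ms - f.last_caught_up_ms > max_lag_ms
    fetch_stale = now_ms - f.last_fetch_received_ms > max_lag_ms
    return caught_up_stale and fetch_stale


if __name__ == "__main__":
    now = 40_000
    # Follower keeps fetching, but the leader has processed nothing for 40 s.
    stalled_leader = FollowerState(last_caught_up_ms=0, last_fetch_received_ms=39_500)
    print(out_of_sync_baseline(stalled_leader, now))               # True
    print(out_of_sync_with_receipt_tracking(stalled_leader, now))  # False

    # Follower that really stopped fetching is still evicted.
    stopped = FollowerState(last_caught_up_ms=0, last_fetch_received_ms=0)
    print(out_of_sync_with_receipt_tracking(stopped, now))         # True
```

In this model the two rules differ only when fetch requests are arriving but not being processed, which is the leader-side failure the KIP targets; a follower that has stopped fetching is treated the same way under both.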

Details

Author: Satish Duggana
Status: Discussion
JIRA: KAFKA-8733
Wiki: View on Apache Wiki
Created: 2019-07-30
Last Modified: 2021-06-29