KIP-501 — Avoid out-of-sync or offline partitions when follower fetch requests not processed in time
Discussion Broker
KIP-501 addresses ISR shrinkage that is caused by a slow leader rather than a slow follower. When the leader is slow (for example, during a GC pause or a disk stall), it may not process a follower's fetch request within `replica.lag.time.max.ms` even though the follower is actively fetching. The follower is then removed from the ISR, which can leave partitions out of sync or offline through no fault of the follower. The KIP proposes that the leader track when a follower last sent a fetch request separately from whether the follower has caught up to the leader's log, so that the `replica.lag.time.max.ms` eviction check can tell a follower that has stopped fetching apart from one whose fetch requests are still waiting on the leader.
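The distinction between "has caught up" and "is still sending fetch requests" can be illustrated with a small model. This is a simplified sketch, not Kafka's broker code and not the KIP's exact algorithm: the class `FollowerState`, its fields, and both functions are hypothetical names chosen for illustration, and the 30-second value is only an example setting for `replica.lag.time.max.ms`.

```python
from dataclasses import dataclass


@dataclass
class FollowerState:
    """Leader-side view of one follower (hypothetical field names)."""
    last_caught_up_ms: int       # last time a processed fetch showed the follower at the leader's log end
    last_fetch_received_ms: int  # last time a fetch request from the follower arrived at the leader


def out_of_sync_processed_only(f: FollowerState, now_ms: int, max_lag_ms: int) -> bool:
    # Only progress recorded while processing a fetch counts, so a slow
    # leader can make a healthy follower look like it is lagging.
    return now_ms - f.last_caught_up_ms > max_lag_ms


def out_of_sync_with_receipt(f: FollowerState, now_ms: int, max_lag_ms: int) -> bool:
    # Assumed reading of the proposal: a recently received fetch request also
    # counts as follower activity, even if the leader has not processed it yet.
    last_activity_ms = max(f.last_caught_up_ms, f.last_fetch_received_ms)
    return now_ms - last_activity_ms > max_lag_ms


if __name__ == "__main__":
    max_lag_ms = 30_000  # example value for replica.lag.time.max.ms
    # The follower keeps sending fetches, but the leader last processed one at t=0.
    follower = FollowerState(last_caught_up_ms=0, last_fetch_received_ms=29_000)
    print(out_of_sync_processed_only(follower, 31_000, max_lag_ms))
    print(out_of_sync_with_receipt(follower, 31_000, max_lag_ms))
```

In this model the follower would be dropped by the first check and kept by the second, while a follower that has genuinely stopped fetching fails both. A real implementation would also need to bound how long an unprocessed fetch can keep a follower in the ISR.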
Details
| Field | Value |
| --- | --- |
| Author | Satish Duggana |
| Status | Discussion |
| JIRA | KAFKA-8733 |
| Wiki | View on Apache Wiki |
| Created | 2019-07-30 |
| Last Modified | 2021-06-29 |