Kafka Error STALE_CONTROLLER_EPOCH
Error code 11 · Non-retriable KRaft
The controller moved to another broker.
Common Causes
- Active controller changed during failover or rolling restart, and a broker/controller request was sent with the previous controller epoch
- Controller quorum instability or a network interruption triggered a new controller election before all brokers had processed the previous epoch
- A broker was slow to process controller metadata updates and kept sending controller-scoped requests with stale epoch information
Solutions
- Treat this as a controller failover symptom, not a client data-plane problem. Confirm exactly one active controller and let brokers refresh controller metadata
- If it repeats, inspect controller election history and quorum health rather than retry settings. Frequent controller changes point to controller JVM, disk, or network instability
- On KRaft clusters, verify controller listeners and controller.quorum.voters are consistent on all nodes; on ZooKeeper clusters, verify the controller broker can maintain ZooKeeper connectivity
Diagnostic Commands
# Look for controller election events in logs
grep -E 'Elected as controller|Resigned as controller|StaleControllerEpoch' /var/log/kafka/server.log /var/log/kafka/controller.log 2>/dev/null | tail -20
# Check quorum or controller status
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status 2>/dev/null || true
Debugging Kafka errors? Conduktor Console gives you real-time visibility into your cluster. Explore all errors in the Error Decoder.