Kafka Exception TaskMigratedException
org.apache.kafka.streams.errors.TaskMigratedException
Non-retriable
Streams
Indicates that all tasks belongs to the thread have migrated to another thread. This exception can be thrown when the thread gets fenced (either by the consumer coordinator or by the transaction coordinator), which means it is no longer part of the group but a "zombie" already
Common Causes
- A StreamThread misses a rebalance and is fenced by the consumer group coordinator: processing of a poll batch exceeds `max.poll.interval.ms` (slow processing, large `max.poll.records`, long GC pause), the thread drops out of the group, its tasks are reassigned, and the now-zombie thread throws TaskMigratedException ('detected that the thread is being fenced').
- Producer fencing under exactly-once (EOS): a new instance/thread initializes a transactional producer with the same transactional.id and bumps the epoch, so the old producer's commit/send is rejected with ProducerFencedException / InvalidProducerEpochException, which Streams wraps as TaskMigratedException.
- Transaction timeout during idle periods with EOS: a transaction opened by a punctuator (e.g. deleting state-store records) is aborted by the broker after `transaction.timeout.ms` (~60s default) when no new records arrive, so the next send finds the producer fenced (the classic Felix D'Souza scenario).
- Continuous rebalancing / restore loops on newer versions (e.g. KAFKA-17445 on 3.8): with static membership and low `acceptable.recovery.lag`, tasks land on few instances that keep getting fenced and restoring without ever processing, so TaskMigratedException repeats indefinitely.
Solutions
- Reduce per-batch processing time: lower `max.poll.records` and/or raise `max.poll.interval.ms` so a poll loop can't blow past the limit (recent Streams default it to Integer.MAX_VALUE). Investigate and fix long GC pauses and CPU starvation.
- For EOS, raise `producer.transaction.timeout.ms` (bounded by broker `transaction.max.timeout.ms`) so transactions opened by punctuators don't expire during idle windows, and prefer `processing.guarantee=exactly_once_v2` (EOS v2) which shares one producer per thread.
- Treat a single TaskMigratedException as benign and recoverable — Streams closes the zombie task and rejoins automatically. Only act when it loops. Register a `StreamsUncaughtExceptionHandler` (REPLACE_THREAD) for resilience rather than crashing the app.
- If stuck in a rebalance/restore loop (KAFKA-17445 style), revisit static membership (`group.instance.id`), `session.timeout.ms`/`heartbeat.interval.ms`, and `acceptable.recovery.lag`; upgrade to a version containing the relevant fixes (e.g. KAFKA-7284 unwrapping, KAFKA-16903 cross-task error propagation).
Example Stack Trace
org.apache.kafka.streams.errors.TaskMigratedException: Producer got fenced trying to commit a transaction [stream-thread [my-app-StreamThread-1]]; it means all tasks belonging to this thread should be migrated.
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:322)
at org.apache.kafka.streams.processor.internals.StreamsProducer.commitTransaction(StreamsProducer.java:307)
at org.apache.kafka.streams.processor.internals.TaskManager.commitOffsetsOrTransaction(TaskManager.java:1314)
at org.apache.kafka.streams.processor.internals.TaskExecutor.commitOffsetsOrTransaction(TaskExecutor.java:171)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnceWithProcessingThreads(StreamThread.java:917)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:711)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
Caused by: org.apache.kafka.common.errors.InvalidProducerEpochException: Producer attempted to produce with an old epoch.Diagnostic Commands
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group <application.id> # check for frequent generation/owner churn and lag
grep -E 'TaskMigrated|being fenced|got migrated|InvalidProducerEpoch|ProducerFenced' streams.log # confirm fencing vs. timeout vs. loop
kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers --all --describe | grep transaction.max.timeout.ms # ceiling for producer transaction.timeout.msRelated
Related Streams exceptions: BrokerNotFoundException · InternalTopicsAlreadySetupException · InvalidStateStoreException · InvalidStateStorePartitionException · LockException · MisconfiguredInternalTopicException · MissingInternalTopicsException · MissingSourceTopicException
Hitting
TaskMigratedException in production? Conduktor Console gives you real-time visibility into clients, consumer groups, and broker health. Browse every Kafka exception or protocol error code.