Kafka Exception TaskCorruptedException
org.apache.kafka.streams.errors.TaskCorruptedException
Non-retriable
Streams
Indicates that a specific task is corrupted and needs to be re-initialized. It can be thrown when: (1) under EOS, the checkpoint file does not contain offsets for the corresponding store's changelogs, meaning the task was previously not closed cleanly; (2) an out-of-range exception is thrown during restoration, meaning the changelog has been modified and the store must be re-bootstrapped.
Common Causes
- Under EOS (exactly_once_v2), the checkpoint file lacks offsets for a store's changelog, meaning the previous run did not close cleanly (crash / kill -9 / OOM) — Streams must wipe and re-bootstrap the store from the changelog.
- An InvalidOffsetException (typically its subclass OffsetOutOfRangeException) during restoration: the restore consumer's position fell outside the changelog partition's valid offset range, for example because retention deleted the records or the topic was deleted and recreated, so the task is marked corrupted.
- Under EOS, a transaction commit timed out (transaction.timeout.ms; the producer default is 60000 ms, which Streams overrides to 10000 ms under EOS). Per KAFKA-9274, Streams deliberately throws TaskCorruptedException instead of TimeoutException so the task is aborted and re-initialized cleanly.
- A network error / broker disconnect during an EOS transaction that prevents committing within the timeout window, triggering the abort-and-rebuild cycle.
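The out-of-range cause above comes down to comparing the restore position against the changelog partition's earliest and latest offsets. The following is a minimal sketch of that comparison, not Streams' internal logic: the class name `RestoreRangeCheck`, its methods, and the sample offsets are hypothetical, and the earliest/latest values are assumed to come from a tool such as kafka-get-offsets.sh.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for triage; not part of the Kafka API.
final class RestoreRangeCheck {
    // Matches the "offset=<n>" fragment of a FetchPosition in the exception message.
    private static final Pattern OFFSET = Pattern.compile("offset=(\\d+)");

    /** Extracts the fetch offset from an exception message, or -1 if none is present. */
    static long parseOffset(String message) {
        Matcher m = OFFSET.matcher(message);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    /** True when the position lies outside [earliest, latest], the fetchable range. */
    static boolean isOutOfRange(long position, long earliest, long latest) {
        return position < earliest || position > latest;
    }

    public static void main(String[] args) {
        String msg = "Fetch position FetchPosition{offset=15233} is out of range "
                + "for partition my-app-store-changelog-2";
        long position = parseOffset(msg);
        // Assumed broker-side offsets, for illustration only.
        long earliest = 20000L;
        long latest = 48000L;
        System.out.println("position=" + position
                + " outOfRange=" + isOutOfRange(position, earliest, latest));
    }
}
```

A position below the earliest offset suggests the records were deleted after the checkpoint was written. A position above the latest offset suggests the topic was recreated or truncated.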
Solutions
- Always shut down cleanly (KafkaStreams.close(Duration)) so checkpoints are written; under EOS an unclean exit can trigger this exception and a full state rebuild on restart.
- If commits time out under load or quota throttling, increase transaction.timeout.ms (e.g. to 60000) so EOS transactions have room to complete, but keep it <= the broker's transaction.max.timeout.ms.
- Ensure changelog retention/compaction won't truncate offsets you still need: changelogs should be compacted (cleanup.policy=compact); avoid manual deletion/recreation of changelog topics, which forces a re-bootstrap.
- Let Streams self-heal (it closes the task dirty, wipes the store under EOS, and revives into CREATED) but upgrade to >= 3.6.2 / 3.7.0 to avoid the KAFKA-16017 data-loss bug where a revived-then-migrated task checkpoints an offset higher than what was actually restored.
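The configuration advice above can be sketched as plain properties. This is an illustrative example under stated assumptions, not a recommended production setup: the class `EosConfigSketch` and the method `eosProps` are hypothetical, the keys are written as strings so the snippet needs no Kafka dependency, and the broker-side maximum (transaction.max.timeout.ms) is assumed to have been looked up separately.

```java
import java.util.Properties;

// Hypothetical helper; builds the properties you would hand to KafkaStreams.
final class EosConfigSketch {
    /**
     * Builds Streams properties for exactly_once_v2 with an explicit transaction timeout.
     * Rejects a timeout above the broker-side maximum, which the broker would refuse anyway.
     */
    static Properties eosProps(String appId, String bootstrap,
                               int txnTimeoutMs, int brokerMaxTxnTimeoutMs) {
        if (txnTimeoutMs > brokerMaxTxnTimeoutMs) {
            throw new IllegalArgumentException("transaction.timeout.ms " + txnTimeoutMs
                    + " exceeds broker maximum " + brokerMaxTxnTimeoutMs);
        }
        Properties p = new Properties();
        p.setProperty("application.id", appId);
        p.setProperty("bootstrap.servers", bootstrap);
        p.setProperty("processing.guarantee", "exactly_once_v2");
        // Producer-level setting; Streams passes it through to its embedded producer.
        p.setProperty("transaction.timeout.ms", Integer.toString(txnTimeoutMs));
        return p;
    }

    public static void main(String[] args) {
        // 900000 is assumed here as the broker's transaction.max.timeout.ms; check your cluster.
        Properties p = eosProps("my-app", "localhost:9092", 60000, 900000);
        p.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

In an application these properties would typically be passed to the KafkaStreams constructor, with a JVM shutdown hook calling KafkaStreams.close(Duration) so that checkpoints are written on exit.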
Example Stack Trace
org.apache.kafka.streams.errors.TaskCorruptedException: Tasks [0_2] are corrupted and hence need to be re-initialized
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:578)
at org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:660)
at org.apache.kafka.streams.processor.internals.StreamThread.initializeAndRestorePhase(StreamThread.java:902)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:757)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:617)
Caused by: org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Fetch position FetchPosition{offset=15233} is out of range for partition my-app-store-changelog-2
Diagnostic Commands
kafka-get-offsets.sh --bootstrap-server localhost:9092 --topic my-app-store-changelog --time -2 # earliest offset vs the out-of-range position in the trace
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-app-store-changelog # verify cleanup.policy=compact and retention settings
kafka-streams-application-reset.sh --application-id my-app --bootstrap-server localhost:9092 # plus delete local state.dir to force a clean rebuild after corruption
Related
Related Streams exceptions: BrokerNotFoundException · InternalTopicsAlreadySetupException · InvalidStateStoreException · InvalidStateStorePartitionException · LockException · MisconfiguredInternalTopicException · MissingInternalTopicsException · MissingSourceTopicException