What causes TaskCorruptedException?

Under EOS (exactly_once_v2), the checkpoint file lacks offsets for a store's changelog, meaning the previous run did not close cleanly (crash / kill -9 / OOM). Streams must wipe and re-bootstrap the store from the changelog.
An OffsetOutOfRangeException / InvalidOffsetException during restoration: the restore consumer's position fell outside the changelog partition's range because the changelog was truncated on the broker (for example by time- or size-based retention, or by deleting and recreating the topic), so the task is marked corrupted.
Under EOS, a transaction commit timed out (transaction.timeout.ms; the plain producer default is 60000 ms, but Kafka Streams lowers it to 10000 ms when EOS is enabled). Per KAFKA-9274, Streams deliberately throws TaskCorruptedException instead of TimeoutException so the task is aborted and re-initialized cleanly.
A network error / broker disconnect during an EOS transaction that prevents committing within the timeout window, triggering the abort-and-rebuild cycle.

How do I fix TaskCorruptedException?

Always shut down cleanly (KafkaStreams.close(Duration)) so checkpoints are written; under EOS an unclean exit can trigger this exception and a full state rebuild on restart.
If commits time out under load or quota throttling, increase transaction.timeout.ms (e.g. 60000) so EOS transactions have room to complete, but keep it <= the broker's transaction.max.timeout.ms.
Ensure changelog retention won't truncate offsets you still need: changelogs should be compacted (cleanup.policy=compact); avoid manual deletion/recreation of changelog topics, which forces a re-bootstrap.
Let Streams self-heal (it closes the task dirty, wipes the store under EOS, and revives into CREATED), but upgrade to >= 3.6.2 / 3.7.0 to avoid the KAFKA-16017 data-loss bug where a revived-then-migrated task checkpoints an offset higher than what was actually restored.

Is TaskCorruptedException retriable?

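No. TaskCorruptedException is classified as non-retriable: there is no call for the application to retry. Streams handles it internally by closing and re-initializing the affected task, but the exception will keep recurring until the underlying cause is fixed.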

Kafka Exception `TaskCorruptedException`

org.apache.kafka.streams.errors.TaskCorruptedException

Non-retriable Streams

Indicates that a specific task is corrupted and needs to be re-initialized. It can be thrown when: (1) under EOS, the checkpoint file does not contain offsets for the corresponding store's changelogs, meaning the store was not closed cleanly previously; (2) an out-of-range exception is thrown during restoration, meaning the changelog has been modified and the store must be re-bootstrapped.

Common Causes

Under EOS (exactly_once_v2), the checkpoint file lacks offsets for a store's changelog, meaning the previous run did not close cleanly (crash / kill -9 / OOM). Streams must wipe and re-bootstrap the store from the changelog.
An OffsetOutOfRangeException / InvalidOffsetException during restoration: the restore consumer's position fell outside the changelog partition's range because the changelog was truncated on the broker (for example by time- or size-based retention, or by deleting and recreating the topic), so the task is marked corrupted.
Under EOS, a transaction commit timed out (transaction.timeout.ms; the plain producer default is 60000 ms, but Kafka Streams lowers it to 10000 ms when EOS is enabled). Per KAFKA-9274, Streams deliberately throws TaskCorruptedException instead of TimeoutException so the task is aborted and re-initialized cleanly.
A network error / broker disconnect during an EOS transaction that prevents committing within the timeout window, triggering the abort-and-rebuild cycle.

Solutions

Always shut down cleanly (KafkaStreams.close(Duration)) so checkpoints are written; under EOS an unclean exit can trigger this exception and a full state rebuild on restart.
If commits time out under load or quota throttling, increase transaction.timeout.ms (e.g. 60000) so EOS transactions have room to complete, but keep it <= the broker's transaction.max.timeout.ms.
Ensure changelog retention/compaction won't truncate offsets you still need: changelogs should be compacted (cleanup.policy=compact); avoid manual deletion/recreation of changelog topics, which forces a re-bootstrap.
Let Streams self-heal (it closes the task dirty, wipes the store under EOS, and revives into CREATED) but upgrade to >= 3.6.2 / 3.7.0 to avoid the KAFKA-16017 data-loss bug where a revived-then-migrated task checkpoints an offset higher than what was actually restored.
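
The first two solutions can be sketched in Java as below. This is a minimal illustration, not a definitive setup: the class and method names (EosSettingsSketch, eosProps, shutdownHook) are hypothetical, the timeout values are examples, and the broker limit is passed in by hand because the sketch does not query the broker. It uses only the JDK so it runs standalone; the KafkaStreams wiring is shown in comments and assumes kafka-streams is on the classpath.

```java
import java.time.Duration;
import java.util.Properties;
import java.util.function.Function;

public class EosSettingsSketch {

    /** Builds Streams properties with EOS enabled and an explicit transaction timeout. */
    public static Properties eosProps(String appId, String bootstrapServers,
                                      long txnTimeoutMs, long brokerMaxTxnTimeoutMs) {
        // A producer whose transaction timeout exceeds the broker's
        // transaction.max.timeout.ms is rejected, so fail fast here instead.
        if (txnTimeoutMs > brokerMaxTxnTimeoutMs) {
            throw new IllegalArgumentException(
                "transaction.timeout.ms " + txnTimeoutMs
                + " exceeds broker transaction.max.timeout.ms " + brokerMaxTxnTimeoutMs);
        }
        Properties props = new Properties();
        props.setProperty("application.id", appId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("processing.guarantee", "exactly_once_v2");
        // The "producer." prefix routes the setting to the producer embedded in Streams.
        props.setProperty("producer.transaction.timeout.ms", Long.toString(txnTimeoutMs));
        return props;
    }

    /** Wraps a close-with-timeout call (for example streams::close) in a shutdown-hook thread. */
    public static Thread shutdownHook(Function<Duration, Boolean> closeWithTimeout, Duration timeout) {
        return new Thread(() -> {
            boolean stopped = closeWithTimeout.apply(timeout);
            if (!stopped) {
                System.err.println("Streams did not stop within " + timeout
                    + "; checkpoints may be missing on restart");
            }
        }, "streams-shutdown-hook");
    }

    public static void main(String[] args) {
        // 900000 ms is the broker default for transaction.max.timeout.ms; adjust to your cluster.
        Properties props = eosProps("my-app", "localhost:9092", 60_000L, 900_000L);
        System.out.println(props.getProperty("producer.transaction.timeout.ms"));

        // In a real application (assumption: a Topology named "topology" exists):
        //   KafkaStreams streams = new KafkaStreams(topology, props);
        //   Runtime.getRuntime().addShutdownHook(
        //       shutdownHook(streams::close, Duration.ofSeconds(30)));
        //   streams.start();
    }
}
```

The shutdown hook covers SIGTERM and normal JVM exit only; a kill -9 or OOM kill bypasses it, so the self-healing rebuild described above still applies in those cases.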

Example Stack Trace

org.apache.kafka.streams.errors.TaskCorruptedException: Tasks [0_2] are corrupted and hence need to be re-initialized
	at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:578)
	at org.apache.kafka.streams.processor.internals.TaskManager.tryToCompleteRestoration(TaskManager.java:660)
	at org.apache.kafka.streams.processor.internals.StreamThread.initializeAndRestorePhase(StreamThread.java:902)
	at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:757)
	at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:617)
Caused by: org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Fetch position FetchPosition{offset=15233} is out of range for partition my-app-store-changelog-2

Diagnostic Commands

kafka-get-offsets.sh --bootstrap-server localhost:9092 --topic my-app-store-changelog --time -2   # earliest offset vs the out-of-range position in the trace

kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-app-store-changelog   # verify cleanup.policy=compact and retention settings

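kafka-streams-application-reset.sh --application-id my-app --bootstrap-server localhost:9092   # destructive, run only with the application stopped: resets input offsets and deletes internal topics (including changelogs); also delete the local state.dir to force a clean rebuild after corruption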
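
To compare the offsets printed by kafka-get-offsets.sh with the position in the stack trace, a small helper like the following may be useful. It is a sketch under stated assumptions: the class and method names are hypothetical, it expects the tool's topic:partition:offset line format, and the earliest/latest values in main are made-up sample data (only the position 15233 comes from the trace above). The latest offsets would come from a second run with --time -1.

```java
import java.util.HashMap;
import java.util.Map;

public class ChangelogRangeCheck {

    /** Parses "topic:partition:offset" lines into a partition -> offset map. */
    public static Map<Integer, Long> parseOffsets(String toolOutput) {
        Map<Integer, Long> byPartition = new HashMap<>();
        for (String line : toolOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Split from the right so only the last two fields are interpreted.
            int last = trimmed.lastIndexOf(':');
            int prev = trimmed.lastIndexOf(':', last - 1);
            if (last < 0 || prev < 0) {
                continue; // not an offset line, skip it
            }
            int partition = Integer.parseInt(trimmed.substring(prev + 1, last));
            long offset = Long.parseLong(trimmed.substring(last + 1));
            byPartition.put(partition, offset);
        }
        return byPartition;
    }

    /** True when a fetch position lies outside [earliest, latest]. */
    public static boolean isOutOfRange(long position, long earliest, long latest) {
        return position < earliest || position > latest;
    }

    public static void main(String[] args) {
        // Hypothetical sample output from --time -2 (earliest) and --time -1 (latest).
        Map<Integer, Long> earliest = parseOffsets("my-app-store-changelog:2:20000\n");
        Map<Integer, Long> latest = parseOffsets("my-app-store-changelog:2:48211\n");

        long positionFromTrace = 15233L; // FetchPosition{offset=15233} in the stack trace
        boolean outOfRange = isOutOfRange(positionFromTrace, earliest.get(2), latest.get(2));
        System.out.println("partition 2 out of range: " + outOfRange); // prints "partition 2 out of range: true"
    }
}
```

A position below the earliest offset points at truncation of the changelog; a position above the latest offset suggests the topic was deleted and recreated.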

