Hung Kafka Transactions

December 8, 2023
Karl Sorensen

In a previous post (Read/Process/Write with Kafka Transactions) I talked about how transactions work within Kafka.

Recently I had another chance to look into what happens when things go wrong and you end up with hung transactions.

Errors that leave Kafka transactions hung are nothing new, although they are much less common now that clients and the backend have grown more tolerant of issues.

A hung transaction normally makes itself known through consumers that simply refuse to consume any of their lag, and yet aren’t throwing any errors or reporting any issues!

As a quick test, you can run a simple console consumer from the offset where the consumer appears to be stuck, setting the isolation level first to read_committed and then to read_uncommitted, to see whether any records get returned.

With read_committed (transaction aware) you won’t get any records:

bin/kafka-console-consumer --bootstrap-server `hostname`:9092 --topic <topicname> --partition <partition> --offset <LSO> --isolation-level read_committed --max-messages 2

but with read_uncommitted (transaction unaware) you will get records returned:

bin/kafka-console-consumer --bootstrap-server `hostname`:9092 --topic <topicname> --partition <partition> --offset <LSO> --isolation-level read_uncommitted --max-messages 2
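The comparison above can be wrapped in a small script. This is only a sketch: the count_records and classify helpers and the CONSUMER variable are names I made up for illustration, the 10 second timeout is arbitrary, and it assumes one record per output line. Bear in mind that a perfectly healthy transaction that is still in flight will look the same for a short while, so treat the result as a hint rather than proof.

```shell
# Path to the console consumer; an assumption, adjust for your install.
CONSUMER="${CONSUMER:-bin/kafka-console-consumer}"

# Count the records returned from one partition, starting at one offset,
# under the given isolation level. --timeout-ms makes the consumer exit
# instead of waiting forever when nothing is readable.
count_records() {
  # $1=bootstrap server  $2=topic  $3=partition  $4=offset  $5=isolation level
  "$CONSUMER" --bootstrap-server "$1" --topic "$2" --partition "$3" \
    --offset "$4" --isolation-level "$5" --max-messages 2 \
    --timeout-ms 10000 2>/dev/null | wc -l | tr -d ' '
}

# Turn the two counts into a rough verdict.
classify() {
  # $1 = count with read_committed, $2 = count with read_uncommitted
  if [ "$1" -eq 0 ] && [ "$2" -gt 0 ]; then
    echo "suspect-hung-transaction"
  elif [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
    echo "no-data"
  else
    echo "ok"
  fi
}
```

To use it, call count_records once with read_committed and once with read_uncommitted for the same topic, partition and offset, then pass the two numbers to classify.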

So the question is, what do you do when you get a stuck transaction?

In the early days of Kafka there wasn’t a lot you could do.

There was a third-party tool, kcl, that added support for unsticking the Last Stable Offset (LSO) by forcing an EndTxnRequest for the transaction, and it worked sometimes.

Sometimes you got lucky by restarting the client, forcing it to use the same transaction id and then sending an abort transaction message.

There was one occasion where we had to delete the topic…

These days thankfully it is much simpler.

Native Kafka Tool

KIP-890 introduced a number of changes that help to stop hung transactions from occurring in the first place, and KIP-664 resulted in native tools to help deal with them if they do happen.

Kafka now comes bundled with a kafka-transactions.sh tool that greatly helps in managing transactions.

The most interesting options are find-hanging and abort.

“find-hanging” essentially looks through all the current, non-expired transactions and checks whether it can find a valid transaction coordinator for the TransactionId and ProducerId.

If it can’t find one then it considers the transaction to be hung.

kafka-transactions.sh
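As a rough sketch of running the check across a cluster, the loop below calls find-hanging once per broker. It assumes the find-hanging subcommand accepts a --broker-id argument, which is how I have seen it used; check the tool's --help output for your Kafka version. The KAFKA_TXN variable and the find_hanging_on_brokers function are names invented for this example.

```shell
# Path to the bundled tool; an assumption, adjust for your install.
KAFKA_TXN="${KAFKA_TXN:-./kafka-transactions.sh}"

# Run find-hanging against each broker id in turn.
find_hanging_on_brokers() {
  # $1 = bootstrap server, remaining arguments = broker ids to scan
  bootstrap="$1"
  shift
  for broker_id in "$@"; do
    echo "== broker ${broker_id} =="
    "$KAFKA_TXN" --bootstrap-server "$bootstrap" find-hanging --broker-id "$broker_id"
  done
}
```

For example, find_hanging_on_brokers 127.0.0.1:9092 0 1 2 would scan brokers 0, 1 and 2 (the broker ids here are placeholders).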

“abort” takes the information listed by find-hanging and manually aborts a specific transaction (there is no option to commit).

The command to abort the above transaction would be:

./kafka-transactions.sh --bootstrap-server 127.0.0.1:9092 abort --topic Testtopic --partition 1 --start-offset 4411

Remember that the transaction is forcibly aborted, so you may need to pull the messages from that topic/partition/offset so that they can be replayed.
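One way to pull those records is to reuse the console consumer with read_uncommitted, starting at the transaction's start offset. This is a sketch only: dump_aborted_records and CONSUMER are hypothetical names, and the start offset alone does not tell you how many records belonged to the transaction. Records from other producers may also be interleaved on the same partition, so review the output before replaying anything.

```shell
# Path to the console consumer; an assumption, adjust for your install.
CONSUMER="${CONSUMER:-bin/kafka-console-consumer}"

# Print up to $5 records, keys included, starting at the given offset.
# read_uncommitted is required because aborted records are hidden from
# read_committed consumers.
dump_aborted_records() {
  # $1=bootstrap server  $2=topic  $3=partition  $4=start offset  $5=max records
  "$CONSUMER" --bootstrap-server "$1" --topic "$2" --partition "$3" \
    --offset "$4" --isolation-level read_uncommitted \
    --property print.key=true --max-messages "$5"
}
```

For the transaction aborted above this could be called as dump_aborted_records 127.0.0.1:9092 Testtopic 1 4411 10, with the output redirected to a file (the limit of 10 is arbitrary).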

If you would like to know more about how to implement modern data, streaming and cloud technologies, such as Apache Kafka, into your business, we at Digitalis do it all: from data engineering, and cloud migration to fully managed services, we can help you modernize your applications, operations, and data. We provide consulting and managed services on cloud, data, and DevOps for any business type. Contact us for more information.
