Anyone who is involved in maintaining a production Kafka cluster will worry about the messages that are within the system. There are a number of retention policies within the core Kafka framework which work perfectly well. It is, however, worthwhile kicking the tyres to confirm assumptions.
In this post I will create a Kafka topic, use the command line tools to alter its retention policy, and then confirm that messages are being retained as we would expect them to be.
Kafka Topic Retention
Message retention is based on time, on the size of the log, or on both. In most cases that I’m aware of, companies prefer time-based retention.
For time-based retention, the log retention period can be set in hours, minutes or milliseconds. In terms of priority the smallest unit always wins: you can set all three, but the cluster will act on the milliseconds value.
log.retention.ms takes priority over log.retention.minutes, which takes priority over log.retention.hours.
Where possible I advise you to use only one type of retention setting. As a rule I use retention.ms, which gives me the complete control I need.
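To make the three units concrete, here is a small sketch of the same seven-day window (Kafka's default of 168 hours) written in each unit. The shell variable names are mine and purely illustrative; only the property names on the right of each echo are Kafka settings.

```shell
# The same seven-day retention window expressed in each unit.
# Variable names are illustrative, not Kafka settings.
DAYS=7
RETENTION_HOURS=$((DAYS * 24))
RETENTION_MINUTES=$((RETENTION_HOURS * 60))
RETENTION_MS=$((RETENTION_MINUTES * 60 * 1000))

# If all three were set on a broker, the milliseconds value would be used.
echo "log.retention.hours=${RETENTION_HOURS}"
echo "log.retention.minutes=${RETENTION_MINUTES}"
echo "log.retention.ms=${RETENTION_MS}"
```

Working in milliseconds means bigger numbers, but there is only ever one value to reason about.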
A Prototype Example
In order to test my assumptions here’s a to-do list of the tasks I’m going to complete.
- Create a topic with a retention time of 3 minutes.
- Send a message to the topic with an obvious time in the payload.
- Alter the topic configuration to raise the retention time to 30 minutes.
- Consume the message after the original three minute period and see if it’s still there.
Create a Topic
I’m using a standalone Kafka instance, so there’s only one partition and one replica. The interesting part of this exercise is the configuration at the end. Using the retention.ms setting, I’m setting the topic retention time to three minutes (3 minutes x 60 seconds x 1000 milliseconds = 180000 milliseconds).
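A minimal sketch of the create step might look like the following. The broker address localhost:9092 and the topic name retention-test are assumptions of mine, and I'm assuming a reasonably recent Kafka where kafka-topics.sh accepts --bootstrap-server (some distributions ship the script without the .sh suffix). The command is wrapped in a function so the values can be checked before anything touches the broker.

```shell
# Assumptions: broker on localhost:9092 and a hypothetical topic name.
BOOTSTRAP="localhost:9092"
TOPIC="retention-test"

# 3 minutes x 60 seconds x 1000 milliseconds
RETENTION_MS=$((3 * 60 * 1000))

# Create a single-partition, single-replica topic with a topic-level
# retention.ms override.
create_topic() {
  kafka-topics.sh --bootstrap-server "$BOOTSTRAP" \
    --create --topic "$TOPIC" \
    --partitions 1 --replication-factor 1 \
    --config "retention.ms=${RETENTION_MS}"
}
```

With the broker running, calling create_topic issues the request.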
Send a Test Message
I’m using the console producer command to send a plain text message to the topic. The topic doesn’t have a schema, so I can send any type of message I wish; in this example I’m sending JSON as a string.
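As a rough sketch, sending one JSON string through the console producer could look like this. The payload shape, topic name and broker address are hypothetical, and I'm assuming a Kafka version where kafka-console-producer.sh accepts --bootstrap-server (older releases used --broker-list instead).

```shell
# Assumptions: broker on localhost:9092 and a hypothetical topic name.
BOOTSTRAP="localhost:9092"
TOPIC="retention-test"

# Hypothetical payload: a JSON string with the send time made obvious.
SENT_AT="16:15"
MESSAGE="{\"sent_at\":\"${SENT_AT}\"}"

# The console producer reads one message per line from standard input.
send_message() {
  printf '%s\n' "$MESSAGE" |
    kafka-console-producer.sh --bootstrap-server "$BOOTSTRAP" --topic "$TOPIC"
}
```

Putting the time in the payload makes it easy to recognise the message when it is consumed later.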
The message is now in the topic log and becomes eligible for deletion just after 16:18. I’m now going to extend the retention period to preserve that message a little longer.
Alter the Topic Retention
With the topic configuration command you can inspect any of the topic configs, and you can alter them too. So I’m going to alter retention.ms and set it to 30 minutes (30 minutes x 60 seconds x 1000 milliseconds = 1,800,000 milliseconds).
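Here is a hedged sketch of the alter and describe steps using kafka-configs.sh. The topic name and broker address are again my assumptions, and altering topic configs over --bootstrap-server needs a fairly recent Kafka release; older clusters go through ZooKeeper instead.

```shell
# Assumptions: broker on localhost:9092 and a hypothetical topic name.
BOOTSTRAP="localhost:9092"
TOPIC="retention-test"

# 30 minutes x 60 seconds x 1000 milliseconds
NEW_RETENTION_MS=$((30 * 60 * 1000))

# Override retention.ms on the existing topic.
alter_retention() {
  kafka-configs.sh --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$TOPIC" \
    --alter --add-config "retention.ms=${NEW_RETENTION_MS}"
}

# Show the topic-level overrides to confirm the new value.
describe_retention() {
  kafka-configs.sh --bootstrap-server "$BOOTSTRAP" \
    --entity-type topics --entity-name "$TOPIC" --describe
}
```

Running describe_retention straight after alter_retention is a quick way to check the new value was accepted.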
Check the Topic by Consuming Messages
Leaving the cluster alone for 10-15 minutes takes us well past the original three-minute retention window, so the message would have been deleted had the configuration change not taken effect. Running the consumer from the earliest offset will bring back whatever is still in the log. If the configuration change has worked as expected then the original message sent at 16:15 will be output.
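A sketch of the check, under the same assumed broker address and topic name. The arithmetic at the top only confirms that a 15-minute wait sits between the old and new retention values; --timeout-ms makes the consumer exit once no more messages arrive.

```shell
# Assumptions: broker on localhost:9092 and a hypothetical topic name.
BOOTSTRAP="localhost:9092"
TOPIC="retention-test"

OLD_RETENTION_MS=$((3 * 60 * 1000))
NEW_RETENTION_MS=$((30 * 60 * 1000))
WAIT_MS=$((15 * 60 * 1000))

# The wait must outlast the old window but stay inside the new one.
if [ "$WAIT_MS" -gt "$OLD_RETENTION_MS" ] && [ "$WAIT_MS" -lt "$NEW_RETENTION_MS" ]; then
  WAIT_IS_VALID=yes
else
  WAIT_IS_VALID=no
fi

# Read the topic from the earliest offset, exiting after 10 idle seconds.
consume_from_start() {
  kafka-console-consumer.sh --bootstrap-server "$BOOTSTRAP" \
    --topic "$TOPIC" --from-beginning --timeout-ms 10000
}
```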
Okay, that’s worked as expected. Now I’ll try it again because I want to confirm the result, and I’ll add the date this time for added confirmation.
Looking good. I’m going to do it once more because I want to make sure. While I know it’s not an accident, I like to check again.
In this brief post I’ve shown that messages written prior to a retention change are preserved. Creating a topic, sending a message and then altering the retention time gave us the evidence we needed to confirm the assumption.