This blog describes the new preview implementation of KIP-500, Zookeeper-less Apache Kafka, released in 3.0!
If you would like to know more about how to implement modern data and cloud technologies into your business, we at Digitalis do it all: from the cloud and Kubernetes migrations to fully managed services, we can help you modernize your operations, data, and applications – on-premises, in the cloud and hybrid.
We provide consulting and managed services on a wide variety of technologies. Contact us today for more information or to learn more about each of our services.
The exciting news first: Zookeeper-less Kafka is nearly upon us! The talk of the town is really about one thing, the implementation of KIP-500, the Kraft protocol that will replace Zookeeper.
Before I get into some of the details please keep in mind that it’s a preview and not to be used in production.
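It is easy to kick the tyres locally, though. As a sketch, here is how you might bootstrap a single-node preview from the root of the extracted Kafka 3.0 distribution; the sample config file `config/kraft/server.properties` ships with the 3.0 release:

```shell
# Generate a cluster ID for the new Kraft cluster.
KAFKA_CLUSTER_ID="$(bin/kafka-storage.sh random-uuid)"

# Format the storage directories with that ID. This step is new in
# Kraft mode and must happen before the first start.
bin/kafka-storage.sh format -t "$KAFKA_CLUSTER_ID" -c config/kraft/server.properties

# Start the node; no Zookeeper required.
bin/kafka-server-start.sh config/kraft/server.properties
```

The explicit format step is itself a change from the Zookeeper world, where a broker would initialise its log directories on first start.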
The Benefits of KIP-500
One of the motivations for KIP-500 comes down to Zookeeper: high partition counts have an impact on network performance between the brokers and Zookeeper. Go beyond the safe limits and the cluster starts to creak at the seams, consumers disconnect, and the resulting rebalances affect performance. Large partition counts also slow down cluster startup, as nodes take longer to initialise and join.
The Kraft implementation removes this bottleneck, meaning we can scale to far larger node and partition counts. That matters because the growing emphasis on idempotent writes and ordered messaging makes the transaction API critical to the core broker infrastructure, which in turn increases the partition burden on the cluster.
Removing the Elephant in the Room
Zookeeper gets a bad rap for all the wrong reasons. The main criticism is that it’s something else to monitor, maintain and look after. Over the last few years, Kafka has been gradually moving components out of Zookeeper nodes and into internal topics.
Over the last 18 months, there’s been more talk about loosening Zookeeper’s grip on Kafka. Kafka 2.8 introduced the experimental release to the world, but in a basic form: the lack of ACL support within the Kraft protocol was a complete showstopper for some, myself included.
Let’s look at the current way Zookeeper works with Kafka and then introduce the replacement Kraft implementation.
The regular complaint is that you have to run and maintain five additional nodes in your cluster, and monitoring Zookeeper is equally, if not more, important than monitoring the brokers.
Kafka Brokers with Kraft Controllers Combined
Kraft replaces Zookeeper nodes with “controllers” which handle the metadata between the brokers. Whereas Zookeeper had a leader and followers, Kraft has an active controller and hot standby controllers.
In the same way as with Zookeeper, you are required to have a quorum of Kraft controllers in your cluster, so we’re back to odd numbers of nodes again (3, 5, …). There’s nothing wrong with having four broker nodes and using three of them for controller duties.
How do you define that a broker is a controller? This is handled by a new broker setting called process.roles.
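As a sketch, a combined broker/controller node in the preview would carry settings along these lines (the host names and IDs are illustrative placeholders, not values from any real cluster):

```properties
# This node acts as both a broker and a Kraft controller.
process.roles=broker,controller

# Replaces broker.id; must be unique across the cluster.
node.id=1

# The controller quorum: id@host:port for every controller node.
controller.quorum.voters=1@broker1:9093,2@broker2:9093,3@broker3:9093

# Client traffic on 9092, controller traffic on 9093.
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
controller.listener.names=CONTROLLER
inter.broker.listener.name=PLAINTEXT
```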
In the diagram below you can see the broker/controller nodes coloured in yellow.
|Broker|Process Role Settings|
|------|---------------------|
|broker1|broker,controller|
|broker2|broker,controller|
|broker3|broker,controller|
|broker4|broker|
Something I haven’t seen discussed is the amount of disk space a controller is going to use. Keep in mind that there is overhead for the operating system, the Kafka distribution files and the Kafka log data, and then the controller data on top, so we have to plan for these overheads accordingly.
The downside of this configuration is that if a broker fails, especially via an Out of Memory (OOM) termination or a full disk that renders the broker unusable, the controller on that node becomes unavailable as well.
The cluster will still operate, as one of the hot standbys will be promoted to active controller. If another controller is lost, however, the quorum is no longer maintained and the cluster will fail.
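The quorum arithmetic is the same as Zookeeper’s: a majority of controllers must remain reachable, so a cluster of n controllers tolerates floor((n - 1) / 2) failures. A quick sketch of my own to illustrate (not a Kafka API):

```python
def tolerable_failures(controllers: int) -> int:
    """Controllers that can be lost while a majority quorum survives."""
    return (controllers - 1) // 2

for n in (3, 4, 5):
    print(f"{n} controllers -> survives {tolerable_failures(n)} failure(s)")
```

Note that four controllers tolerate no more failures than three, which is exactly why quorum systems stick to odd node counts.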
For a production cluster, I would personally be nervous about this approach: mixing broker and controller on the same node risks losing controller metadata if something were to happen on the broker.
Keeping the controllers away from the brokers is another option, though.
Kafka with Kraft Controllers
With the controllers on separate nodes we separate out the process configurations. The cluster settings would look as follows:
|Broker|Process Role Settings|
|------|---------------------|
|controller1|controller|
|controller2|controller|
|controller3|controller|
|broker1|broker|
|broker2|broker|
|broker3|broker|
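A dedicated controller node would then drop the broker role entirely. A sketch, again with placeholder host names:

```properties
# This node is a controller only; it serves no client traffic.
process.roles=controller
node.id=1
controller.quorum.voters=1@controller1:9093,2@controller2:9093,3@controller3:9093
listeners=CONTROLLER://:9093
controller.listener.names=CONTROLLER
```

The broker nodes keep process.roles=broker and point controller.quorum.voters at the same three controller addresses.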
One interesting question I have: with the controllers requiring network connectivity to the brokers, will we encounter the same issue we have with Zookeeper, where the partition count correlates with network capacity? Benchmarking should answer that; sadly, I’m not at that stage yet.
Removal or Replacement?
It is fairly evident that we’re not removing anything, merely replacing Zookeeper with another type of quorum controller. There are benefits to doing this where the partition limits are concerned, meaning that we can run bigger, higher-throughput clusters.
The real question is what the best practice is on where to house the controllers. In my eyes, for full control and high availability of the cluster, keeping the controllers on separate nodes, just as you would with a Zookeeper-led cluster, is the best option.
As I stated at the beginning, the current Kraft release is not production-ready; it’s a preview. So please don’t run any production data through the current 3.0 release. If you were to ask me when it will be “ready” for production, I would personally leave a good 18-24 months before committing to a rollout on a cluster: it’s a breaking change, and once Zookeeper is removed there’s no rolling back.