In my last Pulsar post I did a side-by-side comparison of Apache Kafka and Apache Pulsar. Let’s continue looking at Pulsar a little more closely; there are some really interesting things when it comes to topics and the options available.
Starting a Standalone Pulsar Cluster
As with Kafka, Pulsar lets you run a standalone cluster so you can get a grasp of the basics. For this blog I’m going to assume that you have installed the Pulsar binaries; while you can run Pulsar in a Docker container or via Kubernetes, I won’t be covering those in this post.
In the bin directory of the Pulsar distribution is the pulsar command. This lets you start either a standalone cluster or the ZooKeeper, BookKeeper and Pulsar broker components separately (more on that in a moment). I’m going to start a standalone cluster:
$ bin/pulsar standalone
After a few minutes you will see that the cluster is up and running.
11:02:06.902 [worker-scheduler-0] INFO  org.apache.pulsar.functions.worker.SchedulerManager - Schedule summary - execution time: 0.042227224 sec | total unassigned: 0 | stats: {"Added": 0, "Updated": 0, "removed": 0}
{
  "c-standalone-fw-localhost-8080" : {
    "originalNumAssignments" : 0,
    "finalNumAssignments" : 0,
    "instancesAdded" : 0,
    "instancesRemoved" : 0,
    "instancesUpdated" : 0,
    "alive" : true
  }
}
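As an aside, the same pulsar command can start each component on its own. A minimal sketch, assuming the default configuration files shipped in the conf directory, with each command run in its own terminal:

$ bin/pulsar zookeeper   # start a local ZooKeeper
$ bin/pulsar bookie      # start a BookKeeper bookie
$ bin/pulsar broker      # start a Pulsar broker

Standalone mode does all of this for us in a single process, which is all we need here.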
Pulsar Consumers
In the comparison blog post I noted that where Kafka consumers pull messages from the brokers, Pulsar brokers push messages out to consumers. Pulsar uses subscriptions to route messages from the brokers to any number of subscribed consumers, and the read position in the log is tracked by the brokers.
I’m going to connect a basic consumer to the standalone cluster. In the bin directory there is a Pulsar client application that we can use without having to write any code, very similar to the Kafka console-producer and console-consumer applications.
$ bin/pulsar-client consume sales-trigger -s "st-subscription-1"
Let’s break this command down a little: consume puts the client into consumer mode, sales-trigger is the topic to consume from, and the -s flag names the subscription, in this case st-subscription-1.
Once executed, you will see in the consumer application output that it has subscribed to the topic and is awaiting a message.
12:04:50.621 [pulsar-client-io-1-1] INFO  org.apache.pulsar.client.impl.ConsumerImpl - [sales-trigger][st-subscription-1] Subscribing to topic on cnx [id: 0x049a4567, L:/127.0.0.1:63912 - R:localhost/127.0.0.1:6650], consumerId 0
12:04:50.664 [pulsar-client-io-1-1] INFO  org.apache.pulsar.client.impl.ConsumerImpl - [sales-trigger][st-subscription-1] Subscribed to topic on localhost/127.0.0.1:6650 -- consumer: 0
You may have noticed that the topic was never created ahead of time, yet the consumer is up, running and waiting. Now let’s create the producer and send a message.
Pulsar Producers
Opening another terminal window, I’m going to run the Pulsar client as a producer this time and send a single message.
$ bin/pulsar-client produce sales-trigger --messages "This is a test message"
When executed, the producer will connect to the cluster and send the message; the output shows that the message was sent.
13:50:56.342 [main] INFO  org.apache.pulsar.client.cli.PulsarClientTool - 1 messages successfully produced
If you are running your own consumer and producer, take a look at the consumer now and see what’s happened: it received the message from the broker and then cleanly exited.
----- got message -----
key:[null], properties:[], content:This is a test message
13:50:56.378 [main] INFO  org.apache.pulsar.client.impl.PulsarClientImpl - Client closing. URL: pulsar://localhost:6650/
13:50:56.404 [pulsar-client-io-1-1] INFO  org.apache.pulsar.client.impl.ConsumerImpl - [sales-trigger] [st-subscription-1] Closed consumer
13:50:56.409 [pulsar-client-io-1-1] INFO  org.apache.pulsar.client.impl.ClientCnx - [id: 0x049a4567, L:/127.0.0.1:63912 ! R:localhost/127.0.0.1:6650] Disconnected
13:50:56.422 [main] INFO  org.apache.pulsar.client.cli.PulsarClientTool - 1 messages successfully consumed
$
If you are used to Kafka you would expect your consumer client to wait for any further messages from the broker; with Pulsar, however, this is not the default behaviour of the client application.
Ideally the consumer should keep running, awaiting more messages from the brokers. There is an additional flag on the client for exactly this:
$ bin/pulsar-client consume sales-trigger -s "st-subscription-1" -n 0
The -n flag sets the number of messages to accept before the consumer disconnects from the cluster and closes. The default is 1 message; if set to 0, no limit is applied and the consumer will keep consuming whatever the brokers push to it.
Like the consumer, the producer can send multiple messages in one execution:
$ bin/pulsar-client produce sales-trigger --messages "This is a test message" -n 100
With the -n flag in produce mode, the client will send one hundred messages to the broker.
15:01:03.339 [main] INFO  org.apache.pulsar.client.cli.PulsarClientTool - 100 messages successfully produced

The active consumer will receive the messages and await more:

----- got message -----
key:[null], properties:[], content:This is a test message
----- got message -----
key:[null], properties:[], content:This is a test message
----- got message -----
key:[null], properties:[], content:This is a test message
----- got message -----
key:[null], properties:[], content:This is a test message
----- got message -----
key:[null], properties:[], content:This is a test message
Keys and Properties
You may have noticed in the consumer output that, along with the content of the message, there are two other fields: a key and properties.
Each message can have a key; it’s optional but highly advised. Properties are key/value pairs, and you can attach multiple properties by comma-separating them. Suppose I want an action property carrying some form of command, with the key set to the current Unix timestamp; the client command would look like the following:
$ bin/pulsar-client produce sales-trigger --messages "This is a test message" -n 100 -p action=create -k `date +%s`
As the consumer is still running, awaiting new messages, you will see the output with the key and properties.
----- got message -----
key:[1611328125], properties:[action=create], content:This is a test message
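To send more than one property, comma-separate the key/value pairs in the same -p argument. A quick sketch, where the second property name (source) is purely hypothetical:

$ bin/pulsar-client produce sales-trigger --messages "This is a test message" -p action=create,source=cli -k `date +%s`

The consumer output would then list both pairs in the properties section.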
Persistent and Non-Persistent Messages
There are a few differences between Kafka and Pulsar when it comes to the persistence of messages. By default Pulsar assumes a topic is persistent and saves messages to the BookKeeper instances (called bookies).
Kafka applies a time to live to messages regardless of whether a consumer has read them or not (the default is seven days, i.e. 168 hours). Pulsar, on the other hand, keeps messages persisted until all subscribed consumers have successfully read them and acknowledged as much back to the broker; only then are the messages removed from storage.
Pulsar can be configured, and in production environments should be, with a time-to-live (TTL) for messages held in persistent storage.
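Message TTL is set at the namespace level via the admin client. A sketch, assuming the default public/default namespace and a TTL of one day (the value is in seconds):

$ bin/pulsar-admin namespaces set-message-ttl public/default --messageTTL 86400

Once the TTL passes, messages are expired even if a subscription has not yet acknowledged them.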
In-Memory Topics
If you want topic messages to be stored in memory rather than written to disk, non-persistent topics are available.
Non-persistent topics can be created from the client, but they require the fully qualified topic name, including the namespace.
When interacting with clients we now have to use the full namespace. Running the consumer again, but with a non-persistent topic, looks like this:

$ bin/pulsar-client consume non-persistent://public/default/sales-trigger2 -s "st-subscription-2" -n 0

If it were persistent (storage going to the bookies):

$ bin/pulsar-client consume persistent://public/default/sales-trigger2 -s "st-subscription-2" -n 0

With non-persistent topics, if the broker fails then any messages stored in memory, or in transit to a subscribed consumer, are lost.
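For completeness, producing to the non-persistent topic takes the same fully qualified form; a quick sketch:

$ bin/pulsar-client produce non-persistent://public/default/sales-trigger2 --messages "This is a test message"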
Listing Topics
The Pulsar admin client handles all aspects of the cluster from the command line: brokers, bookies, topics, TTL configuration, and specific configuration for named subscriptions if required.
For now, let’s just list the topics I’ve been working with in this post:
$ bin/pulsar-admin topics list public/default
"non-persistent://public/default/sales-trigger2"
"persistent://public/default/sales-trigger"
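The admin client can also inspect a single topic. For example, a stats call (a sketch against the persistent topic from earlier) reports message rates, storage size and the state of each subscription:

$ bin/pulsar-admin topics stats persistent://public/default/sales-trigger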
Summary
This post should give you a basic starting point for how the Pulsar client and the standalone cluster work. Consumers and producers form the backbone of a streaming application, with added options such as whether a topic is persistent or non-persistent (in memory).
All of this has been done from the command line; in a future post I’ll look at putting a basic producer and consumer application together in code.