Incremental backups with rsync and hard links
In this post I am going to describe a way to build a simple incremental backup solution using rsync and hard links. You may already be familiar with rsync, but for anyone who is not, rsync is a command-line tool commonly used on Linux and other UNIX-like operating systems to copy and synchronise directories. I will assume some prior knowledge of rsync in this post, so if you have not used it before there may be some parts that confuse you!
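As a very quick refresher, a basic rsync run copies the contents of a source directory into a destination, transferring only the files that are new or have changed since the last run. The paths below are just placeholders:
# Synchronise /source/dir into /dest/dir, preserving permissions,
# ownership and timestamps (-a), with verbose output (-v)
rsync -av /source/dir/ /dest/dir/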
Before getting to rsync itself, let’s look at how hard links behave. Here is a directory containing five files; the first column of ls -li shows each file’s inode number:
[user1@backupbox dir1]$ ls -li
total 128
33839002 -rw-rw-r--. 1 user1 user1 12942 Oct 2 16:14 file1
33839003 -rw-rw-r--. 1 user1 user1 14106 Oct 2 16:14 file2
33839004 -rw-rw-r--. 1 user1 user1 19360 Oct 2 16:14 file3
33839005 -rw-rw-r--. 1 user1 user1 17093 Oct 2 16:14 file4
33839006 -rw-rw-r--. 1 user1 user1 16094 Oct 2 16:14 file5
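The transcript doesn’t show the commands used, but hard links are created with ln and symbolic links with ln -s, along these lines:
# Create hard links that share the same inodes as file1 and file2
ln file1 hardlink1
ln file2 hardlink2
# Create symbolic links, which are separate files that just store the target name
ln -s file1 symlink1
ln -s file2 symlink2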
Listing the directory again after creating the links, notice that each hard link shares its inode number with the original file and that the link count (the number after the permissions) has increased to 2, while the symbolic links have inodes of their own:
[user1@backupbox dir1]$ ls -li
total 64
33839002 -rw-r--r--. 2 user1 user1 12942 Oct 2 16:14 file1
33839003 -rw-r--r--. 2 user1 user1 14106 Oct 2 16:14 file2
33839002 -rw-r--r--. 2 user1 user1 12942 Oct 2 16:14 hardlink1
33839003 -rw-r--r--. 2 user1 user1 14106 Oct 2 16:14 hardlink2
33695760 lrwxrwxrwx. 1 user1 user1 5 Oct 2 16:15 symlink1 -> file1
33695762 lrwxrwxrwx. 1 user1 user1 5 Oct 2 16:15 symlink2 -> file2
Because hard links share a single inode, they also share the file’s contents. In the next example file1 is a small text file with one hard link, hardlink1:
[user1@backupbox dir1]$ ls -li
total 8
33839002 -rw-r--r--. 2 user1 user1 47 Oct 2 16:19 file1
33839002 -rw-r--r--. 2 user1 user1 47 Oct 2 16:19 hardlink1
[user1@backupbox dir1]$ cat file1
This is file1
[user1@backupbox dir1]$ cat hardlink1
This is file1
[user1@backupbox dir1]$ echo "an extra line" >>file1
[user1@backupbox dir1]$ cat file1
This is file1
an extra line
[user1@backupbox dir1]$ cat hardlink1
This is file1
an extra line
[user1@backupbox dir1]$ echo "another extra line" >>hardlink1
[user1@backupbox dir1]$ cat file1
This is file1
an extra line
another extra line
[user1@backupbox dir1]$ cat hardlink1
This is file1
an extra line
another extra line
Appending through either name modifies the same underlying file. Ownership and permissions also live in the inode, so changing them through one name is reflected in the other:
[user1@backupbox dir1]$ ls -li
total 8
33839002 -rw-r--r--. 2 user1 user1 47 Oct 2 16:19 file1
33839002 -rw-r--r--. 2 user1 user1 47 Oct 2 16:19 hardlink1
[user1@backupbox dir1]$ sudo chown root.root file1
[user1@backupbox dir1]$ ls -li
total 8
33839002 -rw-r--r--. 2 root root 47 Oct 2 16:19 file1
33839002 -rw-r--r--. 2 root root 47 Oct 2 16:19 hardlink1
[user1@backupbox dir1]$ sudo chmod 0666 hardlink1
[user1@backupbox dir1]$ ls -li
total 8
33839002 -rw-rw-rw-. 2 root root 47 Oct 2 16:19 file1
33839002 -rw-rw-rw-. 2 root root 47 Oct 2 16:19 hardlink1
Deleting one of the names only removes that directory entry; the data remains on disk for as long as at least one hard link to the inode exists. A symbolic link, by contrast, just stores the target name, so it breaks when that name is removed (the ownership and permissions below have been reset between examples):
[user1@backupbox dir1]$ ls -li
total 8
33839002 -rw-r--r--. 2 user1 user1 47 Oct 2 16:19 file1
33839002 -rw-r--r--. 2 user1 user1 47 Oct 2 16:19 hardlink1
33695760 lrwxrwxrwx. 1 user1 user1 5 Oct 2 16:15 symlink1 -> file1
[user1@backupbox dir1]$ rm -f file1
[user1@backupbox dir1]$ ls -li
total 4
33839002 -rw-r--r--. 1 user1 user1 47 Oct 2 16:19 hardlink1
33695760 lrwxrwxrwx. 1 user1 user1 5 Oct 2 16:15 symlink1 -> file1
[user1@backupbox dir1]$ cat hardlink1
This is file1
an extra line
another extra line
[user1@backupbox dir1]$ cat symlink1
cat: symlink1: No such file or directory
Even though the original file1 is gone, we can still create a new hard link from the surviving name, and the data lives on:
[user1@backupbox dir1]$ ls -li
total 4
33839002 -rw-r--r--. 1 user1 user1 47 Oct 2 16:19 hardlink1
[user1@backupbox dir1]$ ln hardlink1 newlink1
[user1@backupbox dir1]$ ls -li
total 8
33839002 -rw-r--r--. 2 user1 user1 47 Oct 2 16:19 hardlink1
33839002 -rw-r--r--. 2 user1 user1 47 Oct 2 16:19 newlink1
[user1@backupbox dir1]$ rm hardlink1
[user1@backupbox dir1]$ ls -li
total 4
33839002 -rw-r--r--. 1 user1 user1 47 Oct 2 16:19 newlink1
[user1@backupbox dir1]$ cat newlink1
This is file1
an extra line
another extra line
Now back to rsync. A simple way to back up a directory from a remote server is to mirror it into a local directory:
rsync -av --delete server1:/home/data/ /backup/server1/
We would then run the same command again each time we wanted to update the mirror with the latest changes from the server.
To implement a basic incremental backup system we might consider making a local copy of the previous backup before starting the rsync:
[user1@backupbox dir1]$ cp -a /backup/server1/ /backup/server1Old/
Then we update our mirror from the remote server:
[user1@backupbox dir1]$ rsync -av --delete server1:/home/data/ /backup/server1/
Obviously this isn’t very efficient in either time or space, so we could improve on it by using hard links instead, which can be done by adding the -l argument to the cp command:
# Create a hard-linked clone of the current backup
cp -al /backup/server1 /backup/server1Old
# update our mirror from the remote server
rsync -av --delete server1:/home/data/ /backup/server1/
To improve things further we can use a feature of rsync which enables us to efficiently create hard-linked copies of a directory’s contents, with only the changed files taking up additional space on disk. The rsync feature we need is the --link-dest argument. Suppose we have the following three paths:
server1:/home/data: Remote source directory
/backup/server1New: Destination for a new backup. Does not yet exist
/backup/server1Old: Existing previous backup
What we want is a complete new backup under /backup/server1New in which only files that have changed since the previous backup are actually transferred and stored, while unchanged files become hard links to the copies already in /backup/server1Old. This is exactly what the --link-dest argument does for us. It performs a normal rsync from server1:/home/data to /backup/server1New, but for each file it also checks the same relative path under /backup/server1Old. If the file there is identical to the one on the remote server, rsync creates a hard link from /backup/server1Old into /backup/server1New instead of copying the data again.
To use this we just add the “old” directory as the --link-dest argument to our rsync command:
rsync -av --link-dest /backup/server1Old server1:/home/data/ /backup/server1New/
Here we can see the old backup directory’s contents:
[user1@backupbox ~]$ ls -lRi /backup/server1Old/
/backup/server1Old/:
total 0
68876 drwxrwxr-x. 3 user1 user1 53 Oct 2 17:30 files
/backup/server1Old/files:
total 72
33651935 drwxrwxr-x. 2 user1 user1 42 Oct 2 17:30 bar
68882 -rw-rw-r--. 1 user1 user1 28883 Oct 2 17:30 foo1
68883 -rw-rw-r--. 1 user1 user1 27763 Oct 2 17:30 foo2
68884 -rw-rw-r--. 1 user1 user1 10487 Oct 2 17:30 foo3
/backup/server1Old/files/bar:
total 76
33695759 -rw-rw-r--. 1 user1 user1 32603 Oct 2 17:30 bar1
33838984 -rw-rw-r--. 1 user1 user1 15318 Oct 2 17:30 bar2
33839003 -rw-rw-r--. 1 user1 user1 26122 Oct 2 17:30 bar3
On the server we then modify a file:
[user1@server1 files]$ echo "Hello world" >/home/data/files/foo3
Now we run our incremental backup command:
[user1@backupbox ~]$ rsync -av --link-dest=/backup/server1Old server1:/home/data/ /backup/server1New/
receiving incremental file list
created directory /backup/server1New
files/foo3
sent 136 bytes received 272 bytes 816.00 bytes/sec
total size is 130,701 speedup is 320.35
We can see from the rsync output that only the changed file has been copied, but if we list the contents of the new directory we can see that it contains all of the files. The unchanged files keep the same inode numbers as the old backup and now have a link count of 2, while the modified foo3 has a new inode of its own:
[user1@backupbox ~]$ ls -lRi /backup/server1New/
/backup/server1New/:
total 0
101051460 drwxrwxr-x. 3 user1 user1 53 Oct 2 17:30 files
/backup/server1New/files:
total 64
68885 drwxrwxr-x. 2 user1 user1 42 Oct 2 17:30 bar
68882 -rw-rw-r--. 2 user1 user1 28883 Oct 2 17:30 foo1
68883 -rw-rw-r--. 2 user1 user1 27763 Oct 2 17:30 foo2
101051461 -rw-rw-r--. 1 user1 user1 12 Oct 2 17:40 foo3
/backup/server1New/files/bar:
total 76
33695759 -rw-rw-r--. 2 user1 user1 32603 Oct 2 17:30 bar1
33838984 -rw-rw-r--. 2 user1 user1 15318 Oct 2 17:30 bar2
33839003 -rw-rw-r--. 2 user1 user1 26122 Oct 2 17:30 bar3
Using du we can also see the space saving. du counts each hard-linked file only once (here it attributes the shared files to server1New, the first directory the shell glob expands to), so the two backups together come to just 152K, only slightly more than a single full copy; the extra 12K reported against server1Old is essentially just the old version of the changed file:
[user1@backupbox ~]$ du -chs /backup/server1*
140K /backup/server1New
12K /backup/server1Old
152K total
Here is an example script that can be used to create daily incremental backups of a directory. Each backup is stored in a directory named after today’s date, and the script looks for yesterday’s backup to use as the base for the hard links:
#!/bin/bash
# The source path to backup. Can be local or remote.
SOURCE=servername:/source/dir/
# Where to store the incremental backups
DESTBASE=/backup/servername_data
# Where to store today's backup
DEST="$DESTBASE/$(date +%Y-%m-%d)"
# Where to find yesterday's backup
YESTERDAY="$DESTBASE/$(date -d yesterday +%Y-%m-%d)/"
# Use yesterday's backup as the incremental base if it exists
if [ -d "$YESTERDAY" ]
then
    OPTS="--link-dest $YESTERDAY"
fi
# Run the rsync
rsync -av $OPTS "$SOURCE" "$DEST"
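To take a backup automatically each night you could run the script from cron. Assuming it has been saved as /usr/local/bin/incremental-backup.sh (a name chosen purely for illustration), a crontab entry like this would run it at 2am every day:
# m h dom mon dow command
0 2 * * * /usr/local/bin/incremental-backup.sh >>/var/log/incremental-backup.log 2>&1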
Kafka Connect gotcha – SSL
We recently deployed a Kafka Connect environment to consume Avro messages from a topic and write them into an Oracle database. Everything seemed to be functioning just fine until we got a message from the team saying their connectors had suddenly stopped working.
On further investigation we found errors like this in the Kafka Connect logs:
2020-01-17 12:56:48 ERROR Uncaught exception in thread 'kafka-producer-network-thread | producer-25':
java.lang.OutOfMemoryError: Java heap space
2020-01-17 13:02:54 ERROR Uncaught exception in thread 'kafka-producer-network-thread | producer-57':
java.lang.OutOfMemoryError: Direct buffer memory
Our first thought was that Kafka Connect just needed more heap space, so we increased it from the defaults (256MB-1GB) up to a fixed 8GB heap, but the errors kept coming. We increased it further, up to 20GB, and the errors were still happening. The machine was receiving one message every few seconds, yet the Kafka Connect process was using around 97% of the RAM and over 80% of the CPU. This machine has 8 CPUs and 32GB of RAM, so clearly something wasn’t right!
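For reference, how the heap is set depends on how Connect is launched; with the stock Apache Kafka scripts it is controlled by the KAFKA_HEAP_OPTS environment variable, along these lines (the properties path is illustrative):
# Give the Kafka Connect worker a fixed 8GB heap before starting it
export KAFKA_HEAP_OPTS="-Xms8G -Xmx8G"
bin/connect-distributed.sh config/connect-distributed.properties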
In this case we were using a custom Kafka Connect plugin to convert messages from the topic into the required format to be inserted into Oracle so the first thought was, do we have a memory leak in our code? We went over and over our plugin code and could not see anywhere that could possibly be leaking memory so we looked back at Kafka Connect itself.
What we could see was that our sink connectors would run until an invalid message was pushed into the topic, at which point the OutOfMemoryError exceptions started appearing in the logs. This made sense as the errors were only ever logged from producer threads and this Kafka Connect instance was only running Sink connectors, so it must be related to pushing the invalid messages to dead letter queues.
A typical connector configuration for this use case looks something like this:
{
  "name": "oracle-sink-test",
  "config": {
    "connector.class": "com.mydomain.PluginClass",
    "connector.type": "sink",
    "tasks.max": "1",
    "topics": "source_topic",
    "topic.type": "avro",
    "connection.user": "DBUserName",
    "connection.password": "DBPassword",
    "connection.url": "jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=oracle1)(PORT=9020))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=SINKTEST)))",
    "db.driver": "oracle.jdbc.driver.OracleDriver",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "https://schemaregistry:8443",
    "errors.tolerance": "all",
    "errors.deadletterqueue.topic.name": "dlq_sink_test",
    "errors.deadletterqueue.topic.replication.factor": 1,
    "errors.deadletterqueue.context.headers.enable": true
  }
}
As you can see in the example configuration our connectors were configured with dead letter queues, so we tried changing the connectors to errors.tolerance=none and removing the dead letter queue config. As we had hoped, this change meant the connectors would now fail with an error when they encountered an invalid message, but we also observed much lower CPU and RAM usage while the connectors were running. The next thing we tried was leaving errors.tolerance set to none and putting the dead letter queue config back into the connector. This resulted in the CPU and RAM use going back up again, and instead of failing on an invalid message the connector job would hang and its consumer would eventually get timed out by the broker.
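For completeness, this kind of change can be applied through the Kafka Connect REST API by updating the connector configuration in place. A rough sketch of the first variant we tried is below; the worker address (localhost:8083) is assumed, the unchanged connection and converter settings from the example above are omitted for brevity, and a real PUT must include the full configuration because it replaces it:
# Fail fast on invalid messages: errors.tolerance=none, no dead letter queue settings
curl -X PUT -H "Content-Type: application/json" \
  http://localhost:8083/connectors/oracle-sink-test/config \
  -d '{
        "connector.class": "com.mydomain.PluginClass",
        "connector.type": "sink",
        "tasks.max": "1",
        "topics": "source_topic",
        "errors.tolerance": "none"
      }'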
For a detailed explanation of error handling in Kafka Connect see this blog post which explains it far better than I ever could: https://www.confluent.io/blog/kafka-connect-deep-dive-error-handling-dead-letter-queues/.
So what was going on?
For a hint, here is an example of what the end of our Kafka Connect worker configuration file looked like:
bootstrap.servers=broker1:9095,broker2:9095
security.protocol=SSL
ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=truststorepassword
ssl.keystore.location=/path/to/keystore.jks
ssl.keystore.password=keystorepassword
ssl.key.password=keypassword
consumer.bootstrap.servers=broker1:9095,broker2:9095
consumer.security.protocol=SSL
consumer.ssl.truststore.location=/path/to/truststore.jks
consumer.ssl.truststore.password=truststorepassword
consumer.ssl.keystore.location=/path/to/keystore.jks
consumer.ssl.keystore.password=keystorepassword
consumer.ssl.key.password=keypassword
As you can see, we have SSL enabled on the brokers. We didn’t give this much thought at first: there were SSL settings in there, we had no SSL-related errors in the logs, and nothing else suggested that the issue was related to SSL. After much investigation and Googling for answers we found this open issue in the Kafka bug tracker: “JVM runs into OOM if (Java) client uses a SSL port without setting the security protocol” (https://issues.apache.org/jira/browse/KAFKA-4090). This was the hint we needed to fix our problem.
We had configured SSL settings for Kafka Connect’s internal connections and for the consumers, but we had not configured SSL for the producer threads. This was possibly an oversight as we were only running Sink connectors in this environment, but of course there are still producer threads running to push invalid messages to the dead letter queues. Based on the information in KAFKA-4090 we decided to add explicit SSL settings for the producer threads like this:
producer.bootstrap.servers=broker1:9095,broker2:9095
producer.security.protocol=SSL
producer.ssl.truststore.location=/path/to/truststore.jks
producer.ssl.truststore.password=truststorepassword
producer.ssl.keystore.location=/path/to/keystore.jks
producer.ssl.keystore.password=keystorepassword
producer.ssl.key.password=keypassword
After making this change we restarted Kafka Connect and suddenly the CPU use went from 80-90% down to 5% and the RAM use went down from 95% to just the Java heap size plus a little (around 300MB in total as the heap was now set to 256MB-1GB).
We reverted the connector configuration back to errors.tolerance=all with the dead letter queue configured and hey presto! Messages started being consumed from the topic and the invalid messages were correctly pushed to the dead letter queue.
So in summary, if you are seeing unexpected out of memory exceptions in Kafka Connect and you are using SSL to communicate with the brokers, make sure you configure the SSL settings individually for all three types of connection – internal connections, consumers and producers.