Containerized Cassandra Cluster for local testing

2020-09-29

In this blog post we will build a clean and simple containerized Apache Cassandra cluster for local testing: a modern alternative to ccm (Cassandra Cluster Manager) that takes advantage of Docker containers while keeping full control of the Cassandra configuration.

This approach is based on the official Cassandra image (by Docker Official Images). Being able to rely on an official image is important, because it is trusted, maintained and scanned for security vulnerabilities.

We will manage the Cassandra configuration directly by attaching it via volumes. This allows us to change any configuration we need quickly, without having to rebuild the image every time.

docker-compose will be used to orchestrate the Cassandra containers, network and volumes.

Kubernetes could be used for local orchestration as well, e.g. minikube. The downside would be a much more complex and verbose configuration, and the control plane adds a performance overhead which can become critical for local testing. Basically, it is overkill for this purpose.

Step 0: Meet the requirements

  • Make sure Docker is installed
  • Make sure docker-compose is installed as well
  • The host machine needs to have 6 GB of RAM free
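A quick sanity check (the exact version numbers don't matter much, as long as the commands work; the memory line is only a rough check and its output format can differ between Docker versions):

docker --version
docker-compose --version
docker info | grep -i "total memory"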

Step 1: Get the config files for the Cassandra version you need

Pick a specific Cassandra version from the image tags. When it comes to databases, it is always better to pin a specific version rather than just picking the latest.

Pull the image first:

docker image pull cassandra:3.11.8

As of today there is no way to copy files directly out of a Docker image, so we are going to start a disposable tmp container for a brief moment to copy the config out.

docker run --rm -d --name tmp cassandra:3.11.8
docker cp tmp:/etc/cassandra etc_cassandra-3.11.8_vanilla
docker stop tmp

Now we have the vanilla Cassandra config templates under etc_cassandra-3.11.8_vanilla, which we will use a bit later.
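If you want to peek at what was copied, the directory should contain the usual Cassandra config files, such as cassandra.yaml, cassandra-env.sh, cassandra-rackdc.properties, jvm.options and logback.xml:

ls etc_cassandra-3.11.8_vanilla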

Step 2: docker-compose.yml file

Create a file named docker-compose.yml with the content below.

version: '2.4' # 2.4 is the last version that supports depends_on conditions for service health
networks:
  cassandra:  # docker network where all cassandra nodes will be put in
services:
  cass1:
    image: cassandra:3.11.8   # better to use a specific version, if you want to control upgrades
    container_name: cass1
    hostname: cass1
    mem_limit: 2g  # It's not strictly required, but it's better to have some memory limit
    healthcheck:
        test: ["CMD", "cqlsh", "-e", "describe keyspaces" ]
        interval: 5s
        timeout: 5s
        retries: 60
    networks:
      - cassandra
    ports:
      - "9042:9042"  # Expose native binary CQL port for your apps
    volumes:
      - ./data/cass1:/var/lib/cassandra    # This is the volume that will persist data for cass1 node
      - ./etc/cass1:/etc/cassandra   # Use your own config files for full control
    environment: &environment    # Declare the environment variables once and anchor them as "environment"
        CASSANDRA_SEEDS: "cass1,cass2"    # The first two nodes will be seeds
        CASSANDRA_CLUSTER_NAME: SolarSystem
        CASSANDRA_DC: Mars
        CASSANDRA_RACK: West
        CASSANDRA_ENDPOINT_SNITCH: GossipingPropertyFileSnitch
        CASSANDRA_NUM_TOKENS: 128

  cass2:
    image: cassandra:3.11.8
    container_name: cass2
    hostname: cass2
    mem_limit: 2g
    healthcheck:
        test: ["CMD", "cqlsh", "-e", "describe keyspaces" ]
        interval: 5s
        timeout: 5s
        retries: 60
    networks:
      - cassandra
    ports:
      - "9043:9042"  # Expose native binary CQL port for your apps
    volumes:
      - ./data/cass2:/var/lib/cassandra    # This is the volume that will persist data for cass2 node
      - ./etc/cass2:/etc/cassandra    # Use your own config files for full control
    environment: *environment    # point to "environment" to use the same environment variables as cass1
    depends_on:
      cass1:    # start cass2 only after cass1 is healthy
        condition: service_healthy

  cass3:
    image: cassandra:3.11.8
    container_name: cass3
    hostname: cass3
    mem_limit: 2g
    healthcheck:
        test: ["CMD", "cqlsh", "-e", "describe keyspaces" ]
        interval: 5s
        timeout: 5s
        retries: 60
    networks:
      - cassandra
    ports:
      - "9044:9042"  # Expose native binary CQL port for your apps
    volumes:
      - ./data/cass3:/var/lib/cassandra    # This is the volume that will persist data for cass3 node
      - ./etc/cass3:/etc/cassandra    # Use your own config files for full control
    environment: *environment    # point to "environment" to use the same environment variables as cass1
    depends_on:
      cass2:    # start cass3 only after cass2 is healthy
        condition: service_healthy

This is going to create a Cassandra cluster consisting of three nodes that start in a specific order: cass1, cass2, cass3.

Each node has two volumes set up: one for data and one for config files.

cass1 and cass2 are nominated as seed nodes.

Note that even though we now control the config files directly, you still need to set the CASSANDRA_* environment variables: the image's entrypoint script uses them to adjust the mounted configuration (seeds, cluster name, DC/rack, etc.) on startup.

Ports are also exposed to the host, so you can connect to the cluster from an app running on localhost. Alternatively, you can put your app into a container in the same Docker network.
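For example, a hypothetical app service could be added under services: in docker-compose.yml; the image name and the environment variable are just placeholders for whatever your application expects:

  my-app:
    image: my-app:latest            # placeholder image for your application
    networks:
      - cassandra                   # same network, so cass1/cass2/cass3 resolve by hostname
    environment:
      CASSANDRA_CONTACT_POINTS: "cass1,cass2,cass3"   # illustrative variable name
    depends_on:
      cass3:
        condition: service_healthy  # start the app only after the last node is healthy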

Step 3: copy config files for each node in the cluster

mkdir -p etc
cp -a etc_cassandra-3.11.8_vanilla etc/cass1
cp -a etc_cassandra-3.11.8_vanilla etc/cass2
cp -a etc_cassandra-3.11.8_vanilla etc/cass3

Step 4: start and test the cluster

docker-compose up -d

Check that the Cassandra containers are starting

docker ps

Monitor cluster status

docker exec cass1 nodetool status

Check CQL is working

docker exec -it cass1 cqlsh -e "describe keyspaces"

Congratulations! You have a working cluster!

Step 5: Do any configuration you need

The purpose of this approach is to be able to change ANY configuration you need. So, let’s change some!

We will enable Cassandra user authentication.

You will need to edit cassandra.yaml for *every* node, i.e. these files:

./etc/cass1/cassandra.yaml
./etc/cass2/cassandra.yaml
./etc/cass3/cassandra.yaml

In each of them, change the following two settings:

authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
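If you prefer not to edit them by hand, a loop like this should do the trick (GNU sed syntax; on macOS use sed -i '' instead), assuming the default AllowAll* values are still in place:

for n in cass1 cass2 cass3; do
  sed -i \
    -e 's/^authenticator:.*/authenticator: PasswordAuthenticator/' \
    -e 's/^authorizer:.*/authorizer: CassandraAuthorizer/' \
    etc/$n/cassandra.yaml
done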

And restart the cluster:

docker-compose restart

Try connecting with cqlsh again

docker exec -it cass1 cqlsh -e "describe keyspaces"

Connection error: ('Unable to connect to any servers', {'127.0.0.1': AuthenticationFailed('Remote end requires authentication.',)})

Oops! It isn’t working any more! 

Which is actually a good thing, because we haven't provided the default cassandra credentials yet. So, let's do that:

docker exec -it cass1  cqlsh -u cassandra -p cassandra -e "describe keyspaces"
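Since this is a local testing cluster the default credentials are usually fine, but if you want to go one step further, the usual follow-up is to change the default superuser password (the new password below is just an illustration):

docker exec -it cass1 cqlsh -u cassandra -p cassandra \
  -e "ALTER ROLE cassandra WITH PASSWORD = 'pick-a-new-password';"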

This was just an example; you can make any changes you need to any Cassandra configuration file under etc/<node>/.

You can use your own configuration management, e.g. Ansible, to maintain those configs. Don't forget to save them (and docker-compose.yml as well) to version control.
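For instance, a minimal .gitignore for this layout could exclude the data directories while keeping the configs and docker-compose.yml tracked (whether to also ignore the vanilla templates is up to you, since they can be re-extracted from the image at any time):

# .gitignore
data/                            # node data volumes, no need to version these
etc_cassandra-3.11.8_vanilla/    # vanilla templates, can be re-extracted from the image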

Conclusion

We are now able to bootstrap a beautiful, reproducible Cassandra cluster based on the official docker image, while preserving the data and having the ability to change any configuration we need.

The automated and ready-to-use version of this approach is available on GitHub.

Such a cluster can also be used to test more complex things, for example:

  • TLS setup – you can just put keystore and truststore JKS files on the config volume and refer to them in the config
  • Rolling cluster upgrade – change the image version tag for each container one by one and run docker-compose up -d again. You will also need to make sure you copy the configs for the new version
  • Replication factor and data distribution – see the sketch below
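For the last point, a minimal sketch: create a keyspace replicated across the Mars datacenter, write a row and check where it lands (the keyspace, table and values are just examples; add -u cassandra -p cassandra if you enabled authentication in Step 5):

docker exec -it cass1 cqlsh -e "
  CREATE KEYSPACE demo WITH replication = {'class': 'NetworkTopologyStrategy', 'Mars': 3};
  CREATE TABLE demo.users (id int PRIMARY KEY, name text);
  INSERT INTO demo.users (id, name) VALUES (1, 'probe');"

docker exec cass1 nodetool status demo                 # effective ownership per node for this keyspace
docker exec cass1 nodetool getendpoints demo users 1   # nodes holding the partition with id=1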

You can also take advantage of the many goodies Docker provides, like resource limiting, health checks, or pausing the entire cluster when it's not used, e.g.

docker-compose pause    

# to resume it back
docker-compose unpause   

And last but not least, this approach can be used not just for Cassandra, but for basically anything.