Using HashiCorp Consul with PostgreSQL for High Availability, with no load balancers needed!

Sasha Aliashkevich

4 May, 2021


You can find an accompanying project on GitHub with a fully working example for this blog post: https://github.com/digitalis-io/postgresql-consul-demo

Intro to the problem

In previous articles we described how to configure a highly available PostgreSQL cluster using Patroni, etcd and HAProxy (part one, part two). In that configuration HAProxy acts as a traffic router, directing connections to the leader database node based on the information stored in etcd.

As mentioned there, it is strongly recommended to put an extra load balancer in front of HAProxy so that a node can be excluded from routing if it dies completely. Indeed, what happens if one of the HAProxy instances becomes unavailable? All of its client connections are terminated, and the application either needs to know where to find the other HAProxy instances or has to rely on an extra load balancer.

But what happens if the load balancer itself dies? Now we need to design HA for the load balancers too. Even though this is a solvable problem, let's take a step back and see whether we can eliminate load balancers from the picture entirely and still have a highly available PostgreSQL cluster.

Service Discovery to the rescue

With the rise of microservices, the Service Discovery (SD) approach has become an extremely popular design element in distributed applications. In a nutshell, SD consists of a Service Registry (a centralized configuration store) and services that store their configuration in that registry. So when service A needs to find the instances of service B, it can ask the Service Registry for that information. A more detailed description of this approach is beyond the scope of this article, but the main idea should be clear.

After all, etcd, Consul and ZooKeeper are in fact distributed service registry backends, so when building Patroni-based HA for PostgreSQL we already have one of them in place. Let's try to leverage that and come up with an architecture suitable for running production databases without load balancers.
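
To make that lookup concrete, here is a minimal sketch (not part of the demo project) of how an application could ask Consul's HTTP catalog API for the instances of a service. It uses the requests library; the agent address and the service name some-service are hypothetical, used only for illustration.

# Minimal service discovery lookup against a local Consul agent's HTTP API.
# Assumes an agent reachable on 127.0.0.1:8500 and a registered service
# called "some-service" (a hypothetical name used only for illustration).
import requests

CONSUL_HTTP = "http://127.0.0.1:8500"

def discover(service_name):
    """Return (address, port) pairs for all registered instances of a service."""
    resp = requests.get(f"{CONSUL_HTTP}/v1/catalog/service/{service_name}", timeout=5)
    resp.raise_for_status()
    # ServiceAddress may be empty, in which case the node address applies.
    return [
        (item.get("ServiceAddress") or item["Address"], item["ServicePort"])
        for item in resp.json()
    ]

if __name__ == "__main__":
    for address, port in discover("some-service"):
        print(f"some-service instance at {address}:{port}")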

Consul DCS

For our purposes we will use Consul as both the DCS and the Service Registry. Two features distinguish Consul from the other distributed configuration stores, and we will rely on both. The first is that Consul can run in client or server mode. Server nodes participate in the DCS leader election and store all the required data; a node in client mode does not participate in leader election and simply requests the required data from the Consul servers.

The second feature which makes Consul special is its built-in DNS interface. It allows applications to find services without any special integration with Consul: all an application needs is a Consul client running on localhost and name resolution configured to use it.
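
As a quick illustration, the sketch below resolves a Consul service name using nothing but the Python standard library. It assumes the Dnsmasq and resolv.conf wiring described in the Application configuration section later in this post, and the pglab Patroni scope used in the demo.

# Resolve a Consul service name through the ordinary OS resolver.
# Assumes the Dnsmasq/resolv.conf setup shown later in this post, so that
# *.consul lookups reach the local Consul agent's DNS interface, and the
# "pglab" Patroni scope used in the demo.
import socket

def resolve_service(name, port):
    """Return the IP addresses the .consul name currently resolves to."""
    infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

if __name__ == "__main__":
    print(resolve_service("master.pglab.service.consul", 5432))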

Database configuration

The plan is clear, so let's take a look at how to bring it to life.

For the sake of simplicity we will skip the detailed installation and configuration steps here. If you would like to see the details, take a look at the demo application on GitHub (https://github.com/digitalis-io/postgresql-consul-demo), where everything in this blog is set up for you to use and experiment with. It allows you to spin up a fully configured PostgreSQL cluster using docker-compose.

Here we will focus only on the essential configuration elements. As shown in the diagram, we run three Consul server nodes (consul01, consul02, consul03); these are responsible for leader election and the consistency of the storage. On every PostgreSQL node we run Patroni and a Consul agent in client mode.

The Consul client configuration file is as simple as:

{
    "node_name": "${HOSTNAME}",
    "data_dir": "/consul/data",
    "retry_join":[
        "consul01",
        "consul02",
        "consul03"
     ],
    "encrypt": "eiweiV9shee3geibongee0aleleech3k",
    "recursors": ["127.0.0.11"],
    "verify_incoming": true,
    "verify_outgoing": true,
    "verify_server_hostname": true,
    "ca_file": "/etc/consul/certs/consul-agent-ca.pem",
    "cert_file": "/etc/consul/certs/dc1-client-consul-0.pem",
    "key_file": "/etc/consul/certs/dc1-client-consul-0-key.pem",
    "addresses": {
        "dns": "127.0.0.1",
        "http": "127.0.0.1"
    }
}

The retry_join parameter lists the server nodes the agent will try to join. The addresses parameter ensures that HTTP and DNS requests are served on the loopback interface only. The recursors parameter is important for Consul clients running on application nodes; we will get back to it a little later.

Here is a full Patroni config file:

name: "${HOSTNAME}"
scope: pglab
namespace: /digitalis.io/
consul:
  url: http://127.0.0.1:8500
  register_service: true
postgresql:
  connect_address: "${HOSTNAME}:5432"
  bin_dir: /usr/lib/postgresql/13/bin
  data_dir: /pg_data/13/main
  listen: "*:5432"
  authentication:
    replication:
      username: replicator
      password: Zahva4be
      sslmode: require
restapi:
  connect_address: "${HOSTNAME}:8008"
  listen: "${HOSTNAME}:8008"
bootstrap:
  dcs:
    postgresql:
      parameters:
        ssl: "on"
        ssl_ciphers: "TLSv1.2:!aNULL:!eNULL"
        ssl_cert_file: /etc/patroni/certs/server.crt
        ssl_key_file: /etc/patroni/certs/server.key
  users:
    app_user:
      password: "eW5guPae"
  pg_hba:
    - local all all  md5
    - hostssl all all 127.0.0.1/32 md5
    - hostssl all all ::1/128 md5
    - hostssl all all 0.0.0.0/0 md5
    - hostssl replication replicator patroni01.postgresql-consul-demo_pglab md5
    - hostssl replication replicator patroni02.postgresql-consul-demo_pglab md5
    - hostssl replication replicator patroni03.postgresql-consul-demo_pglab md5
  initdb:
    - encoding: UTF8

Most of the parameters relate to PostgreSQL configuration and are pretty straightforward. The Consul-related part is just this:

consul:
  url: http://127.0.0.1:8500
  register_service: true

It tells Patroni to use Consul as its DCS via the locally running Consul client agent on its default HTTP port 8500. The important part is register_service: true. By default Patroni doesn't register any services in Consul, but with this option enabled a service is created with the <scope> name and two tags: master and replica. To reach the master node we can then use the hostname master.pglab.service.consul, where pglab is our scope name.
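
To see what Patroni has actually registered, we can query the local agent's health API for the pglab service and its tags. This is a sketch for experimentation, not something the demo project ships; it assumes the agent from the config above listening on 127.0.0.1:8500.

# Inspect what Patroni registered: a service named after the scope ("pglab")
# whose healthy instances carry either a "master" or a "replica" tag.
import requests

CONSUL_HTTP = "http://127.0.0.1:8500"

def pglab_instances(tag):
    """List healthy pglab instances carrying the given tag."""
    resp = requests.get(
        f"{CONSUL_HTTP}/v1/health/service/pglab",
        params={"tag": tag, "passing": "true"},
        timeout=5,
    )
    resp.raise_for_status()
    return [
        (entry["Service"]["Address"] or entry["Node"]["Address"],
         entry["Service"]["Port"])
        for entry in resp.json()
    ]

if __name__ == "__main__":
    print("master:  ", pglab_instances("master"))
    print("replicas:", pglab_instances("replica"))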

Application configuration

Now that we have PostgreSQL managed by Patroni up and running, let's see what we need to do on the application side. As mentioned before, we want to avoid any Consul specifics in the application code, so name resolution must be transparent. For that we will run a Consul agent in client mode and Dnsmasq to forward DNS lookups for the .consul domain to the Consul agent.

So the Dnsmasq config is:

server=/consul/127.0.0.1#8600

And the Consul agent config is the same as on the Patroni nodes:

{
    "node_name": "${HOSTNAME}",
    "data_dir": "/consul/data",
    "retry_join":[
        "consul01",
        "consul02",
        "consul03"
     ],
    "encrypt": "eiweiV9shee3geibongee0aleleech3k",
    "recursors": ["127.0.0.11"],
    "verify_incoming": true,
    "verify_outgoing": true,
    "verify_server_hostname": true,
    "ca_file": "/etc/consul/certs/consul-agent-ca.pem",
    "cert_file": "/etc/consul/certs/dc1-client-consul-0.pem",
    "key_file": "/etc/consul/certs/dc1-client-consul-0-key.pem",
    "addresses": {
        "dns": "127.0.0.1",
        "http": "127.0.0.1"
    }
}

As mentioned, the important parameter in the Consul config here is "recursors": ["127.0.0.11"]. Since Patroni registers node names in Consul, a DNS lookup returns a CNAME record, for example patroni01. To resolve that to an IP address, Consul must pass it on to an external DNS server for further resolution; in our example that is the embedded Docker DNS server at 127.0.0.11.

And the final step, the resolv.conf file:

nameserver 127.0.0.1
nameserver 127.0.0.11
options ndots:0

It tells the resolver to try the local DNS server first, i.e. Dnsmasq, which in turn forwards only the .consul domain to the Consul agent on its default port 8600. All other hostname resolution is served by the Docker DNS.

So now, if we want to connect to the current primary PostgreSQL node, we can simply set the environment variable PGHOST=master.pglab.service.consul or use that hostname in our connection string.

The only thing our application needs to take care of is re-establishing the connection after a failure, using the same connection string. If Patroni has promoted a replica to leader in the meantime, the new connection will be made to the new leader node.
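
For example, a minimal reconnect loop with psycopg2 might look like the sketch below; the database name and retry policy are illustrative rather than taken from the demo project, while the credentials match the Patroni bootstrap section above.

# Connect to the current primary via the Consul DNS name and simply
# reconnect with the same DSN after a failure.
import time
import psycopg2

DSN = ("host=master.pglab.service.consul port=5432 dbname=postgres "
       "user=app_user password=eW5guPae sslmode=require")

def run_with_retry(query, retries=5, delay=2):
    """Run a query, reconnecting if the primary has moved after a failover."""
    for _ in range(retries):
        conn = None
        try:
            conn = psycopg2.connect(DSN, connect_timeout=5)
            with conn.cursor() as cur:
                cur.execute(query)
                return cur.fetchall()
        except psycopg2.OperationalError as exc:
            # The DNS name now points at the newly promoted leader; retry.
            print(f"connection failed ({exc}); retrying in {delay}s")
            time.sleep(delay)
        finally:
            if conn is not None:
                conn.close()
    raise RuntimeError("could not reach the primary")

if __name__ == "__main__":
    print(run_with_retry("SELECT pg_is_in_recovery()"))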

Load balancing between replicas

One of the features the HAProxy architecture provides is the ability to connect to the replicas for read-only queries, i.e. to balance the read load between replicas and thus reduce the load on the primary node. Even though the Consul-based architecture doesn't have any load balancer in place, it is still possible to achieve this.

All we need to do is use the replica.<scope>.service.consul hostname to establish the connection. When Consul resolves this name it returns the replica addresses in random order, so different instances of an application will connect to different replicas, achieving the load balancing.
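
A read-only client could then look like the sketch below (again with an illustrative DSN, not taken from the demo project); each fresh connection may land on a different replica.

# Read-only queries via the replica DNS name: Consul returns the replica
# addresses in varying order, so new connections spread across standbys.
import psycopg2

REPLICA_DSN = ("host=replica.pglab.service.consul port=5432 dbname=postgres "
               "user=app_user password=eW5guPae sslmode=require")

def read_only_query(query):
    conn = psycopg2.connect(REPLICA_DSN, connect_timeout=5)
    try:
        conn.set_session(readonly=True)  # guard against accidental writes
        with conn.cursor() as cur:
            cur.execute(query)
            return cur.fetchall()
    finally:
        conn.close()

if __name__ == "__main__":
    # Each new connection may land on a different replica.
    print(read_only_query("SELECT inet_server_addr(), pg_is_in_recovery()"))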

Conclusion

Building an HA PostgreSQL cluster based on Consul service discovery and DNS allows us to get rid of extra moving parts such as load balancers, which can themselves be a single point of failure. The small footprint of a Consul client running on the application side is an affordable price to pay, since it doesn't require any reconfiguration in case of a PostgreSQL failover. Make sure to check out the working demo on GitHub (https://github.com/digitalis-io/postgresql-consul-demo) and give it a try yourself!

If you would like to know more about how to implement modern data and cloud technologies, such as PostgreSQL, into your business, we at Digitalis do it all: from cloud migration to fully managed services, we can help you modernize your operations, data, and applications. We provide consulting and managed services on cloud, data, and DevOps for any business type. Contact us for more information.

Sasha Aliashkevich

Full Stack Platform Engineer

Sasha is a full stack platform engineer having experience in developing & operating scalable infrastructure projects using various technologies like PostgreSQL, Cassandra, AWS & GCP.
