Prometheus Blackbox-Exporter – monitoring TLS certificates

By 2020-08-17Blog

In any environment that needs to expose endpoints, it’s important to ensure that they are secured and monitored. Although there is plenty of documentation and examples on performance monitoring, there is minimal documentation on how to monitor the security of the endpoints and specifically the TLS/SSL non-HTTPS certificates. This blog outlines how you can use the Prometheus Blackbox Exporter to do this – I hope you find it useful!

As an example, when we deploy Apache Cassandra it is typical to secure the database endpoints with certificates to ensure the data is encrypted in-flight and, if required, enforce client certificate validation. Knowing when these certificates expire is important and being able to monitor and alert based on this is critical. However, this example also applies to any application that exposes a TLS/SSL endpoint like LDAP, Kafka, ELK, etc.

For the sake of simplicity, we will use a single Blackbox probe located on the same VM as our single Prometheus instance to monitor a certificate on an Apache Cassandra database.

Note:
In a high availability or production environment, it is always suggested to use multiple probes and multiple Prometheus instances. For Apache Cassandra in particular, we also suggest to use a complete monitoring system / operational toolset such as https://www.axonops.com/ .

First of all, let’s manually check the connection from the Prometheus instance to our node, cas01.dev.db.myexample.io port 9142 for Apache Cassandra. We can do that using the OpenSSL command:

makes return the prompt immediately after the command, openssl s_client connects to the endpoint to read the certificate and displays the certificates.

Now let’s configure the TCP module on the Blackbox Exporter probe like this: 

Perfect, we know that the probe works, can connect to our Apache Cassandra database and we can also see the various metrics exported.  

The metrics that are very useful are::

  • allows us to make sure that the endpoint is reachable;
  • is the expiry time of the certificate;
  • the time that the probe took, useful to check the responsiveness of the node.

Finally, let’s set Prometheus to scrape our node:

To visualize the metrics we can use the dashboards from the Grafana website, like: https://grafana.com/grafana/dashboards/7587 or https://grafana.com/grafana/dashboards/11529

Monitoring TLS/SSL certificates, also if alone is not sufficient for high availability or production environment, should be part of any monitoring system. No one likes to be woken up in the middle of the night because the entire production environment is down due to an expired certificate. The suggestion is always to make sure that your monitoring system displays and alerts for certificate expirations.