HA Kubernetes Monitoring using Prometheus and Thanos

December 20, 2019

Table of Contents

1. Introduction

2. Why Integrate Prometheus with Thanos?

3. Thanos Overview

          3.1 Thanos architecture 

          3.2 Thanos Sidecar

          3.3 Thanos Store

          3.4 Thanos Query 

          3.5 Thanos Compact

          3.6 Thanos Ruler 

4. Thanos Configuration

5. Deployment 

6. Grafana Dashboards

7. Conclusion

1. Introduction

In this article, we will deploy a clustered Prometheus setup integrated with Thanos. The setup is resilient against node failures and ensures appropriate data archiving. It is also scalable: it can span multiple Kubernetes clusters under the same monitoring umbrella. Finally, we will visualize and monitor all our data in accessible and beautiful Grafana dashboards.

2. Why Integrate Prometheus with Thanos?

Prometheus is typically scaled using a federated setup, and its deployments use a persistent volume for each pod. However, not all data can be aggregated through federation, and you often need a separate tool to manage Prometheus configuration. To address these issues, we will use Thanos. Thanos allows you to run multiple instances of Prometheus, deduplicate their data, and archive it in long-term storage such as GCS or S3.

3. Thanos Overview

3.1 Thanos Architecture

The components of Thanos are sidecar, store, query, compact, and ruler. Let's take a look at what each one does.

3.2 Thanos Sidecar

  • The main component, which runs alongside Prometheus
  • Reads and archives Prometheus data in the object store
  • Manages Prometheus’s configuration and lifecycle
  • Injects external labels into the Prometheus configuration to distinguish each Prometheus instance
  • Can run queries against the Prometheus server’s PromQL interface
  • Listens on the Thanos gRPC protocol and translates queries between gRPC and REST

3.3 Thanos Store

  • Implements the Store API on top of historical data in an object storage bucket 
  • Acts primarily as an API gateway and therefore does not need significant amounts of local disk space
  • Joins a Thanos cluster on startup and advertises the data it can access 
  • Keeps a small amount of information about all remote blocks on a local disk in sync with the bucket
  • This data is generally safe to delete across restarts at the cost of increased startup times

3.4 Thanos Query 

  • Listens on HTTP and translates queries to the Thanos gRPC format
  • Aggregates query results from different sources, and can read data from Sidecar and Store
  • In an HA setup, Thanos Query also deduplicates the results

A note on run-time deduplication of HA groups: Prometheus is stateful and does not allow replication of its database. Therefore, simply running multiple Prometheus replicas is not an easy way to increase high availability.

Simple load balancing will not work either -- say one replica crashes. It may come back up, but queries against it will have a gap for the period during which it was down. A second replica does not fix this, because it too could be down at any moment, for example during a rolling restart. These scenarios show how load balancing can fail.

Thanos Query pulls the data from both replicas, deduplicates the signals, and fills any gaps, presenting a single seamless result to the consumer.

3.5 Thanos Compact 

  • Applies the compaction procedure of the Prometheus 2.0 storage engine to block data in object storage
  • Generally not concurrency-safe; it must be deployed as a singleton against a bucket
  • Responsible for downsampling data: 5 minute downsampling after 40 hours and 1 hour downsampling after 10 days

3.6 Thanos Ruler

Thanos Ruler evaluates Prometheus recording and alerting rules, much like Prometheus’s own rule engine. The difference is that it communicates with the other Thanos components, so it can evaluate rules against the global, deduplicated view of the data.

4. Thanos Configuration

Prerequisites: To follow this tutorial, you will need:

1. Working knowledge of Kubernetes and kubectl

2. A running Kubernetes cluster with at least 3 nodes (we will use GKE)

3. An ingress controller and ingress objects (we will use the NGINX Ingress Controller); this is not mandatory, but it is highly recommended in order to reduce the number of external endpoints.

4. Creating credentials to be used by Thanos components to access object store (in this case, GCS bucket) 

          a. Create 2 GCS buckets and name them prometheus-long-term and thanos-ruler

          b. Create a service account with the Storage Object Admin role

          c. Download the key file as JSON credentials and name it thanos-gcs-credentials.json

          d. Create a Kubernetes secret using the credentials, as you can see in the following snippet:
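The original snippet is not reproduced here; a minimal version, assuming the secret name thanos-gcs-credentials and the monitoring namespace used throughout the rest of this tutorial, could look like:

```shell
# Create the namespace that all monitoring components will live in
kubectl create namespace monitoring

# Store the downloaded GCS key file as a Kubernetes secret
kubectl create secret generic thanos-gcs-credentials \
    --from-file=thanos-gcs-credentials.json \
    -n monitoring
```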


5. Deployment 

Deploying Prometheus Service Accounts, ClusterRole and ClusterRoleBinding: The following manifest creates the monitoring namespace, along with the service accounts, ClusterRole and ClusterRoleBinding needed by Prometheus.
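The manifest itself was not preserved in this copy of the article; the following is a minimal sketch, assuming a service account and ClusterRole both named monitoring:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: monitoring
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: monitoring
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring
rules:
  # Prometheus needs read access to cluster objects for service discovery
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: monitoring
subjects:
  - kind: ServiceAccount
    name: monitoring
    namespace: monitoring
roleRef:
  kind: ClusterRole
  name: monitoring
  apiGroup: rbac.authorization.k8s.io
```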


Deploying Prometheus Configuration configmap: The following config map creates the Prometheus configuration file template. The Thanos sidecar component reads this template and generates the actual configuration file, which is consumed by the Prometheus container running in the same pod. It is extremely important to add the external_labels section to the config file so that the querier can deduplicate data based on it.
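The full config map is not reproduced here; an abridged sketch, assuming the config map name prometheus-server-conf (the scrape_configs section is heavily shortened), might look like:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-server-conf
  namespace: monitoring
data:
  prometheus.yaml.tmpl: |-
    global:
      scrape_interval: 15s
      external_labels:
        cluster: prometheus-ha
        # The sidecar substitutes the pod name here, so every replica
        # carries a unique label that the querier deduplicates on.
        replica: $(POD_NAME)
    rule_files:
      - /etc/prometheus/rules/*.yaml
    scrape_configs:
      - job_name: kubernetes-nodes
        kubernetes_sd_configs:
          - role: node
```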


Deploying Prometheus Rules configmap: This will create the alert rules, which will be relayed to Alertmanager for delivery.
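The original rules config map is not preserved here; a minimal sketch with a single illustrative alert (the rule and config map names are assumptions) could be:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: monitoring
data:
  alert-rules.yaml: |-
    groups:
      - name: node-alerts
        rules:
          # Fires when a node target has been unreachable for 5 minutes
          - alert: NodeDown
            expr: up{job="kubernetes-nodes"} == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: "Node {{ $labels.instance }} is down"
```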


Deploying Prometheus Stateful Set
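The full manifest did not survive in this copy; an abridged sketch follows, illustrating the points discussed below (image versions are illustrative, and volume mounts for the config template, generated config, rules and credentials are omitted for brevity):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceName: thanos-store-gateway
  replicas: 3
  selector:
    matchLabels:
      app: prometheus
      thanos-store-api: "true"
  template:
    metadata:
      labels:
        app: prometheus
        # This label lets the headless service discover the pod
        thanos-store-api: "true"
    spec:
      serviceAccountName: monitoring
      containers:
        - name: prometheus
          image: prom/prometheus:v2.15.2
          args:
            - --config.file=/etc/prometheus-shared/prometheus.yaml
            - --storage.tsdb.path=/prometheus
            # 2h min/max block duration disables local compaction,
            # since Thanos Compact handles compaction instead
            - --storage.tsdb.min-block-duration=2h
            - --storage.tsdb.max-block-duration=2h
            - --web.enable-lifecycle
        - name: thanos-sidecar
          image: quay.io/thanos/thanos:v0.8.0
          args:
            - sidecar
            - --tsdb.path=/prometheus
            - --prometheus.url=http://127.0.0.1:9090
            - --objstore.config={type: GCS, config: {bucket: prometheus-long-term}}
            # The sidecar renders the template into the shared config file
            - --reloader.config-file=/etc/prometheus/prometheus.yaml.tmpl
            - --reloader.config-envsubst-file=/etc/prometheus-shared/prometheus.yaml
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/secret/thanos-gcs-credentials.json
          ports:
            - name: grpc
              containerPort: 10901
  volumeClaimTemplates:
    - metadata:
        name: prometheus-data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```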


It is important to understand the following about the above manifest:

  1. Prometheus is deployed as a stateful set with three replicas. Each replica provisions its own persistent volume dynamically. 
  2. Prometheus configuration is generated by the Thanos Sidecar container using the template file created above.
  2. Thanos handles data compaction, so we need to set --storage.tsdb.min-block-duration=2h and --storage.tsdb.max-block-duration=2h to disable Prometheus’s own local compaction.
  4. Prometheus stateful set is labeled as thanos-store-api: "true" so that each pod gets discovered by the headless service (we will show you how to do that next). This headless service will be used by Thanos Query to query data across all the Prometheus instances. 
  5. We apply the same label to the Thanos Store and Thanos Ruler component so that they are also discovered by the querier and can be used for querying metrics. 
  5. The GCS bucket credentials path is provided using the GOOGLE_APPLICATION_CREDENTIALS environment variable, and the credentials file is mounted at that path from the secret created as part of the prerequisites.

Deploying Prometheus Services


We create a separate service for each Prometheus pod in the stateful set. These are not strictly necessary and exist only for debugging purposes. The purpose of the thanos-store-gateway headless service has been explained above. Later, we will expose the Prometheus services using an ingress object.
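As a point of reference, a headless service of this kind could look like the following sketch (the service name matches the one referenced throughout the article; the numeric target port assumes the Thanos default gRPC port):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: thanos-store-gateway
  namespace: monitoring
spec:
  type: ClusterIP
  clusterIP: None            # headless: DNS returns every matching pod IP
  ports:
    - name: grpc
      port: 10901
      targetPort: 10901
  selector:
    thanos-store-api: "true"
```

Because the service is headless, a DNS SRV lookup on thanos-store-gateway returns one record per labeled pod, which is exactly what the querier's dnssrv+ store flag consumes.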

Deploying Thanos Query: this is one of the main components of the Thanos deployment. Note the following:

  1. The container argument --store=dnssrv+thanos-store-gateway:10901 helps discover all the components from which metric data should be queried.
  2. The service thanos-querier provides a web interface to run PromQL queries. It also has the option to deduplicate data across various Prometheus clusters. 
  3. From here, we provide Grafana as a datasource for all the dashboards.
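The original manifest is missing from this copy; a minimal sketch of the querier deployment and its service, assuming the names used elsewhere in the article and an illustrative image version, could be:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: thanos-querier
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: thanos-querier
  template:
    metadata:
      labels:
        app: thanos-querier
    spec:
      containers:
        - name: thanos
          image: quay.io/thanos/thanos:v0.8.0
          args:
            - query
            # Deduplicate across Prometheus replicas on this label
            - --query.replica-label=replica
            # Discover all Store API endpoints via the headless service
            - --store=dnssrv+thanos-store-gateway:10901
          ports:
            - name: http
              containerPort: 10902
---
apiVersion: v1
kind: Service
metadata:
  name: thanos-querier
  namespace: monitoring
spec:
  ports:
    - port: 9090
      targetPort: http
  selector:
    app: thanos-querier
```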


Deploying Thanos Store Gateway: this will create the store component which serves metrics from the object storage to the querier. 
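The manifest is not reproduced here; a sketch, assuming the same credentials secret and bucket created in the prerequisites (volume mounts omitted for brevity), might be:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-store-gateway
  namespace: monitoring
spec:
  replicas: 1
  serviceName: thanos-store-gateway
  selector:
    matchLabels:
      app: thanos-store-gateway
  template:
    metadata:
      labels:
        app: thanos-store-gateway
        # Same label as Prometheus, so the querier discovers it too
        thanos-store-api: "true"
    spec:
      containers:
        - name: thanos
          image: quay.io/thanos/thanos:v0.8.0
          args:
            - store
            - --data-dir=/data
            - --objstore.config={type: GCS, config: {bucket: prometheus-long-term}}
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/secret/thanos-gcs-credentials.json
          ports:
            - name: grpc
              containerPort: 10901
```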


Deploying Thanos Compact
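The compactor manifest is missing from this copy; a sketch follows. Note the single replica, since, as mentioned above, the compactor must run as a singleton against the bucket:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-compactor
  namespace: monitoring
spec:
  replicas: 1                      # must remain a singleton per bucket
  serviceName: thanos-compactor
  selector:
    matchLabels:
      app: thanos-compactor
  template:
    metadata:
      labels:
        app: thanos-compactor
    spec:
      containers:
        - name: thanos
          image: quay.io/thanos/thanos:v0.8.0
          args:
            - compact
            - --data-dir=/data
            - --objstore.config={type: GCS, config: {bucket: prometheus-long-term}}
            - --wait                # keep running and compact continuously
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/secret/thanos-gcs-credentials.json
```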


Deploying Thanos Ruler
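The ruler manifest is likewise missing; a sketch, assuming the thanos-ruler bucket from the prerequisites and an Alertmanager service named alertmanager on port 9093 (volume mounts for rules and credentials omitted):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: thanos-ruler
  namespace: monitoring
spec:
  replicas: 1
  serviceName: thanos-ruler
  selector:
    matchLabels:
      app: thanos-ruler
  template:
    metadata:
      labels:
        app: thanos-ruler
        # Lets the querier discover the ruler's Store API as well
        thanos-store-api: "true"
    spec:
      containers:
        - name: thanos
          image: quay.io/thanos/thanos:v0.8.0
          args:
            - rule
            - --data-dir=/data
            - --rule-file=/etc/thanos/rules/*.yaml
            # Evaluate rules against the global, deduplicated view
            - --query=thanos-querier:9090
            - --alertmanagers.url=http://alertmanager:9093
            - --objstore.config={type: GCS, config: {bucket: thanos-ruler}}
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /etc/secret/thanos-gcs-credentials.json
          ports:
            - name: grpc
              containerPort: 10901
```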


If you open an interactive shell in the same namespace as our workloads and check which pods thanos-store-gateway resolves to, you will see something like this:
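One way to run this lookup, using a throwaway busybox pod (the pod name dns-test is arbitrary):

```shell
kubectl run -it --rm dns-test --image=busybox --restart=Never \
    -n monitoring -- nslookup thanos-store-gateway
# The headless service returns one A record for every pod carrying
# the thanos-store-api: "true" label
```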


The IPs returned above correspond to our Prometheus pods, thanos-store and thanos-ruler. This can be verified as follows:


Deploying Alertmanager: This will create our alertmanager deployment. It will deliver all the alerts generated as per Prometheus Rules.


Deploying Kube State Metrics: The kube-state-metrics deployment is needed to relay some important container metrics, which are not natively exposed by the kubelet and hence are not directly available to Prometheus.


Deploying the Node Exporter Daemonset: The node-exporter daemonset runs a node-exporter pod on each node and exposes very important node-level metrics that can be pulled by the Prometheus instances.
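A sketch of such a daemonset (the image version is illustrative; the host mounts give the exporter read access to the node's proc and sys filesystems):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: node-exporter
          image: prom/node-exporter:v0.18.1
          args:
            - --path.procfs=/host/proc
            - --path.sysfs=/host/sys
          ports:
            - containerPort: 9100
          volumeMounts:
            - name: proc
              mountPath: /host/proc
              readOnly: true
            - name: sys
              mountPath: /host/sys
              readOnly: true
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys
```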


Deploying Grafana: This will create our Grafana deployment and service, which will be exposed using our ingress object. We should add thanos-querier as the data source for our Grafana deployment. To do so:

  1. Click on Add DataSource
  2. Set Name: DS_PROMETHEUS 
  3. Set Type: Prometheus 
  4. Set URL: http://thanos-querier:9090
  5. Save and Test. You can now build your custom dashboards or simply import dashboards from grafana.net. Dashboards #315 and #1471 are very good places to start.


Deploying the Ingress Object: This is the final piece of the puzzle. It will expose all our services outside the Kubernetes cluster and let us access them.

Make sure you replace <yourdomain> with a domain name that you control, and point that domain at the ingress controller’s service.
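The ingress manifest was not preserved here; a sketch for the two main endpoints, assuming the NGINX ingress class and the service names and ports used above (the extensions/v1beta1 API matches clusters of this article's era), could be:

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: monitoring-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: thanos-querier.<yourdomain>.com
      http:
        paths:
          - backend:
              serviceName: thanos-querier
              servicePort: 9090
    - host: grafana.<yourdomain>.com
      http:
        paths:
          - backend:
              serviceName: grafana
              servicePort: 3000
```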


You should now be able to access Thanos Querier at http://thanos-querier.<yourdomain>.com. It will look something like this:

Make sure deduplication is selected.

If you click on Stores, you will be able to see all the active endpoints discovered by thanos-store-gateway.

6. Grafana Dashboards

Finally, you add Thanos Querier as the datasource in Grafana and start creating dashboards.

Kubernetes Cluster Monitoring Dashboard:

Kubernetes Node Monitoring Dashboard:

7. Conclusion

Integrating Thanos with Prometheus allows you to scale Prometheus horizontally. Since Thanos Querier can pull metrics from other querier instances, you can pull metrics across clusters and visualize them in Grafana dashboards. Thanos lets us archive metric data in an object store, which provides virtually unlimited storage for our monitoring system, and it serves metrics from the object storage itself. A major operating cost of this setup is the object storage (S3 or GCS), which can be reduced by applying appropriate retention policies.

Today’s setup requires quite a bit of configuration on your part. The manifests provided above have been tested in a production environment and should make the process easy for you. Feel free to reach out should you have any questions around them. If you decide that you don’t want to do the configuration yourself, we have a hosted Prometheus offering where you can offload it to us and we will happily manage it for you. Try a free trial, or book a demo to talk to us directly.

This article was written by our guest blogger Vaibhav Thakur. If you liked this article, check out his LinkedIn for more.
