Monitoring multiple OKE clusters with Prometheus, Thanos and Grafana — Part 1

Ali Mukadam · Published in Oracle Developers · Nov 29, 2021


In a previous article, we deployed multiple OKE clusters for Verrazzano in different OCI regions. Now, we want to monitor them.

Monitoring a single cluster is relatively straightforward:

  1. Install Prometheus and Grafana
  2. Get Prometheus to monitor your applications or workloads either by scraping metrics or by pushing the metrics to it
  3. In Grafana, use Prometheus as a data source to create your dashboards that will help you understand how your applications or workloads are doing
  4. Use AlertManager to get notified when something goes wrong
(See also: Monitoring Redis with Prometheus and Grafana.)

One issue with Prometheus is that it runs as a singleton: it cannot be clustered, so you cannot have high availability. If the host running the Prometheus instance crashes, that’s tough luck, as there is no alternative Prometheus instance to take over.

A second issue, which compounds the first, is that Prometheus stores its metrics locally on disk. If the Prometheus instance crashes, you will lose some metrics. If the disk gets corrupted in the process, you will lose everything unless you have backups.

One way people have handled this, until a better solution could be engineered, was to run at least a second Prometheus instance on a different host and have it scrape the same endpoints. While this gives you the illusion of high availability, it does not really scale, and it is not very efficient either: you are basically scraping and storing the same data twice. How, then, do we run Prometheus with high availability?

The third issue is that Prometheus keeps the data for a limited time only. This CNCF video explains the issues very well. How, then, can we get long-term storage for our metrics?

The fourth issue is how to monitor multiple clusters. We can run an instance of Prometheus in each cluster and deploy them in federated mode. In our four-cluster setup, this means the admin cluster in Singapore would scrape specified time series from the Prometheus servers deployed in the Sydney, Mumbai and Tokyo regions:

Federated Prometheus
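For reference, federation is configured as a /federate scrape job on the admin cluster’s Prometheus; a minimal sketch, in which the job name, matchers and target addresses are all illustrative:

# Federation scrape job on the Singapore (admin) Prometheus - illustrative only
scrape_configs:
  - job_name: 'federate'
    scrape_interval: 30s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="kubernetes-nodes"}'        # pull only the series you need
        - '{__name__=~"container_.*"}'
    static_configs:
      - targets:
          - 'prometheus-syd.example.com:9090'
          - 'prometheus-bom.example.com:9090'
          - 'prometheus-nrt.example.com:9090'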

However, we would still be running a single instance of Prometheus in the admin cluster (the singleton issue again). We could always add another Prometheus instance that federates the same time series:

2 instances of Prometheus federating other Prometheus servers

Again, this is not very efficient, and the cost can be quite high too. And we still have not solved the long-term storage issue.

Enter Thanos.

The Thanos project aims to provide high availability and long term storage for Prometheus.

In this article, we will look at how to monitor a single OKE cluster using Prometheus, Thanos and Grafana. In a subsequent article, we will look at adding more clusters.

Architecture

The following diagram depicts the architecture of Thanos:

Thanos Architecture (source: thanos.io)

As you can see, it has many components and if you are interested, there is a very good explanation of their roles here.

I’ll partly digest this for you for the purpose of our setup:

  • the Thanos sidecar is deployed as a sidecar container in the Prometheus pod in each region. The sidecar exposes the local Prometheus data for querying and uploads its TSDB blocks to object storage. The worker nodes therefore need to be able to reach Object Storage, which means routing through the Service Gateway is required. As this is a requirement for OKE worker nodes anyway, we do not need to do anything extra for the sidecar to be able to upload the TSDB blocks to OCI Object Storage. A sketch of the sidecar container follows the diagram below.
  • the Thanos Store Gateway queries metrics stored in object storage and presents an API that allows their retrieval. It listens on port 10901 by default. When we monitor multiple clusters, we need to ensure this port is open by modifying the relevant NSG. Since we want this to be a reliable interface, we will create it as a LoadBalancer service, which creates an OCI Load Balancer. We also want this to be a private Load Balancer, placed in the private load balancer subnet, so we need to modify the internal load balancer NSG.
Thanos in a single cluster
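For reference, the sidecar that gets attached to the Prometheus pod looks roughly like the container below. This is only a sketch to show what the sidecar does; the Bitnami chart generates an equivalent container automatically when prometheus.thanos.create is set to true (see the Prometheus deployment section below), and the image tag and mount paths here are assumptions.

# Illustrative Thanos sidecar container (the chart generates this for you)
- name: thanos-sidecar
  image: quay.io/thanos/thanos:v0.23.1
  args:
    - sidecar
    - --tsdb.path=/prometheus                      # Prometheus TSDB directory (shared volume)
    - --prometheus.url=http://localhost:9090       # local Prometheus API in the same pod
    - --objstore.config-file=/etc/thanos/thanos-sin-storage.yaml   # S3 config from the Kubernetes secret
    - --grpc-address=0.0.0.0:10901                 # Store API consumed by Thanos Query
  volumeMounts:
    - name: thanos-objstore-config
      mountPath: /etc/thanos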

Also note that:

  1. the sidecar only uploads TSDB blocks every 2 hours, so the most recent data lives only in Prometheus
  2. there is a variation of the above architecture that uses the Receive component.

But looking into these will be the subject of subsequent posts. For now, we are mostly interested in solving the high availability and long-term storage issues.

Creating and using the object store

Before doing the deployment, we need to create a bucket in OCI Object Storage to store the TSDB blocks. As of this writing, there is no native integration between Thanos and OCI Object Storage yet. We have an open PR on GitHub with the Thanos project to add OCI Object Storage as one of the supported object stores. Until that happy day arrives, we will use OCI’s S3-compatible interface instead. Like the Keymaker said:

“Always another way”

Now, in order to use the S3 interface, we need to create a Customer Secret Key for the user that will be used for the Thanos integration. Log in to the OCI Console, click on the User icon and navigate to Customer Secret Keys. Click Generate Secret Key.

Then, make sure you copy the key:

Generating a secret key

Next, ensure you are in the admin region, in our case, Singapore. Navigate to Storage > Object Storage and create a Bucket:

Create a bucket named ‘thanos’

I imaginatively call it thanos.
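If you prefer the OCI CLI to the console, the secret key and the bucket can also be created on the command line (a sketch; the user and compartment OCIDs are placeholders):

# Create a Customer Secret Key for the user Thanos will authenticate as
# (copy the "key" field from the output - it is only displayed once)
oci iam customer-secret-key create \
  --user-id ocid1.user.oc1..<user_ocid> \
  --display-name thanos

# Create the bucket in the admin region (Singapore)
oci os bucket create \
  --name thanos \
  --compartment-id ocid1.compartment.oc1..<compartment_ocid>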

Configuring Thanos object store

The next thing we need to do is configure the S3 endpoint. OCI’s S3 Compatibility API endpoint has the following format:

<object_storage_namespace>.compat.objectstorage.<region>.oraclecloud.com
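If you do not know your Object Storage namespace, you can look it up with the OCI CLI:

# Print the Object Storage namespace of your tenancy
oci os ns get
# For the Singapore region (ap-singapore-1), the endpoint then becomes:
# <object_storage_namespace>.compat.objectstorage.ap-singapore-1.oraclecloud.com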

Next, create an object store configuration file for Singapore, e.g. thanos-sin-storage.yaml:
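A minimal sketch of what thanos-sin-storage.yaml can look like (the bucket name, endpoint, region and keys are placeholders to replace with your own values):

# thanos-sin-storage.yaml - object store configuration for the Singapore cluster
type: S3
config:
  bucket: "thanos"                       # the bucket created above
  endpoint: "<object_storage_namespace>.compat.objectstorage.ap-singapore-1.oraclecloud.com"
  region: "ap-singapore-1"
  access_key: "<my_access_key>"
  secret_key: "<my_secret_key>"
  insecure: false
  signature_version2: false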

Replace the values for the endpoint, the access key and the secret key. The access_key can be found on your User Profile > Customer Secret Keys page, and the secret_key is the one you generated and saved earlier. Then, create a namespace called monitoring and store this configuration in a Kubernetes secret:

kubectl create ns monitoring
kubectl -n monitoring create secret generic thanos-objstore-config --from-file=thanos-sin-storage.yaml=thanos-sin-storage.yaml
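You can quickly check that the secret was created with the expected key:

# The output should list thanos-sin-storage.yaml under Data
kubectl -n monitoring describe secret thanos-objstore-config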

Deploy Prometheus on the cluster

We will now deploy Prometheus with the sidecar to the cluster.

We will use the kube-prometheus Helm chart from the fine folks at Bitnami. Add the Bitnami Helm repo and update it:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

Generate the default values file for kube-prometheus:

helm show values bitnami/kube-prometheus > prometheusvalues.yaml

The following Helm chart properties need to be added or changed (the paths are given in Helm values notation; values such as nsg_id and subnet_id need to be obtained from OCI and replaced in the manifest):

prometheus.thanos.create: true
prometheus.thanos.objectStorageConfig.secretName: thanos-objstore-config
prometheus.thanos.objectStorageConfig.secretKey: thanos-sin-storage.yaml
prometheus.thanos.service.type: LoadBalancer
prometheus.thanos.service.annotations:
  oci.oraclecloud.com/oci-network-security-groups: "nsg_id"
  service.beta.kubernetes.io/oci-load-balancer-shape: "flexible"
  service.beta.kubernetes.io/oci-load-balancer-shape-flex-min: "50"
  service.beta.kubernetes.io/oci-load-balancer-shape-flex-max: "100"
  service.beta.kubernetes.io/oci-load-balancer-subnet1: "subnet_id"
  service.beta.kubernetes.io/oci-load-balancer-internal: "true"
  service.beta.kubernetes.io/oci-load-balancer-security-list-management-mode: "All"
prometheus.externalLabels:
  cluster: "sin"

UPDATE: Because of a (just discovered) missing rule in the Terraform module, I have changed the management mode to “All”. This means 2 rules will be added to the default security list.
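Put together, the relevant section of prometheusvalues.yaml ends up looking roughly like this (a sketch; replace nsg_id and subnet_id with the OCIDs of your NSG and internal load balancer subnet):

# Excerpt of prometheusvalues.yaml (illustrative)
prometheus:
  externalLabels:
    cluster: "sin"                          # identifies this cluster in Thanos
  thanos:
    create: true                            # deploy the Thanos sidecar
    objectStorageConfig:
      secretName: thanos-objstore-config    # secret created earlier
      secretKey: thanos-sin-storage.yaml
    service:
      type: LoadBalancer                    # exposes the Store API (port 10901)
      annotations:
        oci.oraclecloud.com/oci-network-security-groups: "nsg_id"
        service.beta.kubernetes.io/oci-load-balancer-shape: "flexible"
        service.beta.kubernetes.io/oci-load-balancer-shape-flex-min: "50"
        service.beta.kubernetes.io/oci-load-balancer-shape-flex-max: "100"
        service.beta.kubernetes.io/oci-load-balancer-subnet1: "subnet_id"
        service.beta.kubernetes.io/oci-load-balancer-internal: "true"
        service.beta.kubernetes.io/oci-load-balancer-security-list-management-mode: "All"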

And install Prometheus:

helm install prometheus bitnami/kube-prometheus \
--namespace monitoring \
-f prometheusvalues.yaml

Verify the Prometheus deployment:

  1. A private OCI Load Balancer should be created by Prometheus
  2. The load balancer should have a TCP listener on port 10901
  3. The load balancer should also be deployed in the internal load balancer subnet and assigned the int-lb NSG
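To confirm the above and note the load balancer’s private IP address (we will need it later when pointing Thanos Query at this cluster), list the services in the monitoring namespace; the Thanos sidecar service created by the chart should appear with type LoadBalancer and a private EXTERNAL-IP (the exact service name depends on your Helm release name):

# Find the LoadBalancer service exposing port 10901 and note its EXTERNAL-IP
kubectl -n monitoring get svc -o wide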

At this point, there are 2 ways you can update the NSG associated with the internal load balancer to accept incoming TCP traffic on port 10901 so you can use it later:

  1. Update your terraform.tfvars and set the internal_lb_allowed_ports parameter to allow port 10901. You need to run terraform apply again after this.
  2. Or add it directly to the NSG using the OCI Console. UPDATE: Because I’ve changed the management mode to “All”, you do not need to do this for now.

4. If you want to shore things up on the security front, you can consider:

a. adding SSL certificates using cert-manager

b. using the new WAF integration feature. This will require you to set your Load Balancer shape to Flexible, though.

5. Create an SSH tunnel to the operator host so you can test the Prometheus expression browser:

ssh -L 9090:localhost:9090 -J opc@bastion_public_ip opc@operator_private_ip
kubectl port-forward --namespace monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090

6. Access the Prometheus expression browser at http://localhost:9090/ and run a query:

Prometheus expression browser

Prometheus has been deployed.

Deploy Thanos on the cluster

It’s time to deploy Thanos, and we will be using the Bitnami thanos Helm chart. Generate the default values file for the chart:

helm show values bitnami/thanos > thanosvalues.yaml

The following Helm chart properties need to be added or changed (replace the placeholder values with those obtained from OCI):

objstoreConfig: |-
  type: S3
  config:
    bucket: "thanos"
    endpoint: "<namespace>.compat.objectstorage.<region>.oraclecloud.com"
    region: "<region>"
    access_key: "<my_access_key>"
    insecure: false
    signature_version2: false
    secret_key: "<my_secret_key>"
    put_user_metadata: {}
    http_config:
      idle_conn_timeout: 1m30s
      response_header_timeout: 2m
      insecure_skip_verify: false
      tls_handshake_timeout: 10s
      expect_continue_timeout: 1s
      max_idle_conns: 100
      max_idle_conns_per_host: 100
      max_conns_per_host: 0
    trace:
      enable: false
    list_objects_version: ""
    part_size: 67108864
    sse_config:
      type: ""
      kms_key_id: ""
      kms_encryption_context: {}
      encryption_key: ""

query.enabled: true
query.stores:
  # Private IP address of the internal load balancer created for the Thanos sidecar
  - 123.123.123.123:10901
queryFrontend.enabled: true
bucketweb.enabled: true
compactor.enabled: true
storegateway.enabled: true
ruler.enabled: true
Deploy Thanos:

helm install thanos bitnami/thanos \
--namespace monitoring \
-f thanosvalues.yaml

Verify that:

1. All pods are working properly:

kubectl -n monitoring get pods

2. Thanos Query is working properly:

export SERVICE_PORT=$(kubectl get --namespace monitoring -o jsonpath="{.spec.ports[0].port}" services thanos-query)
kubectl port-forward --namespace monitoring svc/thanos-query ${SERVICE_PORT}:${SERVICE_PORT}

by accessing it in the browser at http://localhost:9090

Accessing Thanos Query

3. the Singapore store is registered: http://localhost:9090/stores

Singapore Store is replicated

In the screenshot above, we can also see the External Labels “cluster=sin”, indicating our Verrazzano admin cluster in Singapore.

Deploying Grafana

The last step is to use Grafana with Thanos. One of the many clever design decisions the Thanos developers made was to keep a Prometheus-compatible interface. This means that, with minimal effort and changes, we can reuse existing Grafana dashboards.

Let’s add the helm repo first:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

And then install Grafana:

helm install --namespace monitoring grafana grafana/grafana

We need to get the Grafana admin user’s password:

kubectl get secret --namespace monitoring grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

Since Grafana is installed as a ClusterIP service by default, we can do some port-forwarding with SSH:

ssh -L 3000:localhost:3000 -L 9090:localhost:9090 -J opc@bastion_public_ip opc@operator_private_ip

and kubectl:

kubectl port-forward --namespace monitoring svc/grafana 3000:80

Access Grafana in your browser at http://localhost:3000/. Log in with username ‘admin’ and the password you retrieved above.

Connecting Grafana and Thanos

In order to use Thanos and the metrics it is pulling from the various Prometheii, we first need to add Thanos as a data source. Recall that Thanos provides the same interface as Prometheus. So, we can just use the Prometheus plugin to add a Thanos data source:

  1. Locate the Configuration icon on the left and from the expanding menu, select data sources.
  2. Click Add data source and select Prometheus.
Selecting Prometheus

3. Enter http://thanos-query.monitoring.svc.cluster.local:9090 in the URL field.

4. Then, click Save & test. The test should return Data source is working.
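If you prefer to manage this declaratively, the Grafana Helm chart can provision the data source at install time; a minimal sketch using the chart’s datasources value (the data source name “Thanos” is arbitrary):

# Add to a Grafana values file and pass it with -f when installing the chart
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
      - name: Thanos
        type: prometheus                 # Thanos speaks the Prometheus API
        url: http://thanos-query.monitoring.svc.cluster.local:9090
        access: proxy
        isDefault: true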

Adding a Kubernetes dashboard to Grafana

To create a Grafana dashboard, we will import the Kubernetes cluster monitoring dashboard. There is a more detailed one if you fancy it. It is a fantastic piece of work, and my only beef with it is that it creates a wall of graphs. It is hard enough to monitor one Kubernetes cluster; it will be even harder to monitor a group of clusters spread geographically, never mind when you are in a crisis and need to troubleshoot. So, we want a bit more simplicity (which is subjective anyway).

  1. Click on the + icon on the left and from the expanded menu, select Import.
  2. Enter the dashboard ID: 315
  3. Click on Load.
  4. Ensure you select the Prometheus Data source you created and then click Import.
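The dashboard can also be provisioned through the Grafana chart by its grafana.com ID; a sketch, assuming the chart’s dashboardProviders/dashboards values and a data source named “Thanos” (the revision number is an assumption, so adjust it to the latest):

# Grafana chart values to pull dashboard 315 at install time (illustrative)
dashboardProviders:
  dashboardproviders.yaml:
    apiVersion: 1
    providers:
      - name: default
        orgId: 1
        folder: ""
        type: file
        disableDeletion: false
        editable: true
        options:
          path: /var/lib/grafana/dashboards/default
dashboards:
  default:
    kubernetes-cluster-monitoring:
      gnetId: 315                  # Kubernetes cluster monitoring (via Prometheus)
      revision: 3
      datasource: Thanos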

You should now be able to see your Kubernetes Dashboard:

Kubernetes Dashboard of 1 OKE cluster

At this point, we know that shipping metrics from Prometheus to Thanos works and that we can use Thanos as a data source. However, this works for only one cluster, and that cluster is the one where we have deployed Thanos itself.

In Part 2, we will add more clusters and modify the dashboard so we can pick individual clusters and analyze their metrics.

Update: security-list management mode changed to “All” because of a missing rule in the terraform module.
