You may have read recently on this blog about CoreOS investing development resources in the open source Prometheus monitoring system. Prometheus provides complete container cluster monitoring: instrumentation, collection, querying, and alerting. Monitoring is an integral part of ensuring infrastructure reliability and performance through observability, and Prometheus offers a unified approach to monitoring all the components of a Kubernetes cluster, including the control plane, the worker nodes, and the applications running on the cluster.
This blog post walks through installing a Prometheus server in a Kubernetes cluster, monitoring the cluster components, and monitoring our own application services, using the Prometheus node exporter as an example.
First, follow these steps to easily spin up a virtual multi-node Kubernetes cluster on your development box. Verify that your Kubernetes cluster is working by running kubectl get nodes. The output should look like this:

$ kubectl get nodes
NAME           STATUS                     AGE
172.17.4.101   Ready,SchedulingDisabled   8m
172.17.4.201   Ready                      7m
172.17.4.202   Ready                      8m
172.17.4.203   Ready                      8m
Each of these nodes is running a kubelet, the Kubernetes node agent, which natively exports metrics in the Prometheus format. The kubelet also embeds cAdvisor, an application that exports Prometheus metrics about containers running on the node.
Additionally, etcd, the CoreOS distributed key value store, is running outside of the cluster, and etcd also exports interesting metrics in the Prometheus format.
We want to run a Prometheus setup inside of our cluster to retrieve all the provided metrics and allow us to query them on demand.
Prometheus supports service discovery using the Kubernetes API. This allows us to define a configuration that instantly adapts to changes in the targets we want to monitor. We configure Prometheus with a YAML configuration file, which we store in a Kubernetes ConfigMap. We can later mount the ConfigMap into our Prometheus pod to configure it.
In our example, the ConfigMap manifest prometheus-configmap-1.yaml defines our Prometheus configuration.
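For orientation, a ConfigMap wrapping a Prometheus configuration has roughly the following shape. The field values here are illustrative placeholders; the actual configuration lives in prometheus-configmap-1.yaml:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus
data:
  # The file name under data: becomes the file name inside the mounted volume.
  prometheus.yml: |
    global:
      scrape_interval: 30s   # illustrative value
    scrape_configs:
      # ... scrape job definitions go here ...
```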
There are two aspects to the example configuration:
First, a pointer to the etcd cluster: etcd runs outside of the Kubernetes cluster, and is either configured statically or via another mechanism for service discovery, such as DNS. In our setup, we point directly to the etcd cluster via its IP address:
- job_name: 'etcd'
  target_groups:
  - targets:
    - 172.17.4.51:2379
Second, configuration for monitoring Kubernetes components: The Kubernetes API provides service discovery information about the cluster’s kubelets and API servers. The following configuration section instructs Prometheus to retrieve this information, and to update its configuration as it detects changes from the Kubernetes API:
- job_name: 'kubernetes_components'
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes'
    in_cluster: true
    # This configures Prometheus to identify itself when scraping
    # metrics from Kubernetes cluster components.
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  # Prometheus provides meta labels for each monitoring target. We use
  # these to select targets we want to monitor and to modify labels attached
  # to scraped metrics.
  relabel_configs:
  # Only scrape apiserver and kubelets.
  - source_labels: [__meta_kubernetes_role]
    action: keep
    regex: (?:apiserver|node)
  # Redefine the Prometheus job based on the monitored Kubernetes component.
  - source_labels: [__meta_kubernetes_role]
    target_label: job
    replacement: kubernetes_$1
  # Attach all node labels to the metrics scraped from the components running
  # on that node.
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
Prometheus stores service discovery information in meta labels. These meta labels are used in Prometheus configurations to select and drop targets, or to generate proper labels for collected metrics. This happens by applying relabeling rules. The Prometheus documentation provides more detail on relabeling rules and the meta labels exposed by the Kubernetes service discovery integration. Meta labels are discarded after the targets are correctly generated.
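To make the relabeling semantics concrete, here is a small Python sketch (not Prometheus code, just a simplified illustration) of how a keep rule and a labelmap rule act on a discovered target's meta labels. The meta label values are made up for the example:

```python
import re

def apply_keep(labels, source_label, regex):
    """keep rule: drop the target unless the source label fully matches the regex."""
    return re.fullmatch(regex, labels.get(source_label, "")) is not None

def apply_labelmap(labels, regex):
    """labelmap rule: copy each matching meta label to a label named by the capture group."""
    mapped = {}
    for name, value in labels.items():
        m = re.fullmatch(regex, name)
        if m:
            mapped[m.group(1)] = value
    return mapped

# Meta labels roughly as Kubernetes service discovery might provide them
# (illustrative values).
meta = {
    "__meta_kubernetes_role": "node",
    "__meta_kubernetes_node_label_zone": "us-east-1a",
}

# The keep rule retains only apiserver and node targets.
assert apply_keep(meta, "__meta_kubernetes_role", r"(?:apiserver|node)")

# The labelmap rule turns node labels into plain metric labels.
print(apply_labelmap(meta, r"__meta_kubernetes_node_label_(.+)"))
# {'zone': 'us-east-1a'}
```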
Create a Kubernetes ConfigMap by downloading the prometheus-configmap-1.yaml manifest file and uploading it to the Kubernetes API:

$ kubectl create -f prometheus-configmap-1.yaml
configmap "prometheus" created
Now create a Prometheus deployment of the CoreOS Prometheus container image, and a service exposing it on NodePort 30900. This deployment is defined in the Prometheus deployment manifest:
$ kubectl create -f prometheus-deployment.yaml
You have exposed your service on an external port on all nodes in your
cluster. If you want to expose this service to the external internet, you may
need to set up firewall rules for the service port(s) (tcp:30900) to serve
traffic. See http://releases.k8s.io/release-1.2/docs/user-guide/services-firewalls.md for more details.
service "prometheus" created
deployment "prometheus" created
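Under the hood, the deployment mounts the "prometheus" ConfigMap into the container so Prometheus can read its configuration file. An illustrative excerpt of how that wiring looks (not the actual manifest; the image name and the -config.file flag assume a Prometheus 1.x-era container):

```yaml
spec:
  containers:
  - name: prometheus
    image: quay.io/coreos/prometheus:latest    # illustrative image reference
    args:
    - -config.file=/etc/prometheus/prometheus.yml
    volumeMounts:
    - name: config-volume
      mountPath: /etc/prometheus
  volumes:
  - name: config-volume
    configMap:
      name: prometheus    # the ConfigMap we created above
```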
After a few seconds, the Prometheus pod should be up and running. We can use any node in our cluster to access Prometheus on service port 30900. In our example Vagrant cluster, opening http://172.17.4.201:30900/targets in a web browser will show the “Targets” page of the Prometheus web UI.
In the example configuration, the page should show three monitoring jobs: etcd, the Kubernetes API server, and the Kubernetes nodes.
You can go to the “Graph” tab in the web interface to examine the metrics Prometheus is collecting.
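For example, here are two simple queries to try (exact metric names can vary with your Kubernetes and cAdvisor versions):

```
# Per-second CPU usage of all containers, averaged over the last 5 minutes
rate(container_cpu_usage_seconds_total[5m])

# Number of healthy scrape targets, grouped by monitoring job
sum(up) by (job)
```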
Check the Prometheus documentation for an in-depth explanation of all query language features.
In addition to monitoring the critical components of our Kubernetes cluster, we need to monitor the services running inside of it. We can write a generic Prometheus configuration section to automatically detect new services and adjust to any changes to their endpoints:
- job_name: 'kubernetes_services'
  kubernetes_sd_configs:
  - api_servers:
    - 'https://kubernetes'
    in_cluster: true
  relabel_configs:
  # We only monitor endpoints of services that were annotated with
  # prometheus.io/scrape=true in Kubernetes.
  - source_labels: [__meta_kubernetes_role, __meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: endpoint;true
  # Rewrite the Kubernetes service name into the Prometheus job label.
  - source_labels: [__meta_kubernetes_service_name]
    target_label: job
  # Attach the namespace as a label to the monitoring targets.
  - source_labels: [__meta_kubernetes_namespace]
    target_label: namespace
  # Attach all service labels to the monitoring targets.
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
The prometheus-configmap-2.yaml manifest file includes this updated configuration. Download it and use it to replace the "prometheus" ConfigMap we loaded earlier:

$ kubectl replace -f prometheus-configmap-2.yaml
configmap "prometheus" replaced
It will take up to two minutes for Kubernetes to refresh the configuration file in the Prometheus pod. Once that has happened, trigger a reload by sending a POST request to the Prometheus reload endpoint:

$ curl -XPOST http://172.17.4.201:30900/-/reload
Reloading configuration file...
The targets page in the Prometheus web UI will now show a new monitoring job named “prometheus”, which monitors our single Prometheus server’s metrics.
Kubernetes keeps track of many metrics that Prometheus can read by default. Even so, there’s often more information we want to know about our cluster and infrastructure that needs to be translated to a compatible format for Prometheus. To do so, Prometheus uses exporters, small programs that read metrics from other sources and translate them to the Prometheus format. The node exporter can read system-level statistics about bare-metal nodes or virtual machines and export them for Prometheus.
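To illustrate what an exporter actually produces, here is a minimal Python sketch that renders a sample statistic in the Prometheus text exposition format. It is purely illustrative: the real node exporter is a standalone Go program that exposes far more metrics, and the metric name used here is a made-up example:

```python
def format_metric(name, value, labels=None, help_text=None, metric_type="gauge"):
    """Render one metric in the Prometheus text exposition format."""
    lines = []
    if help_text:
        lines.append(f"# HELP {name} {help_text}")
    lines.append(f"# TYPE {name} {metric_type}")
    label_str = ""
    if labels:
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    lines.append(f"{name}{label_str} {value}")
    return "\n".join(lines)

# Hypothetical metric name and value, for illustration only.
print(format_metric(
    "node_filesystem_free_bytes",
    1073741824,
    labels={"device": "/dev/sda1"},
    help_text="Filesystem free space in bytes.",
))
# # HELP node_filesystem_free_bytes Filesystem free space in bytes.
# # TYPE node_filesystem_free_bytes gauge
# node_filesystem_free_bytes{device="/dev/sda1"} 1073741824
```

An exporter serves output like this over HTTP (conventionally on a /metrics path) for Prometheus to scrape.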
Using a DaemonSet, Kubernetes can run one node exporter per cluster node, and expose the node exporter as a service.
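A DaemonSet manifest for the node exporter has roughly this shape. This is an illustrative excerpt assuming a Kubernetes 1.2-era API group and an example image tag; the node-exporter.yaml manifest is the authoritative version:

```yaml
apiVersion: extensions/v1beta1   # DaemonSet API group as of Kubernetes 1.2
kind: DaemonSet
metadata:
  name: node-exporter
spec:
  template:
    metadata:
      labels:
        app: node-exporter
    spec:
      containers:
      - name: node-exporter
        image: quay.io/prometheus/node-exporter:latest   # illustrative tag
        ports:
        - containerPort: 9100   # default node exporter metrics port
```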
Download the node exporter daemon set manifest and deploy it:
$ kubectl create -f node-exporter.yaml
daemonset "node-exporter" created
Verify that four node exporter pods have been started:
$ kubectl get pods
NAME                          READY     STATUS    RESTARTS   AGE
node-exporter-4r4vq           1/1       Running   0          1m
node-exporter-6n2ah           1/1       Running   0          1m
node-exporter-9x57u           1/1       Running   0          1m
node-exporter-dk99a           1/1       Running   0          1m
prometheus-1189099554-6ah3y   1/1       Running   0          1h
That’s it – nothing else is necessary to get your machine-level metrics into Prometheus. With the configuration we used in our example, Prometheus automatically scrapes all node exporters for metrics as soon as they are deployed. You can verify this by navigating to the targets page in the Prometheus UI.
We have set up Prometheus inside of Kubernetes and configured it to monitor machine-level metrics, cluster components, and Prometheus itself.
Prometheus is also able to monitor custom services that export metrics in the Prometheus format. You can easily point your dashboard solutions to the Prometheus service, and start adding alerting and recording rules to the ConfigMap for those services.
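To opt one of your own services into monitoring under the kubernetes_services job shown earlier, annotate its Service object accordingly. An illustrative example, where "my-app" is a placeholder name:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                    # placeholder service name
  annotations:
    prometheus.io/scrape: "true"  # opts this service's endpoints into scraping
spec:
  selector:
    app: my-app
  ports:
  - port: 8080                    # the service must expose Prometheus-format metrics
```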
Join us in Berlin
Want to work on Prometheus at CoreOS? We are hiring engineers to help build the next generation of monitoring and alerting software in our Berlin office.