This guide assumes you have a basic understanding of the Prometheus resource and have read the getting started guide.
Besides the Prometheus and ServiceMonitor resources, the Prometheus Operator also introduces the Alertmanager resource. It allows declaratively describing an Alertmanager cluster. Before diving into deploying an Alertmanager cluster, it is important to understand the contract between Prometheus and Alertmanager.
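The Alertmanager's features include deduplicating alerts fired by Prometheus, silencing alerts, and routing and sending grouped notifications to receivers.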
Prometheus' configuration includes so-called rule files, which contain the alerting rules. When an alerting rule triggers, Prometheus fires that alert against all Alertmanager instances on every rule evaluation interval. The Alertmanager instances communicate with each other about which notifications have already been sent out. You can read more about why these systems have been designed this way in the High Availability scheme description.
Let's create an example Alertmanager cluster, with three instances.
apiVersion: monitoring.coreos.com/v1alpha1
kind: Alertmanager
metadata:
  name: example
spec:
  replicas: 3
The Alertmanager instances will not be able to start up unless a valid configuration is given. The following example configuration does not actually do anything, as it sends notifications to a nonexistent webhook, but it allows the Alertmanager to start up. Read more about how to configure the Alertmanager in the upstream documentation.
global:
  resolve_timeout: 5m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://alertmanagerwh:30500/'
Save the above Alertmanager configuration in a file called alertmanager.yaml and create a secret from it using kubectl:
$ kubectl create secret generic alertmanager-example --from-file=alertmanager.yaml
Once created, this Secret is mounted by the Alertmanager Pods created through the Alertmanager object.
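The same Secret can also be written as a manifest, which may be convenient when the configuration is kept in version control. The following is a minimal sketch, assuming a Kubernetes version that supports the stringData field of Secrets. The key alertmanager.yaml mirrors the file name passed to --from-file in the kubectl command above.

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Same name as in the kubectl command above.
  name: alertmanager-example
type: Opaque
# stringData accepts plain text; the API server stores it base64-encoded
# under .data (assumption: the cluster supports this field).
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
    route:
      group_by: ['job']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'webhook'
    receivers:
    - name: 'webhook'
      webhook_configs:
      - url: 'http://alertmanagerwh:30500/'
```

Applying this manifest with kubectl apply -f should yield the same Secret contents as the imperative command.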
To be able to view the web UI, expose it via a Service. A simple way to do this is to use a Service of type NodePort.
apiVersion: v1
kind: Service
metadata:
  name: alertmanager-example
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30903
    port: 9093
    protocol: TCP
    targetPort: web
  selector:
    alertmanager: example
Once created, it makes the web UI accessible via a node's IP and the port 30903.
This is now a fully functional, highly available Alertmanager cluster, but it does not get any alerts fired against it. Let's set up Prometheus instances that will actually fire alerts against it.
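apiVersion: monitoring.coreos.com/v1alpha1
kind: Prometheus
metadata:
  name: example
spec:
  replicas: 2
  # Fire alerts against the Alertmanager instances behind the
  # alertmanager-example Service, using its port named "web".
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-example
      port: web
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  # Load rule files from ConfigMaps carrying both of these labels.
  ruleSelector:
    matchLabels:
      role: prometheus-rulefiles
      prometheus: example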
Prometheus rule files are held in a ConfigMap. The ConfigMaps to mount rule files from are selected with a label selector field called ruleSelector in the Prometheus object, as seen above. All top-level files that end with the .rules extension will be loaded.
The best practice is to label the ConfigMaps containing rule files with role: prometheus-rulefiles as well as the name of the Prometheus object, prometheus: example in this case.
kind: ConfigMap
apiVersion: v1
metadata:
  name: prometheus-example-rules
  labels:
    role: prometheus-rulefiles
    prometheus: example
data:
  example.rules: |
    ALERT ExampleAlert
      IF vector(1)
This example ConfigMap always immediately triggers an alert, which is only for demonstration purposes. To validate that everything is working properly, have a look at each of the Prometheus web UIs.
To be able to view the web UI without a Service, kubectl's proxy functionality can be used.
kubectl proxy --port=8001
Then the web UI of each Prometheus instance can be viewed. Both instances show a firing alert called ExampleAlert, as defined in the loaded alerting rules.
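ExampleAlert fires unconditionally because its expression, vector(1), always returns a value. For comparison, a rule that only fires under a real condition could look like the sketch below. It uses the same pre-2.0 Prometheus rule syntax as the example above (Prometheus 2.0 and later use a YAML-based rule format instead). The ConfigMap name, the file name instance.rules, and the severity label value are assumptions made for illustration only.

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  # Hypothetical name, chosen for this sketch only.
  name: prometheus-example-instance-rules
  labels:
    # Same labels as the ruleSelector of the "example" Prometheus object,
    # so that this ConfigMap is selected as well.
    role: prometheus-rulefiles
    prometheus: example
data:
  # The key ends in .rules so that the file is loaded.
  instance.rules: |
    ALERT InstanceDown
      IF up == 0
      FOR 5m
      LABELS { severity = "page" }
      ANNOTATIONS {
        summary = "Instance {{ $labels.instance }} down",
        description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."
      }
```

The FOR clause keeps the alert in the pending state until the condition has held for five minutes, which avoids firing on a single failed scrape.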
Looking at the status page for "Runtime & Build Information" on the Prometheus web UI shows the discovered and active Alertmanagers that the Prometheus instance will fire alerts against.
These show three discovered Alertmanagers.
Heading to the Alertmanager web UI now shows one active alert, although all Prometheus instances are firing it. Configuring the Alertmanager further allows custom alert routing, grouping and notification mechanisms.
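As an illustration of custom routing and grouping, the example configuration from earlier could be extended with a child route. This is only a sketch: the severity: page label, the webhook-page receiver, and its URL are hypothetical and not part of the setup above. It uses the match field of a route; newer Alertmanager releases also offer a matchers field for the same purpose.

```yaml
global:
  resolve_timeout: 5m
route:
  # Default route: alerts not matched by a child route end up here.
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'webhook'
  routes:
  # Hypothetical child route: alerts labeled severity=page are grouped
  # by alert name and job, and repeated every hour instead of every 12h.
  - match:
      severity: page
    group_by: ['alertname', 'job']
    repeat_interval: 1h
    receiver: 'webhook-page'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://alertmanagerwh:30500/'
- name: 'webhook-page'
  webhook_configs:
  # Hypothetical endpoint; replace with a real receiver.
  - url: 'http://alertmanagerwh:30500/page'
```

Settings such as group_wait and group_interval are inherited from the parent route unless a child route overrides them.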