This guide assumes you have a basic understanding of the Prometheus resource and have read the getting started guide.

The Prometheus Operator introduces an Alertmanager resource, which allows users to declaratively describe an Alertmanager cluster. To successfully deploy an Alertmanager cluster, it is important to understand the contract between Prometheus and Alertmanager.

The Alertmanager may be used to:

  • Deduplicate alerts fired by Prometheus
  • Silence alerts
  • Route and send grouped notifications via providers (PagerDuty, OpsGenie, …)

Prometheus' configuration also includes "rule files", which contain the alerting rules. When an alerting rule triggers, Prometheus fires that alert against all Alertmanager instances on every rule evaluation interval. The Alertmanager instances communicate with each other about which notifications have already been sent out. For more information on this system design, see the High Availability scheme description.

First, create an example Alertmanager cluster, with three instances.

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
spec:
  replicas: 3

The Alertmanager instances cannot start up unless a valid configuration is given. The following example configuration sends notifications to a non-existent webhook, which allows the Alertmanager to start up without issuing any notifications. For more information on configuring Alertmanager, see the Prometheus Alerting Configuration document.

global:
  resolve_timeout: 5m
route:
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  receiver: 'webhook'
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://alertmanagerwh:30500/'

Save the above Alertmanager config in a file called alertmanager.yaml and create a secret from it using kubectl.

Alertmanager instances require the Secret name to follow the format alertmanager-{ALERTMANAGER_NAME}. In the previous example, the name of the Alertmanager is example, so the Secret name must be alertmanager-example, and the name of the config file must be alertmanager.yaml.

$ kubectl create secret generic alertmanager-example --from-file=alertmanager.yaml

Note that Alertmanager configurations can use templates (.tmpl files), which can be added to the Secret along with the alertmanager.yaml config file. For example:

apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-example
data:
  alertmanager.yaml: {BASE64_CONFIG}
  template_1.tmpl: {BASE64_TEMPLATE_1}
  template_2.tmpl: {BASE64_TEMPLATE_2}

Templates will be placed on the same path as the configuration. To load the templates, the configuration (alertmanager.yaml) should point to them:

templates:
- '*.tmpl'
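
As an alternative to base64-encoding each file by hand, Kubernetes Secrets also accept a stringData field that takes plain-text values, which the API server encodes on write. The following is a minimal sketch under that assumption, not the only way to build the Secret. The template name example.title and its body are hypothetical and only illustrate how a .tmpl file and the templates entry fit together.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-example    # must follow alertmanager-{ALERTMANAGER_NAME}
stringData:
  # Plain-text config; the templates entry loads every .tmpl file stored next to it.
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
    templates:
    - '*.tmpl'
    route:
      group_by: ['job']
      receiver: 'webhook'
    receivers:
    - name: 'webhook'
      webhook_configs:
      - url: 'http://alertmanagerwh:30500/'
  # Hypothetical template file; the name "example.title" is an assumption.
  template_1.tmpl: |
    {{ define "example.title" }}[{{ .Status | toUpper }}] {{ .CommonLabels.alertname }}{{ end }}
```

Because stringData is write-only, reading the Secret back shows the same keys base64-encoded under data.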

Once created, this Secret is mounted by the Alertmanager Pods created through the Alertmanager object.

To be able to view the web UI, expose it through a Service. A simple way to do this is to use a Service of type NodePort.

apiVersion: v1
kind: Service
metadata:
  name: alertmanager-example
spec:
  type: NodePort
  ports:
  - name: web
    nodePort: 30903
    port: 9093
    protocol: TCP
    targetPort: web
  selector:
    alertmanager: example

Once created, the Service makes the web UI accessible via a Node's IP and the port 30903.

This Alertmanager cluster is now fully functional and highly available, but no alerts are fired against it. Create Prometheus instances to fire alerts to the Alertmanagers.

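apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  replicas: 2
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-example
      port: web
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      role: prometheus-rulefiles
      prometheus: example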

The above configuration specifies a Prometheus that finds all of the Alertmanagers behind the Service created with alertmanager-example-service.yaml. The alertmanagers name and port fields should match those of the Service to allow this to occur.

Prometheus rule files are held in ConfigMaps. Use the label selector field ruleSelector in the Prometheus object to define the ConfigMaps from which rule files will be mounted. All top level files that end with the .rules extension will be loaded.

The best practice is to label the ConfigMaps containing rule files with role: prometheus-rulefiles as well as the name of the Prometheus object, prometheus: example in this case.

kind: ConfigMap
apiVersion: v1
metadata:
  name: prometheus-example-rules
  labels:
    role: prometheus-rulefiles
    prometheus: example
data:
  example.rules.yaml: |+
    groups:
    - name: ./example.rules
      rules:
      - alert: ExampleAlert
        expr: vector(1)

This example ConfigMap triggers an alert immediately and unconditionally, which is useful only for demonstration purposes. To validate that everything is working properly, look at each of the Prometheus web UIs.
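
For comparison with the always-firing demonstration rule, a rule that fires only after a condition has held for some time might look like the following sketch. It shows the content of a rule file as it would appear under the ConfigMap's data key. The alert name, expression, duration, label, and annotation are illustrative assumptions and are not part of this guide's setup.

```yaml
groups:
- name: example.rules
  rules:
  - alert: InstanceDown          # hypothetical alert name
    expr: up == 0                # matches targets whose most recent scrape failed
    for: 5m                      # condition must hold for 5 minutes before the alert fires
    labels:
      severity: critical         # assumed label, usable for routing in Alertmanager
    annotations:
      summary: 'Instance {{ $labels.instance }} is down'
```

Until the for duration has elapsed, the alert shows up as pending rather than firing in the Prometheus web UI.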

Use kubectl's proxy functionality to view the web UI without a Service.


$ kubectl proxy --port=8001

The web UI of each Prometheus instance can then be viewed. Both instances have a firing alert called ExampleAlert, as defined in the loaded alerting rules.

  • http://localhost:8001/api/v1/proxy/namespaces/default/pods/prometheus-example-0:9090/alerts
  • http://localhost:8001/api/v1/proxy/namespaces/default/pods/prometheus-example-1:9090/alerts

Looking at the status page for "Runtime & Build Information" on the Prometheus web UI shows the discovered and active Alertmanagers that the Prometheus instance will fire alerts against.

  • http://localhost:8001/api/v1/proxy/namespaces/default/pods/prometheus-example-0:9090/status
  • http://localhost:8001/api/v1/proxy/namespaces/default/pods/prometheus-example-1:9090/status

These show three discovered Alertmanagers.

Heading to the Alertmanager web UI now shows one active alert, although all Prometheus instances are firing it. Configuring the Alertmanager further allows custom alert routing, grouping and notification mechanisms.
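
As a rough illustration of such further configuration, the sketch below extends the route with a child route that sends alerts carrying a severity: critical label to a separate receiver, with shorter wait and repeat intervals. The label, the receiver name critical-webhook, and its URL are hypothetical. The match key is used here as an assumption about the Alertmanager version in use; newer releases offer matchers for the same purpose.

```yaml
route:
  receiver: 'webhook'            # default receiver for alerts no child route matches
  group_by: ['job']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 12h
  routes:
  - match:
      severity: critical         # assumed label set by the alerting rule
    receiver: 'critical-webhook'
    group_wait: 10s              # send the first notification sooner
    repeat_interval: 1h          # re-notify more often while the alert keeps firing
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://alertmanagerwh:30500/'
- name: 'critical-webhook'       # hypothetical receiver
  webhook_configs:
  - url: 'http://alertmanagerwh-critical:30500/'   # hypothetical endpoint
```

Child routes inherit grouping and timing settings from the parent unless they override them, so only the values that differ need to be repeated.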