Tectonic ships with the following alerts preconfigured by default.
Alert | Severity | Description |
---|---|---|
DeadMansSwitch | none | Alert triggers continuously to ensure that the entire alerting pipeline is functional. For more information, see Dead Man's Switch in Configuring Alertmanager. |
AlertmanagerConfigInconsistent | critical | The configuration of the instances of the Alertmanager cluster for a given service is out of sync. |
AlertmanagerDownOrMissing | warning | Alertmanager is down or not discovered. An unexpected number of Alertmanagers are scraped, or Alertmanagers have disappeared from discovery. |
APIServerErrorsHigh | warning/critical | The API server is responding to a high rate of requests with errors. |
APIServerLatencyHigh | warning/critical | The response latency of the API server to clients is high. |
DaemonSetRolloutStuck | warning | A daemon set is not fully rolled out to all desired nodes. |
DeploymentGenerationMismatch | warning | The observed generation of a deployment does not match its desired generation. |
DeploymentReplicasNotUpdated | warning | A deployment has not been rolled out properly. Either replicas are not being updated to the most recent version, or not all replicas are ready. The alert does not fire if the deployment was paused intentionally. |
FailedReload | warning | Reloading Alertmanager's or Prometheus' configuration has failed for a given namespace. |
FdExhaustionClose | warning/critical | File descriptors for the given job, namespace, pod, or instance will soon be exhausted. Fires as two default alerts, one at warning and one at critical severity. |
K8SApiServerLatency | warning | Kubernetes API server latency is high. The 99th percentile latency for the given requests to the kube-apiserver is above 1 second. |
K8SApiserverDown | critical | The API server is unreachable. Prometheus failed to scrape the API server(s), or all API servers have disappeared from service discovery. |
K8SControllerManagerDown | critical | There is no running K8S controller manager. Deployments and replication controllers are not making progress. |
K8SKubeletDown | warning | Many kubelets cannot be scraped. Prometheus failed to scrape the listed percentage of kubelets, or all kubelets have disappeared from service discovery. |
K8SKubeletTooManyPods | warning | A kubelet is close to the pod limit. The given kubelet instance is running the listed number of pods, which is close to the limit of 110. |
K8SManyNodesNotReady | critical | More than 10% of the listed number of Kubernetes nodes are NotReady. |
K8SNodeNotReady | warning | The kubelet on the listed node has not checked in with the API, or has set itself to NotReady, for more than an hour. |
K8SSchedulerDown | critical | There is no running Kubernetes scheduler. New pods are not being assigned to nodes. |
NodeExporterDown | warning | Prometheus could not scrape a node-exporter for more than 10 minutes, or node-exporters have disappeared from discovery. |
NodeDiskRunningFull | warning/critical | If disks keep filling at the current pace, they will run out of free space within the next few hours. |
PodFrequentlyRestart | warning | A pod is restarting several times an hour. |
PrometheusNotConnectedToAlertmanagers | warning | A monitored Prometheus instance is not connected to any Alertmanagers. Any firing alerts will not be sent anywhere. |
PrometheusNotificationQueueRunningFull | warning | Prometheus is generating more alerts than it can send to Alertmanagers in time. |
PrometheusErrorSendingAlerts | warning/critical | Prometheus is encountering errors while trying to send alerts to Alertmanagers. |
TargetDown | warning | Targets are down. The listed percentage of the job's targets are down. |
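Because these alerts flow through the standard Prometheus alerting pipeline, you can check which of them are currently firing by querying the built-in `ALERTS` metric. The following is a minimal sketch, not part of the Tectonic tooling: it assumes the cluster's Prometheus has been exposed locally (for example with `kubectl port-forward`) at `http://localhost:9090` and uses the Python `requests` library.

```python
import requests

# Assumption: the cluster Prometheus has been exposed locally, e.g. with
# `kubectl port-forward`; adjust the URL for your environment.
PROMETHEUS_URL = "http://localhost:9090"


def firing_alerts(url=PROMETHEUS_URL):
    """Return the label sets of all alerts currently in the firing state."""
    resp = requests.get(
        f"{url}/api/v1/query",
        params={"query": 'ALERTS{alertstate="firing"}'},
        timeout=10,
    )
    resp.raise_for_status()
    payload = resp.json()
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    # Each result sample carries the alert's labels (alertname, severity, ...).
    return [sample["metric"] for sample in payload["data"]["result"]]


if __name__ == "__main__":
    for labels in firing_alerts():
        print(f'{labels.get("alertname", "?")} (severity: {labels.get("severity", "none")})')
```

The same query can be run from the Prometheus console. Note that DeadMansSwitch fires continuously by design, so it should always appear in the results; its absence indicates a problem in the alerting pipeline itself.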