Default alerts

Tectonic ships with the following alerts preconfigured by default.

| Alert | Severity | Description |
|-------|----------|-------------|
| DeadMansSwitch | none | Fires continuously to ensure that the entire alerting pipeline is functional. For more information, see Dead Man's Switch in Configuring Alertmanager. |
| AlertmanagerConfigInconsistent | critical | The configuration of the instances of the Alertmanager cluster for a given service is out of sync. |
| AlertmanagerDownOrMissing | warning | Alertmanager is down or not discovered. An unexpected number of Alertmanagers are being scraped, or Alertmanagers have disappeared from service discovery. |
| APIServerErrorsHigh | warning/critical | The API server is responding to requests with a high rate of errors. |
| APIServerLatencyHigh | warning/critical | The response latency of the API server to clients is high. |
| DaemonSetRolloutStuck | warning | A daemon set is not fully rolled out to all desired nodes. |
| DeploymentGenerationMismatch | warning | The observed generation of a deployment does not match its desired generation. |
| DeploymentReplicasNotUpdated | warning | A deployment has not been rolled out properly: either replicas are not being updated to the most recent version, or not all replicas are ready. The alert does not fire if the deployment was paused intentionally. |
| FailedReload | warning | Reloading Alertmanager's or Prometheus's configuration has failed for a given namespace. |
| FdExhaustionClose | warning/critical | File descriptors for the given job, namespace, pod, or instance will soon be exhausted. |
| K8SApiServerLatency | warning | Kubernetes API server latency is high. The 99th percentile latency for the given requests to the kube-apiserver is above 1 second. |
| K8SApiserverDown | critical | The API server is unreachable. Prometheus failed to scrape the API server(s), or all API servers have disappeared from service discovery. |
| K8SControllerManagerDown | critical | There is no running Kubernetes controller manager. Deployments and replication controllers are not making progress. |
| K8SKubeletDown | warning | Many kubelets cannot be scraped. Prometheus failed to scrape the listed percentage of kubelets, or all kubelets have disappeared from service discovery. |
| K8SKubeletTooManyPods | warning | Kubelet is close to the pod limit. The given kubelet instance is running the listed number of pods, which is close to the limit of 110. |
| K8SManyNodesNotReady | critical | More than 10% of Kubernetes nodes (the listed number) are NotReady. |
| K8SNodeNotReady | warning | The kubelet on the listed node has not checked in with the API server, or has set itself to NotReady, for more than an hour. |
| K8SSchedulerDown | critical | There is no running Kubernetes scheduler. New pods are not being assigned to nodes. |
| NodeExporterDown | warning | Prometheus could not scrape a node-exporter for more than 10 minutes, or node-exporters have disappeared from service discovery. |
| NodeDiskRunningFull | warning/critical | If disks keep filling at the current pace, they will run out of free space within the next few hours. |
| PodFrequentlyRestart | warning | A pod is restarting several times an hour. |
| PrometheusNotConnectedToAlertmanagers | warning | A monitored Prometheus instance is not connected to any Alertmanagers; any firing alerts will not be delivered. |
| PrometheusNotificationQueueRunningFull | warning | Prometheus is generating more alerts than it can send to Alertmanagers in time. |
| PrometheusErrorSendingAlerts | warning/critical | Prometheus is encountering errors while trying to send alerts to Alertmanagers. |
| TargetDown | warning | Targets are down. The listed percentage of job targets are down. |
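
Each of these alerts is defined as a Prometheus alerting rule loaded by the monitoring stack. As a rough illustration of the shape such a rule takes, the sketch below shows a rules file with two alerts in the style of the table above; the expressions, thresholds, and metric names are illustrative assumptions, not the exact rules shipped with Tectonic.

```yaml
# Sketch of a Prometheus alerting-rules file in the style of the defaults
# above. Expressions and thresholds are illustrative, not the shipped rules.
groups:
- name: example.rules
  rules:
  - alert: DeadMansSwitch
    # vector(1) is always true, so the alert fires continuously; the
    # alerting pipeline is healthy as long as notifications keep arriving.
    expr: vector(1)
    labels:
      severity: none
    annotations:
      description: Alert that fires continuously to ensure the alerting pipeline is functional.
  - alert: NodeDiskRunningFull
    # Extrapolate free space from the last 6h and warn if the filesystem is
    # predicted to fill up within 24h (metric and job names are assumptions).
    expr: predict_linear(node_filesystem_free{job="node-exporter"}[6h], 3600 * 24) < 0
    for: 30m
    labels:
      severity: warning
    annotations:
      description: Device {{ $labels.device }} on {{ $labels.instance }} is predicted to run out of space within 24 hours.
```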
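
Because DeadMansSwitch is meant to fire at all times, the useful signal is its absence: an external watchdog should page when its notifications stop arriving. A minimal, hypothetical Alertmanager snippet that routes this alert to a dedicated webhook receiver might look like the following; the receiver names and URL are placeholders, and the full configuration is covered in Configuring Alertmanager.

```yaml
# Hypothetical Alertmanager routing snippet: send the always-firing
# DeadMansSwitch to its own receiver so an external watchdog can page
# when notifications stop. Receiver names and the URL are placeholders.
route:
  receiver: default
  routes:
  - match:
      alertname: DeadMansSwitch
    receiver: deadmansswitch
    repeat_interval: 5m          # re-send frequently so the watchdog sees a steady heartbeat
receivers:
- name: default
- name: deadmansswitch
  webhook_configs:
  - url: http://example.com/watchdog   # placeholder endpoint
```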