Prometheus, the backbone of container monitoring, hits 2.0

Kubernetes makes management of complex environments easy, but to ensure availability it's crucial to have operational insight into the Kubernetes components and all applications running on the cluster. At CoreOS, we believe monitoring is the backbone of a good production environment, which is why we are investing in development of the Prometheus monitoring system. A project hosted by the Cloud Native Computing Foundation (CNCF), Prometheus has rapidly gained popularity for infrastructure and application monitoring alike, and today it's taking its next step forward.

CoreOS ships Prometheus as an integral component of our Tectonic enterprise-ready Kubernetes platform, and we're always working to enhance its value for Kubernetes users. Earlier this year, we shared our work on the new storage layer for Prometheus's next generation: version 2.0. Since then, we have worked closely with the wider Prometheus team and our users to stabilize this work. After three alphas, six betas, and three release candidates, today Prometheus 2.0 is officially declared stable. Thank you to Brian Brazil and Goutham Veeramachaneni, among the many individuals who were instrumental in this effort.

Before we explore the benefits of this release, though, let's take a step back and discuss why we needed a new storage layer.

Time series and dynamic environments

Prometheus's philosophy of monitoring encourages highly detailed metrics instrumentation across all layers of the stack. The status of containers, the requests flowing through them, and even the deep internals of the applications running inside of them are all explorable through metrics, and Prometheus comes with a powerful query language to help aggregate these metrics into actionable insights.
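
For example, a single PromQL expression can roll detailed per-pod request metrics up into a per-namespace overview. The query below is a hypothetical sketch issued against Prometheus's HTTP query API; the metric name http_requests_total and its labels are illustrative, not taken from this post:

$ curl -G 'http://<prometheus>/api/v1/query' \
    --data-urlencode 'query=sum by (namespace) (rate(http_requests_total[5m]))'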

Prometheus collects and stores data as time series, which are sequences of timestamped data points collected at regular intervals. This design can easily result in thousands or more time series exposed for each running container. At a scale of hundreds to thousands of containers, it's not uncommon to see millions of time series being tracked across a cluster.
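
To make that concrete, every series is identified by a metric name plus a set of labels, and a container's identity is encoded in those labels. The two series below are purely illustrative; the metric names and label values are not taken from this post:

container_cpu_usage_seconds_total{namespace="prod",pod="frontend-3467bd9f",container="nginx"}
container_memory_usage_bytes{namespace="prod",pod="frontend-3467bd9f",container="nginx"}

A few thousand such series per container, multiplied across hundreds of containers, quickly adds up to millions of series.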

Continuously writing data to millions of time series is an obvious technical challenge. To make matters worse, however, Kubernetes makes it really easy to continuously destroy containers and create new ones. This model is extremely powerful for continuous deployments, auto-scaling, and scheduling of batch jobs, and thus will only become more common over time.

Each container has a unique identity, and all of its time series are associated with that identity for optimal insight. So while the number of time series being actively tracked remains roughly fixed, the total history of time series data that Prometheus makes accessible grows continuously. Enabling queries across up to billions of time series is a whole new challenge, but we were determined to make Prometheus well equipped for it.

The new storage engine was written to tackle this challenge head on. It uses an inverted index, inspired by full text search, to provide fast lookups across the arbitrary dimensions that time series in Prometheus may have. A new disk format ensures good collocation of related time series data, while a write-ahead log makes Prometheus 2.0 resilient to crashes.

As a result, Prometheus has also become easier to operate. Users of Prometheus 1.x are all too familiar with the various configuration knobs to adjust the storage to the expected load. With the addition of the new storage layer, these controls are no longer needed.
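
To illustrate the difference (the flags below are shown for illustration only; consult each version's documentation for the authoritative list), a typical Prometheus 1.x deployment hand-tuned its local storage, while Prometheus 2.0 only needs a data directory and a retention period:

# Prometheus 1.x: storage manually sized for the expected load
$ prometheus -config.file=prometheus.yml \
    -storage.local.memory-chunks=3000000 \
    -storage.local.max-chunks-to-persist=1500000

# Prometheus 2.0: no storage sizing knobs
$ prometheus --config.file=prometheus.yml \
    --storage.tsdb.path=/data --storage.tsdb.retention=15d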

Benchmarks

The real-world results of this work are nothing short of impressive. Resource consumption in Prometheus 2.0 has been greatly reduced and stabilized across all resources, while the new indexing approach provides lower and more consistent query latency.

The graphs below show the results from a benchmarking setup where a Kubernetes cluster running hundreds of application pods is being monitored. Pods are replaced at a very high frequency to reflect time series churn. Two instances of Prometheus 1.5 and two instances of Prometheus 2.0 are running and ingesting new data. One instance of each version is additionally exposed to a moderately high querying load.

As we can see from the first two graphs, memory and CPU consumption are both significantly lower for Prometheus 2.0, and they reach a stable state rather quickly after startup. Prometheus 1.5, aside from demanding more resources, exhibits much less predictable resource usage:

Graph 1: Memory usage in GB

Graph 2: CPU usage in cores/second

The most dramatic improvement in Prometheus 2.0, however, is in the amount of data written to disk per second, as seen in the next graph. The new version writes up to two orders of magnitude less data to disk. This increases the lifetime of SSDs, which lowers costs. Under high series churn, significant disk space savings can be observed, even though the same time series compression algorithm is being used:

Graph 3: Bytes written to disk in MB/second

Lastly, query latency increases linearly in Prometheus 1.5 as more series are created over time. Prometheus 2.0 maintains stable performance from the beginning, which leads to significantly more responsive queries, as shown in the graph below:

Graph 4: Query latency is now lower and much more consistent over time

What else is new?

Much of the work on Prometheus 2.0 was focused on improving performance and making it even simpler to operate. But the new storage layer was also designed to be easy to interact with outside of Prometheus, which enables a wide range of custom tooling around the time series data it collects.

Snapshot backups have been a frequently requested feature. When started with the flag --web.enable-admin-api, Prometheus 2.0 natively supports them through a simple API call:

$ curl -XPOST http://<prometheus>/api/v2/admin/tsdb/snapshot
{"name":"2017-10-18T13:44:35Z-3f6a679bb001e65d"}

The snapshot is located in a directory with the returned name and can be uploaded to an archive store, all while barely occupying any additional disk space. Best of all, a new Prometheus server can be started with the backed-up data simply by moving it to the new server's data directory.
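
As a rough sketch of that restore workflow (paths are illustrative; the snapshot is written beneath the data directory's snapshots/ subdirectory):

# Copy the snapshot contents into an empty data directory on the new server
$ cp -r /prometheus-data/snapshots/2017-10-18T13:44:35Z-3f6a679bb001e65d/. /new-data/
# Point the new Prometheus server at that directory
$ prometheus --config.file=prometheus.yml --storage.tsdb.path=/new-data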

For a full list of changes and how to migrate from Prometheus 1.x, check out the official announcement and the migration guidelines.

Try it out!

The best part about the enhancements in Prometheus 2.0 is that Prometheus now not only supports today's Kubernetes workloads better than ever, but it also has room to support tomorrow's workloads as Kubernetes enables increasing dynamism in infrastructure.

Learn more about the new features in Prometheus 2.0 in a webinar on November 16 at 8 AM PT, presented by fellow Prometheus developer Frederic Branczyk.

If you'd like to see for yourself how integrated Prometheus support helps make CoreOS Tectonic the truly enterprise-ready Kubernetes platform, you can deploy a Tectonic cluster of up to 10 nodes for free. You'll then be able to use our latest Prometheus Operator to effortlessly deploy Prometheus 2.0 on your cluster without any extra configuration.

Note: The benchmarks for this post were produced using prombench. To reproduce them, you will need a configured AWS account and must specify the benchmarking spec you want to run. The spec.example.yaml spec was used to produce the graphs shown in this article.