
What makes a cluster a cluster?

“What makes a cluster a cluster?” Ask that question of 10 different engineers and you’ll get 10 different answers. Some look at it from a hardware perspective, some see it as a particular set of cloud technologies, and some say it’s the protocols exchanging information on the network.

Given this ever-growing field of distributed systems technologies, it is helpful to compare the goals, roles, and differences of some of these new projects based on their functionality. In this post we propose a conceptual description of the cluster at large, illustrated with examples of emerging distributed systems technologies.

Layers of abstraction

The tech community has long agreed on what a network looks like. We’ve largely come to agree, in principle, on the OSI (Open Systems Interconnection) model (and in practice, on its close cousin, the TCP/IP model).

A key aspect of this model is the separation of concerns, with well-defined responsibilities and dependencies between components: every layer depends on the layer below it and provides useful network functionality (connection, retry, packetization) to the layer above it. At the top, finally, sit web sessions and applications of all sorts, with the details of communication abstracted away beneath them.

So, as an exercise in answering “What makes a cluster a cluster?”, let’s apply the same sort of layered thinking to the execution of code on a group of machines, rather than to the communication between those machines.

Here’s a snapshot of the OSI model, applied to containers and clustering:

[Image: OSI applied to clustering]

Let’s take a look from the bottom up.

Level 1, Hardware

The hardware layer is where it all begins. In a modern environment, this may mean physical (bare metal) or virtualized hardware – abstraction knows no bounds – but for our purposes, we define hardware as the CPU, RAM, disk and network equipment that is rented or bought in discrete units.

Examples: bare metal, virtual machines, cloud

Level 2, OS/Machine ABI

The OS layer is where we define how software executes on the hardware: the OS gives us the Application Binary Interface (ABI), the common language our userland applications use to speak to the OS (system calls, device drivers, and so on). We also set up a network stack so that these machines can communicate amongst each other. This layer therefore provides our lowest-level complete execution environment for applications.
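To make the ABI concrete, here is a minimal sketch in Go (assuming a Unix-like system) that skips the usual standard-library conveniences and issues a write(2) system call directly, the kind of userland-to-kernel conversation this layer defines:

```go
package main

import (
	"fmt"
	"syscall"
)

func main() {
	// Hand the kernel a file descriptor, a buffer and a length: the
	// write(2) system call, crossing the ABI boundary directly instead
	// of going through the standard library's os/fmt wrappers.
	msg := []byte("hello from userland\n")
	n, err := syscall.Write(1, msg) // fd 1 is stdout
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Printf("the kernel reported %d bytes written\n", n)
}
```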

Now, traditionally, we stop here, and run our final application on top of this as a third pseudo-layer of the OS and various user-space packages. We provision individual machines with slightly different software stacks (a database server, an app server) and there’s our server rack.

Over the lifetime of servers and software, however, the permutations and histories of individual machine configurations start to become unwieldy. As an industry, we are learning that managing this complexity becomes costly or infeasible over time, even at moderate scale (e.g. 3+ machines).

This is often where people start to talk about containers, as containers treat the entire OS userland as one hermetic application package that can be managed as an independent unit. Because of this abstraction, we can conceptually shift containers up the stack, as long as they’re above layer 2. We’ll revisit containers in layer 6.

Examples: kernel + {systemd, cgroups/namespaces, jails, zones}

Level 3, Cluster Consensus

To begin to mitigate the complexity of managing individual servers, we need to start thinking about machines in some greater, collective sense: this is our first notion of a cluster. We want to write software that scales across these individual servers and shares work effortlessly.

However, as we add more servers to the picture, we now introduce many more points of failure: networks partition, machines crash and disks fail. How can we build systems in the face of greater uncertainty? What we’d like is some way of creating a uniform set of data and data primitives, as needed by distributed systems. Much like in multiprocessor programming, we need the equivalent of locks, message passing, shared memory and atomicity across this group of machines.

This is an interesting and vibrant field of algorithmic research: a first stop for the curious reader should be the works of Leslie Lamport, particularly his earlier writing on the ordering and reliability of distributed systems. His later work describes Paxos, the preeminent consensus protocol; the other major protocol, implemented by many projects in this category, is Raft.

Why is this called consensus? The machines need to ‘agree’ on the same history and order of events in order to make the guarantees we’d like for the primitives described. Locks cannot be taken twice, for example, even if some subset of messages disappears or arrives out of order, or member machines crash for unknown reasons.

These algorithms build data structures to form a coherent, consistent, and fault-tolerant whole.

Examples: etcd, ZooKeeper, consul
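To show what these primitives look like in practice, here is a rough sketch of taking a distributed lock with etcd’s Go client and its concurrency package. The endpoint, key name, and timeouts are assumptions for illustration; the consensus protocol underneath is what guarantees a single lock holder at a time, even across failures:

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	// Connect to a local etcd member (endpoint assumed for this sketch).
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// A session keeps a lease alive; if this process crashes, the lock
	// is released automatically once the lease expires.
	session, err := concurrency.NewSession(cli)
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Every process that wants the lock agrees on the same key prefix;
	// consensus among the etcd members guarantees one holder at a time.
	mutex := concurrency.NewMutex(session, "/locks/example-job")

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	if err := mutex.Lock(ctx); err != nil {
		log.Fatal(err)
	}
	log.Println("lock acquired; doing exclusive work")

	// ... critical section ...

	if err := mutex.Unlock(context.Background()); err != nil {
		log.Fatal(err)
	}
	log.Println("lock released")
}
```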

Level 4, Cluster Resources

With this perspective of a unified cluster, we can now talk about cluster resources. Having abstracted the primitives of individual machines, we use this higher level view to create and interact with the complete set of resources that we have at our disposal. Thus we can consider in aggregate the CPUs, RAM, disk and networking as available to any process in the cluster, as provided by the physical layers underneath.

When we view the cluster as one large machine, all of its devices (CPU, RAM, disk, networking) become abstract, a benefit that containers already take advantage of. Containers depend on these resources being abstracted on their behalf (network bridges, for example) so that they can use these abstractions higher in the stack while running on any of the underlying hardware.

In some sense, this layer is to the cluster what the hardware layer is to a single machine. It may not be as celebrated as the layers above it, but it is where some important innovation takes place. Showing off a cool auto-scaling webapp demo is nice, but it depends on things like carving up the cluster’s IP space and deciding which host a block device is attached to.
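As a back-of-the-envelope illustration of that IP-space carving, here is a sketch in Go (IPv4 only, with hypothetical host names) of the bookkeeping an overlay network such as flannel keeps in the consensus store so that every host receives a non-overlapping subnet:

```go
package main

import (
	"fmt"
	"net"
)

// carveSubnets splits a cluster-wide IPv4 CIDR into fixed-size per-host
// subnets, so each host can hand out addresses without overlapping its
// neighbours. This is a toy version of what an overlay network records
// in the consensus layer on behalf of every host.
func carveSubnets(clusterCIDR string, newPrefixLen, count int) ([]*net.IPNet, error) {
	_, network, err := net.ParseCIDR(clusterCIDR)
	if err != nil {
		return nil, err
	}
	basePrefixLen, bits := network.Mask.Size()
	if bits != 32 || newPrefixLen <= basePrefixLen || newPrefixLen > bits {
		return nil, fmt.Errorf("invalid subnet length %d for %s", newPrefixLen, clusterCIDR)
	}

	base := network.IP.Mask(network.Mask).To4()
	baseAddr := uint32(base[0])<<24 | uint32(base[1])<<16 | uint32(base[2])<<8 | uint32(base[3])
	step := uint32(1) << uint(bits-newPrefixLen) // addresses per subnet

	subnets := make([]*net.IPNet, 0, count)
	for i := 0; i < count; i++ {
		addr := baseAddr + uint32(i)*step
		ip := net.IPv4(byte(addr>>24), byte(addr>>16), byte(addr>>8), byte(addr))
		subnets = append(subnets, &net.IPNet{IP: ip, Mask: net.CIDRMask(newPrefixLen, bits)})
	}
	return subnets, nil
}

func main() {
	// Hand each of four hypothetical hosts its own /24 out of a /16.
	subnets, err := carveSubnets("10.1.0.0/16", 24, 4)
	if err != nil {
		panic(err)
	}
	for i, s := range subnets {
		fmt.Printf("host-%d gets %s\n", i, s)
	}
}
```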

Examples: flannel, remote block storage, weave

Level 5, Cluster Orchestration and Scheduling

Cluster orchestration, then, starts to look a lot like an OS kernel atop these cluster-level resources and the tools the consensus layer gives us – symmetry with the layers below again. It’s the purview of the orchestration platform to divide and share cluster resources, schedule applications to run, manage permissions, set up interfaces into and out of the cluster, and, at the end of the day, find an ABI-compatible environment for the userland. With increased scale come new challenges: from finding the right machines to providing the best experience to users of the cluster.

Any software that will run on the cluster must ultimately execute on a physical CPU on a particular server. How the application code gets there and what abstractions it sees is controlled by the orchestration layer. This is similar to how WiFi simulates a copper wire to existing network stacks, with a controllable abstraction through access points, signal strength, meshes, encryption and more.
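As a sketch of the scheduling half of that job, here is a deliberately tiny placement function in Go. The node and task types, their resource fields, and the “most free memory wins” policy are all assumptions made for illustration; real orchestrators weigh far more signals (affinity, ports, disks, failure domains):

```go
package main

import "fmt"

// Node describes the free resources a cluster member is advertising.
type Node struct {
	Name    string
	FreeCPU float64 // cores
	FreeMem int     // MiB
}

// Task is a unit of work with resource requests.
type Task struct {
	Name string
	CPU  float64
	Mem  int
}

// schedule picks the node with the most free memory that can still fit
// the task: a simplistic "spread" policy, standing in for the far richer
// logic a real orchestration layer applies.
func schedule(task Task, nodes []Node) (string, bool) {
	best := -1
	for i, n := range nodes {
		if n.FreeCPU < task.CPU || n.FreeMem < task.Mem {
			continue // does not fit on this node
		}
		if best == -1 || n.FreeMem > nodes[best].FreeMem {
			best = i
		}
	}
	if best == -1 {
		return "", false
	}
	return nodes[best].Name, true
}

func main() {
	nodes := []Node{
		{"node-a", 1.5, 2048},
		{"node-b", 4.0, 8192},
		{"node-c", 0.5, 512},
	}
	task := Task{"web-1", 1.0, 1024}

	if name, ok := schedule(task, nodes); ok {
		fmt.Printf("placing %s on %s\n", task.Name, name)
	} else {
		fmt.Printf("no node can fit %s\n", task.Name)
	}
}
```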

Examples: fleet, Mesos, Kubernetes

Level 6, Containers

This brings us back to containers, in which, as described earlier, the entire userland is bundled together and treated as a single application unit.

If you’ve followed the whole stack up to this point, you’ll see why containers sit at level 6, instead of at level 2 or 3. It’s because the layers of abstraction below this point all depend on each other to build up to the point where a single-serving userland can run without caring whether it is one process on a local machine or something scheduled on the cluster as a whole.

Containers are actually simple that way; they depend on everything else to provide the appropriate execution environment. They carry userland data and expect specific OS details to be presented to them.
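To connect this back to the kernel features from layer 2, here is a minimal, Linux-only sketch in Go (it needs root) that starts a shell in fresh UTS, PID and mount namespaces. It is nowhere near a real container runtime, but it shows the kind of OS details a runtime arranges on a container’s behalf:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

func main() {
	// Run a shell in new UTS, PID and mount namespaces. A real runtime
	// would also set up cgroups, a root filesystem from the image's
	// userland, networking and more; this only demonstrates the kernel
	// primitives being composed.
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```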

Examples: Rocket, Docker, systemd-nspawn

Level 7, Application

Containers are currently getting a lot of attention in the industry because they separate the OS and software dependencies from the hardware. By abstracting these details, we can create consistent execution environments across a fleet of machines and let the traditional POSIX userland continue to work, fairly seamlessly, no matter where you take it. If the intention is to share containers, then choice is important, as is agreeing upon a sharable standard. Containers are exciting; they start us down the road of a lot of open source work in the realm of true distributed systems, backwards-compatible with the code we already write – our Application.

Closing Thoughts

For any of the layers of the cluster, there are (and will continue to be) multiple implementations. Some will combine layers, some will break them into sub-pieces – but this was true of networking in the past as well (do you remember IPX? Or AppleTalk?).

As we continue to work deeply on the internals of every layer, we also sometimes want to take a step back to look at the overall picture and consider the greater audience of people who are interested and starting to work on clusters of their own. We want to introduce this concept as a guideline, with a symmetric way of thinking about a cluster and its components. We’d love your thoughts on what defines a cluster as more than a mass of hardware.