Cluster Management

Container management and deployment for your cluster using fleet

Easy Warehouse-scale Computing

With fleet, you can treat your CoreOS cluster as if it shared a single init system. It encourages users to write applications as small, ephemeral units that can easily migrate around a cluster of self-updating CoreOS machines.

By utilizing fleet, a devops team can focus on running containers that make up a service, without having to worry about the individual machines each container is running on. If your application consists of 5 containers, fleet will guarantee that they stay running somewhere on the cluster. If a machine fails or needs to be updated, containers running on that machine will be moved to other qualified machines in the cluster.

High availability is achieved by ensuring that service containers are never placed on the same machine, availability zone, or region. fleet also supports colocating units based on the same properties.

Complex architectures are possible by combining these properties. For example, it is a very good idea to prevent a database container from being deployed on the same machine as the distributed backups for that database.
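
A minimal sketch of such an anti-affinity rule, using the X-Conflicts attribute shown later in this document; the unit name database.service and the backup image are illustrative, not part of fleet itself:

    # db-backup.service (illustrative)
    [Unit]
    Description=Distributed backups for the database

    [Service]
    # example/db-backup is a placeholder image name
    ExecStart=/usr/bin/docker run --name db-backup example/db-backup

    [X-Fleet]
    # Refuse to be scheduled onto a machine that is running database.service
    X-Conflicts=database.service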

Features

  • Deploy Docker containers on arbitrary hosts in a cluster
  • Distribute services across a cluster using machine-level anti-affinity
  • Maintain N instances of a service, re-scheduling on machine failure
  • Discover machines running in the cluster
  • Automatically SSH into the machine running a job (see the example after this list)
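
The last two features map directly onto fleetctl subcommands; a brief sketch, where hello.service is a placeholder unit name:

    # Show every machine currently registered in the cluster
    fleetctl list-machines

    # Open a shell on the machine running a unit
    # (a machine ID from list-machines also works as the argument)
    fleetctl ssh hello.service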

Using the Client

Starting a Single Unit

A unit can be started on the cluster by running fleetctl on a CoreOS machine. fleet can also be controlled remotely by configuring an SSH tunnel to a machine in the cluster.
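
As a sketch, assume a unit file named hello.service that runs a Docker container; the image and container name are placeholders:

    # hello.service
    [Unit]
    Description=Hello World
    After=docker.service
    Requires=docker.service

    [Service]
    ExecStartPre=-/usr/bin/docker rm hello
    ExecStart=/usr/bin/docker run --name hello busybox /bin/sh -c "while true; do echo Hello World; sleep 1; done"
    ExecStop=/usr/bin/docker stop hello

It can then be started locally, or remotely through a tunnel (10.10.10.10 stands in for the address of any machine in the cluster):

    # On a CoreOS machine in the cluster
    fleetctl start hello.service

    # From a remote workstation, tunneling commands over SSH
    fleetctl --tunnel 10.10.10.10 start hello.service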

Running units can be inspected with fleetctl status <unit>. The output contains the current state of the unit and the last few lines of its journal.
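
For instance, assuming the hello.service unit sketched above:

    # Current state of the unit plus its most recent journal lines
    fleetctl status hello.service

    # Print recent journal lines for the unit on its own
    fleetctl journal hello.service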

Starting a High Availability Service

The X-Fleet section of a unit file can define relationships between a unit and others running on the cluster. For example, let's spread out our MySQL containers so they're never on the same host by specifying X-Conflicts=mysql*.service.
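
A sketch of one such unit, mysql.1.service; the Docker invocation is illustrative (a real MySQL container would need credentials and storage options), and mysql.2.service and mysql.3.service would differ only in the container name:

    # mysql.1.service
    [Unit]
    Description=MySQL instance 1
    After=docker.service
    Requires=docker.service

    [Service]
    ExecStart=/usr/bin/docker run --name mysql-1 mysql
    ExecStop=/usr/bin/docker stop mysql-1

    [X-Fleet]
    # Never run on a machine that already runs another mysql*.service unit
    X-Conflicts=mysql*.service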

When mysql.1.service, mysql.2.service and mysql.3.service are started, they will all land on different machines in the cluster.
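
To see the placement, start the units and list them; the MACHINE column should show a different machine for each one:

    fleetctl start mysql.1.service mysql.2.service mysql.3.service
    fleetctl list-units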

If we SSH onto one of the machines and reboot it, the unit will move to another machine in the cluster.
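
For example, using the SSH feature mentioned above (a machine ID from fleetctl list-machines also works as the argument):

    # Open a shell on the machine running mysql.1.service...
    fleetctl ssh mysql.1.service
    # ...then, inside that session, reboot the machine
    sudo reboot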

Testing Machine Failure

If a machine fails, units will be rescheduled onto other qualified machines in the cluster. The easiest way to test this process is to reboot a machine.

Since our MySQL units conflict with each other, the rescheduled unit shouldn't land on a machine already running MySQL.
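
Watching the unit list while the machine reboots should show the unit reappear on a machine that is not already running another MySQL unit:

    fleetctl list-units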

Technical Overview

You can think of fleet as an extension of systemd that operates at the cluster level instead of the machine level. systemd is a single-machine init system; fleet is a cluster-level init system.

fleet works by ingesting systemd unit files with a few extra attributes that control how your jobs are dispersed in the cluster. Most of the time your unit files will launch Docker containers, but fleet supports all valid unit types, such as .socket and .mount. If you’re unfamiliar with unit files, check out our Getting Started with systemd guide.
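
As an illustration of a non-container unit, here is a hedged sketch of a .mount unit that fleet could schedule; the device and mount point are hypothetical, and systemd requires the file name to match the escaped mount path:

    # var-lib-data.mount (the name must correspond to Where=/var/lib/data)
    [Unit]
    Description=Mount the data volume

    [Mount]
    What=/dev/xvdb
    Where=/var/lib/data
    Type=ext4

It would be started like any other unit: fleetctl start var-lib-data.mount.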

fleet Unit Specifications

fleet contains two major entities: an engine and an agent. The engine is responsible for job scheduling & bidding and reacts to changes in cluster size. Scheduling logic is equally distributed between many fleet engines within the cluster.

The agent runs on each CoreOS machine and bids for jobs on behalf of the machine. Once a unit is assigned to the cluster, the agent starts the unit file and continually relays the state reported by systemd into fleet.

The etcd cluster is used for coordination between engines and agents. As a result of the fault-tolerance built into each piece of the system, fleet can automatically re-schedule jobs from failed machines onto other healthy, qualified machines in the cluster.
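
Because all of that coordination state lives in etcd, it can be inspected directly with etcdctl. Assuming fleet's default etcd key prefix (which is configurable and may differ between versions):

    # Browse the keys fleet uses for machines, units and their state
    etcdctl ls --recursive /_coreos.com/fleet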

fleet Architecture

[Diagram: job scheduling flow through fleet]