A security-minded, standards-based container engine

Overview

rkt is an application container engine developed for modern production cloud-native environments. It features a pod-native approach, a pluggable execution environment, and a well-defined surface area that makes it ideal for integration with other systems.

The core execution unit of rkt is the pod, a collection of one or more applications executing in a shared context (rkt's pods are synonymous with the pod concept in the Kubernetes orchestration system). rkt allows users to apply different configurations (like isolation parameters) at both the pod level and the more granular per-application level. rkt's architecture means that each pod executes directly in the classic Unix process model (i.e. there is no central daemon), in a self-contained, isolated environment. rkt implements a modern, open, standard container format, the App Container (appc) spec, but can also execute other container images, like those created with Docker.

Since its introduction by CoreOS in December 2014, the rkt project has greatly matured and is widely used. It is available for most major Linux distributions and every rkt release builds self-contained rpm/deb packages that users can install. These packages are also available as part of the Kubernetes repository to enable testing of the rkt + Kubernetes integration. rkt also plays a central role in how Google Container Image and CoreOS Container Linux run Kubernetes.


Composable

Following the Unix tools philosophy, rkt is a single binary that integrates with init systems, scripts, and complex devops pipelines. Containers take their correct place in the PID hierarchy and can be managed with standard utilities.


Customizable Isolation

Use containers as a standard, secure deployment object, and choose the appropriate level of isolation using rkt’s pluggable runtime architecture, known as stages.


Pods Built-In

The atomic unit in rkt is the pod, a group of related containers that share resources. This allows for easy stacking of related components, and maps directly to cluster management concepts.

Privileged

Specialized, trusted processes can run like a traditional chroot, using the rkt fly stage1.

Container/cgroup

Normal namespacing and cgroup isolation enforced by software above a shared kernel.


Virtual Machine

Full hardware virtualization for running certain high-performance or high-security workloads, using the LKVM stage1.
Benefit from standard packaging, signing and distribution at all isolation levels.
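As a sketch of how these stage1 flavors are chosen in practice, the commands below use rkt's --stage1-name flag (the image names follow rkt's published stage1 naming; the alpine-sh image is only an example, and the whole script is a no-op on machines without rkt installed):

```shell
#!/usr/bin/env bash
# Sketch: one image, three isolation levels, selected at run time.
if command -v rkt >/dev/null 2>&1; then
  # Default stage1: namespaces + cgroups isolation above a shared kernel.
  sudo rkt run quay.io/coreos/alpine-sh

  # fly stage1: no isolation, a traditional chroot for trusted processes.
  sudo rkt run --stage1-name=coreos.com/rkt/stage1-fly quay.io/coreos/alpine-sh

  # KVM stage1: the pod boots inside a lightweight virtual machine.
  sudo rkt run --stage1-name=coreos.com/rkt/stage1-kvm quay.io/coreos/alpine-sh
else
  echo "rkt not installed; commands shown for illustration only"
fi
```

The image stays the same at every level; only the stage1 execution environment changes.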
Industry leaders support the design philosophy of rkt

We find CoreOS’s rkt a compelling container engine in Kubernetes because of how rkt composes with underlying systemd.

The rkt runtime assumes only the responsibility it needs to, then delegates to other system services where appropriate. This separation of concerns is important to us.

Mark Petrovic
senior MTS and architect

We have been impressed by the stability and the flexibility of rkt even in very early versions.

We are migrating all our services to rkt and CoreOS. As of today, 90 percent of our product already runs on this platform.

Simon Lallemand
system engineer

A Security-minded Container Engine

Turn isolation up or down per container

Use KVM for VM-based isolation when required.

Integrated with SELinux

Support for sVirt in addition to a default SELinux policy.

Enforces seccomp filtering on containers in pods

rkt leverages systemd seccomp features to strengthen container isolation by denying unsafe system calls and preventing privilege escalation.
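As an illustrative sketch of tuning this per application (the exact --seccomp isolator syntax has varied across rkt releases, so treat the flag values below as assumptions to check against your version's documentation):

```shell
#!/usr/bin/env bash
# Sketch: override the default seccomp filter for one app in a pod.
# The per-app --seccomp flag and the @rkt/default-whitelist set name are
# assumptions based on rkt's seccomp guide; verify against your rkt version.
if command -v rkt >/dev/null 2>&1; then
  sudo rkt run --interactive quay.io/coreos/alpine-sh \
    --seccomp mode=retain,@rkt/default-whitelist \
    -- /bin/sh
else
  echo "rkt not installed; command shown for illustration only"
fi
```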

Containers are signed & verified by default

Operations team can control granular trust permissions

Fetch containers as non-root user

A safer way to download app container images from the internet

Leverage the TPM for container security

Ensure only trusted containers run on your machines
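Trust for signed images is established per image-name prefix with the rkt trust subcommand, which fetches and pins a signing key before any image under that prefix is accepted. A sketch, using the same example prefix as the fetch transcript below:

```shell
#!/usr/bin/env bash
# Sketch: pin a signing key for an image prefix, then fetch normally.
if command -v rkt >/dev/null 2>&1; then
  # Discovers the publisher's key over HTTPS and prompts for confirmation
  # before adding it to the keystore.
  sudo rkt trust --prefix=quay.io/coreos/alpine-sh

  # Subsequent fetches of this prefix verify signatures against that key,
  # and can run as an unprivileged user.
  rkt fetch quay.io/coreos/alpine-sh
else
  echo "rkt not installed; commands shown for illustration only"
fi
```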

The Container Engine for Dev and Ops

Dev: Use your existing Docker images

rkt can fetch, convert and execute Docker containers. Use your current registry, or Quay.io.

Ops: Practical Security Built-in

rkt treats practical security as a first principle of its design. It embodies everyday best practices and helps enforce them in your cluster.

Feature comparison: CoreOS rkt vs. Docker

Runs Docker images
  rkt: Yes
  Docker: Yes

Image Signing
  rkt: Verifies signatures by default
  Docker: Client-based; signature validation not enforced by the Docker daemon

Privilege Separation
  rkt: Fetches, verifies, and validates signatures as an unprivileged user
  Docker: All operations conducted by the Docker daemon, running as root

Composability
  rkt: Proper Unix process model; manage processes with systemd, standard SysV init, runit, etc.
  Docker: Requires custom in-container init systems to manage child processes

Pluggable Isolation
  rkt: Multiple stage1 isolation environments, from chroot to cgroups to KVM, or roll your own
  Docker: Isolation limited to Docker daemon options for network bridging or full privileged mode

Image Creation
  rkt: Container build tool based on shell scripting, leveraging familiar Unix tools
  Docker: Build defined in a Dockerfile, built by the Docker daemon (as root)

Container Distribution
  rkt: Container images are plain tarballs, served over common HTTPS; DNS discovery of custom namespaces and signatures
  Docker: Docker registry; restrictive default namespace (docker.com)

Command Line Examples

rkt can discover, retrieve, verify, and store images without root privileges. This capability means that you're not downloading content from the internet as root.

Here is the command to fetch an Alpine Linux image:

core@core-01 ~ $ rkt fetch quay.io/coreos/alpine-sh
rkt: searching for app image quay.io/coreos/alpine-sh
rkt: remote fetching from URL "https://quay.io/c1/aci/quay.io/coreos/alpine-sh/latest/aci/linux/amd64/"
prefix: "quay.io/coreos/alpine-sh"
key: "https://quay.io/aci-signing-key"
gpg key fingerprint is: BFF3 13CD AA56 0B16 A898  7B8F 72AB F5F6 799D 33BC
  Quay.io ACI Converter (ACI conversion signing key) 
Key "https://quay.io/aci-signing-key" already in the keystore
rkt: downloading signature from https://quay.io/c1/aci/quay.io/coreos/alpine-sh/latest/aci.asc/linux/amd64/
Downloading signature: [=======================================] 473 B/473 B
Downloading ACI: [=============================================] 2.65 MB/2.65 MB
rkt: signature verified:
  Quay.io ACI Converter (ACI conversion signing key) 
sha512-a2fb8f390702d3d9b00d2ebd93e7dd1c
core@core-01 ~ $

After the image is located in the local image store, we can run it:

core@core-01 ~ $ sudo rkt run --interactive quay.io/coreos/alpine-sh -- /bin/sh
rkt: using image from file /usr/share/rkt/stage1-coreos.aci
rkt: using image from local store for image name quay.io/coreos/alpine-sh
/ # ps
PID   USER     TIME   COMMAND
    1 root       0:00 /usr/lib/systemd/systemd --default-standard-output=tty --
    2 root       0:00 /usr/lib/systemd/systemd-journald
    4 root       0:00 /bin/sh -c /bin/sh /bin/sh
    5 root       0:00 /bin/sh
    6 root       0:00 ps
/ # ls /
bin      etc      lib      media    proc     run      sys      usr
dev      home     linuxrc  mnt      root     sbin     tmp      var
/ # cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.2.3
PRETTY_NAME="Alpine Linux v3.2"
HOME_URL="http://alpinelinux.org"
BUG_REPORT_URL="http://bugs.alpinelinux.org"
/ #

From inside the container, ps and ls show the isolated process namespace and container filesystem. The os-release file shows the container's OS personality.

Outside the container, the host systemd can monitor and arrange logging and other lifecycle management for rkt pods.

Printing system-wide status shows systemd managing the pod's execution as a system service, and the container's isolation within a cgroups machine slice, including the nested systemd governing process lifecycles inside the container:

core@core-02 ~ $ systemctl status
● core-02
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Tue 2016-02-02 19:35:01 UTC; 10h ago
   CGroup: /
           ├─1 /usr/lib/systemd/systemd --switched-root --system --deserialize 2
           ├─machine.slice
           │ └─machine-rkt\x2dcd642877\x2d8ef5\x2d4b0a\x2d8202\x2d2c6c9415b9cf.s
           │   ├─1857 /usr/lib/systemd/systemd --default-standard-output=tty --l
           │   └─system.slice
           │     ├─systemd-journald.service
           │     │ └─1860 /usr/lib/systemd/systemd-journald
           │     └─alpine-sh.service
           │       ├─1864 /bin/sh -c /bin/sh /bin/sh
           │       └─1866 /bin/sh
           └─system.slice
             ├─dbus.service
             │ └─643 /usr/bin/dbus-daemon --system --address=systemd: --nofork -
             ├─update-engine.service
             │ └─649 /usr/sbin/update_engine -foreground -logtostderr
             ├─system-sshd.slice
             │ ├─sshd@0-10.0.2.15:22-10.0.2.2:64676.service
             │ │ ├─ 819 sshd: core [priv]
core@core-02 ~ $

Since this is a view into the entire CoreOS machine, two default services, sshd and update-engine (which handles CoreOS software updates), are also visible. This illustrates the identical management interface that the rkt process model enables for both system-level and containerized applications.

machinectl is systemd's tool for viewing VMs and containers running under its control. The command reveals the transient name assigned to our rkt pod, listed here with the class container:

core@core-02 ~ $ machinectl list
MACHINE                                  CLASS     SERVICE
rkt-cd642877-8ef5-4b0a-8202-2c6c9415b9cf container nspawn

1 machines listed.

Given the pod's machine name, we can stop the pod with the machinectl tool:

core@core-02 ~ $ sudo machinectl poweroff rkt-cd642877-8ef5-4b0a-8202-2c6c9415b9cf
core@core-02 ~ $ machinectl list
MACHINE CLASS SERVICE

0 machines listed.
core@core-02 ~ $

Long-running rkt pods can be managed as systemd services, with standard tools and practices. An automatic nested systemd manages process lifecycles inside the container. Container apps can be inspected with familiar tools, and even integrated with local management scripts.
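A minimal unit file for such a pod might look like the sketch below. The unit and image names are illustrative; KillMode=mixed is the documented pattern for letting systemd signal the pod's supervising process cleanly on stop:

```ini
# /etc/systemd/system/caddy-pod.service (illustrative sketch)
[Unit]
Description=Caddy web server in a rkt pod
After=network-online.target

[Service]
ExecStart=/usr/bin/rkt run --net=host quay.io/josh_wood/caddy
KillMode=mixed
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl enable --now caddy-pod.service; the pod's output then flows to the journal like any other service.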

In this quick example, we're constructing an on-the-fly service with systemd-run. The status command lists the PID tree as well as the first few lines of logs.

Running journalctl -u run-1907.service would yield the full log stream.

core@core-02 ~ $ sudo systemd-run --slice=machine rkt run --net=host quay.io/josh_wood/caddy
Running as unit run-1907.service.
core@core-02 ~ $ systemctl status run-1907.service
● run-1907.service - /bin/rkt run --net=host quay.io/josh_wood/caddy
   Loaded: loaded (/run/systemd/system/run-1907.service; static; vendor preset: disabled)
  Drop-In: /run/systemd/system/run-1907.service.d
           └─50-Description.conf, 50-ExecStart.conf, 50-Slice.conf
   Active: active (running) since Wed 2016-02-03 06:37:48 UTC; 24s ago
 Main PID: 1908 (ld-linux-x86-64)
   CGroup: /machine.slice/run-1907.service
           ├─1908 stage1/rootfs/usr/lib/ld-linux-x86-64.so.2 stage1/rootfs/us...
           ├─1928 /usr/lib/systemd/systemd --default-standard-output=tty --lo...
           └─system.slice
             ├─caddy.service
             │ └─1933 /bin/caddy
             └─systemd-journald.service
               └─1929 /usr/lib/systemd/systemd-journald

Feb 03 06:37:50 core-02 rkt[1908]: Downloading signature:  473 B/473 B
Feb 03 06:37:51 core-02 rkt[1908]: Downloading ACI:  0 B/4.54 MB
Feb 03 06:37:51 core-02 rkt[1908]: Downloading ACI:  16.4 KB/4.54 MB
Feb 03 06:37:52 core-02 rkt[1908]: Downloading ACI:  819 KB/4.54 MB
Feb 03 06:37:53 core-02 rkt[1908]: Downloading ACI:  2.96 MB/4.54 MB
Feb 03 06:37:54 core-02 rkt[1908]: Downloading ACI:  4.54 MB/4.54 MB
Feb 03 06:37:54 core-02 rkt[1908]: rkt: signature verified:
Feb 03 06:37:54 core-02 rkt[1908]: Quay.io ACI Converter (ACI conversion si...o>
Feb 03 06:37:55 core-02 rkt[1908]: [38154.032938] caddy[4]: Activating priv...e.
Feb 03 06:37:55 core-02 rkt[1908]: [38154.035398] caddy[4]: :2015
Hint: Some lines were ellipsized, use -l to show in full.
core@core-02 ~ $ sudo systemctl stop run-1907.service

rkt can fetch Docker images from common Docker registries, and convert and execute them on the fly. To simplify this example, we direct rkt to skip image signature verification.

core@core-02 ~ $ sudo rkt run --insecure-options=image --interactive docker://busybox -- /bin/sh
rkt: using image from local store for image name coreos.com/rkt/stage1-coreos:0.16.0
rkt: remote fetching from URL "docker://busybox"
Downloading sha256:eeee0535bf3: [==============================] 676 KB/676 KB
Downloading sha256:a3ed95caeb0: [==============================] 32 B/32 B
/ # ps
PID   USER     TIME   COMMAND
    1 root       0:00 /usr/lib/systemd/systemd --default-standard-output=tty --
    2 root       0:00 /usr/lib/systemd/systemd-journald
    4 root       0:00 /bin/sh -c "sh" /bin/sh
    5 root       0:00 sh
    7 root       0:00 ps
/ # ls /
bin   dev   etc   home  proc  root  sys   tmp   usr   var
/ # uname -a
Linux rkt-a6470cfe-7b6c-498c-917d-a254a312f0aa 4.4.0-coreos-r2 #2 SMP Fri Jan 29 22:00:35 UTC 2016 x86_64 GNU/Linux
~ #

Running Docker images with rkt gives you better integration with your init system while preserving your existing build process.

Trying out all the features available in rkt can leave a lot of experimental pods lying around. The rkt gc command reaps exited pods and container images from the local store after a configurable grace period. This is easy to automate with a periodic schedule to keep the rkt store tidy:

core@core-02 ~ $ sudo rkt list
UUID    APP   IMAGE NAME          STATE NETWORKS
81627cc6  caddy   quay.io/josh_wood/caddy:latest      exited
cd642877  alpine-sh quay.io/coreos/alpine-sh:latest     exited
d65abad6  alpine-sh quay.io/coreos/alpine-sh:latest     exited
core@core-02 ~ $ sudo rkt gc
Moving pod "81627cc6-6d19-48db-8a29-d2e043d060f7" to garbage
Moving pod "cd642877-8ef5-4b0a-8202-2c6c9415b9cf" to garbage
Moving pod "d65abad6-2951-4c5a-a32d-c851145d3320" to garbage
Pod "81627cc6-6d19-48db-8a29-d2e043d060f7" not removed: still within grace period (30m0s)
Pod "cd642877-8ef5-4b0a-8202-2c6c9415b9cf" not removed: still within grace period (30m0s)
Pod "d65abad6-2951-4c5a-a32d-c851145d3320" not removed: still within grace period (30m0s)

Three containers have exited, with all of their processes terminated, but they remain in the rkt management list until they have been inactive longer than the garbage-collection grace period, 30 minutes by default.

To remove these immediately, reduce the grace period to zero:

core@core-02 ~ $ sudo rkt gc --grace-period=0
Garbage collecting pod "81627cc6-6d19-48db-8a29-d2e043d060f7"
Garbage collecting pod "cd642877-8ef5-4b0a-8202-2c6c9415b9cf"
Garbage collecting pod "d65abad6-2951-4c5a-a32d-c851145d3320"
core@core-02 ~ $ sudo rkt list
UUID  APP IMAGE NAME  STATE NETWORKS
core@core-02 ~ $

All exited containers have been removed. Images can be garbage-collected after a configurable grace period in the same manner. Both storage and general system resource consumption are kept trim and tidy.
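Automating this with a systemd timer is one straightforward approach. A sketch, with unit names and the interval chosen for illustration:

```ini
# /etc/systemd/system/rkt-gc.service (illustrative sketch)
[Unit]
Description=Garbage-collect exited rkt pods

[Service]
Type=oneshot
ExecStart=/usr/bin/rkt gc

# /etc/systemd/system/rkt-gc.timer (illustrative sketch)
[Unit]
Description=Run rkt gc periodically

[Timer]
OnUnitActiveSec=12h

[Install]
WantedBy=timers.target
```

Enabling the timer with systemctl enable --now rkt-gc.timer keeps the store tidy without any manual intervention.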

Familiar Tools for Building Container Images

The acbuild tool is a simple Unix utility for constructing ACI manifests and container filesystems. acbuild presents options for mapping ports, mounting filesystems, and specifying the base containers from which higher-level images are built (a dep, or dependency, in acbuild parlance).

No Custom DSL

Rather than implementing its own DSL for container construction, acbuild leverages the command line environment to enable familiar shell scripting and even Makefile-driven build pipelines.

This example acbuild bash script constructs an Nginx webserver app container.

Distribute Over HTTPS

After a completed build, the container is ready to be served to users over HTTPS, without any specialized registry software.

Using DNS discovery, this container can be hosted on cloud storage, but referred to as coreos.com/nginx:latest across your infrastructure.
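Discovery works through simple HTML meta tags served at the image's name URL. Per the appc discovery convention, a page at the image name's domain would carry tags like the following (the storage URL is an illustrative placeholder; the {name}-{version}-{os}-{arch}.{ext} template is expanded by the client):

```html
<!-- Served at https://coreos.com/nginx (hosting URL illustrative) -->
<meta name="ac-discovery"
      content="coreos.com/nginx https://storage.example.com/{name}-{version}-{os}-{arch}.{ext}">
<meta name="ac-discovery-pubkeys"
      content="coreos.com/nginx https://storage.example.com/pubkeys.gpg">
```

With these tags in place, rkt fetch coreos.com/nginx resolves to the tarball and signing key on plain cloud storage, with no registry software involved.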

#!/usr/bin/env bash
set -e

if [ "$EUID" -ne 0 ]; then
    echo "This script uses functionality which requires root privileges"
    exit 1
fi

# Start the build with an empty ACI
acbuild --debug begin

# In the event of the script exiting, end the build
acbuildEnd() {
    export EXIT=$?
    acbuild --debug end && exit $EXIT
}
trap acbuildEnd EXIT

# Name the ACI
acbuild --debug set-name example.com/nginx

# Based on alpine
acbuild --debug dep add quay.io/coreos/alpine-sh

# Install nginx
acbuild --debug run apk update
acbuild --debug run apk add nginx

# Add a port for http traffic over port 80
acbuild --debug port add http tcp 80

# Add a mount point for files to serve
acbuild --debug mount add html /usr/share/nginx/html

# Run nginx in the foreground
acbuild --debug set-exec -- /usr/sbin/nginx -g "daemon off;"

# Save the ACI
acbuild --debug write --overwrite nginx-latest-linux-amd64.aci