We recently announced Tectonic with Distributed Trusted Computing, which provides a secure infrastructure across every layer of the stack, from the application layer down to the hardware. Today we outline how we enabled trusted computing and what it means for users of CoreOS.
Trusted computing refers to a set of technologies that allow a computer to demonstrate that it is trustworthy. The definition of trustworthy is not fixed – different people will have different ideas as to what is trustworthy, and the technology does nothing to enforce a particular idea. This policy-free mechanism is implemented with the aid of an additional hardware component on the system motherboard, a Trusted Platform Module (TPM).
TPMs are capable of certain operations that make trusted computing possible. They can generate cryptographic keys, and they can sign things with these keys. They can store a "measurement" of system state, and can communicate that measurement to a remote system through a cryptographically secure channel. And they are capable of encrypting small secrets and sealing them to a specific measurement state.
The trusted computing implementation in Tectonic and CoreOS takes advantage of all of these features to establish trust of individual CoreOS machines. Our implementation is “distributed” because of the ability to extend the chain of trust into the cluster, which creates an industry-first end-to-end trusted computing environment.
To begin with, we call out to the TPM and ask it to prove that it is an authentic TPM. It does this by providing the public half of the Endorsement Key (EK) and a certificate proving that the EK was generated by the TPM vendor at manufacturing time. The TPM then generates an Attestation Identity Key (AIK) and provides the public half. We then challenge the TPM by encrypting a secret with the EK and asking the TPM to verify that the AIK belongs to this specific TPM. If the TPM can provide both a decrypted copy of the secret and a signed statement that the AIK is under its control, we know that the AIK was generated by this specific TPM.
The AIK is important because it permits a trusted communication channel for "remote attestation," a mechanism that allows the TPM to provide a copy of its measurements to a remote system. Measurements are stored in a series of Platform Configuration Registers (PCRs). As the system boots, each component "measures" the next component by taking a SHA-1 hash of it and passing that hash to a specific PCR in the TPM. The TPM concatenates the existing PCR value with the new hash, takes the SHA-1 of the concatenated value and then replaces the existing PCR value with the result. Without breaking SHA-1, the only way to obtain a specific PCR value is to perform exactly the same series of writes, in the same order, that generated that value in the first place.
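To make the extend operation concrete, here is a minimal Python sketch that simulates it in software. It is an illustration only, not the code that runs in the TPM or in our bootloader patches: the function names and component strings are hypothetical, and the register is assumed to start at all zeros.

```python
import hashlib

PCR_SIZE = hashlib.sha1().digest_size  # 20 bytes for SHA-1


def measure(component: bytes) -> bytes:
    """Hash a boot component, as the previous boot stage would."""
    return hashlib.sha1(component).digest()


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """Return the new PCR value: SHA-1(old PCR value || measurement)."""
    return hashlib.sha1(pcr + measurement).digest()


def boot(components) -> bytes:
    """Simulate one PCR being extended by a sequence of components."""
    pcr = bytes(PCR_SIZE)  # assumption: the register starts at all zeros
    for component in components:
        pcr = extend(pcr, measure(component))
    return pcr


# Hypothetical component contents, for illustration only.
good = boot([b"firmware-v2", b"bootloader-v5", b"kernel-4.2"])
reordered = boot([b"bootloader-v5", b"firmware-v2", b"kernel-4.2"])
tampered = boot([b"firmware-v2", b"bootloader-evil", b"kernel-4.2"])

print(good.hex())
print(good == reordered, good == tampered)
```

Because each new value depends on the previous one, changing any component or the order of the writes yields a different final value, so the last two comparisons are both false.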
In a typical modern Intel system, this measurement process starts with the CPU measuring the first stage of the system firmware (this is part of Intel Boot Guard Technology) and writing this to the TPM. The first stage of the firmware then measures the second stage of the firmware, which in turn measures any additional firmware images such as ROMs on plug-in cards. The firmware measures the bootloader, the bootloader measures the kernel and ramdisk and (in typical trusted computing environments) the measurement stops there. To reach our goal of extending the chain of trust up to the cluster level, we have to enter uncharted territory.
This measurement process requires that the boot components be aware of the TPM and how to interact with it. Any systems that ship with TPMs should have firmware that implements this, so the firmware side is covered. However, most free software bootloaders have no TPM support. We've added support to the Shim UEFI first-stage loader and the GNU GRUB bootloader, and will be working with the upstream projects to integrate this functionality.
A remote system can request that the local system provide a copy of these measurements, signed with the AIK. The remote system can then examine these measurements and make a decision as to whether or not they match the desired policy. A typical policy may be to require that systems have specific firmware measurements (we want the servers to be running the latest firmware), a bootloader within a narrow range (we may permit the bootloader to be either the current or the previous version) and a kernel and ramdisk that match the bootloader. The remote system can then decide whether or not to grant access to a resource based on the measurement values. We are currently working to enhance upstream Kubernetes to allow it to refuse to grant a node access to the cluster unless its measurements meet the defined policy.
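A policy check of this kind can be sketched in a few lines of Python. This is a hedged illustration rather than the Tectonic or Kubernetes implementation: the policy layout, the function names, the PCR indices and the placeholder PCR values are all hypothetical, and the sketch assumes the AIK signature on the reported values has already been verified.

```python
from typing import Dict, List, Set

# Hypothetical policy format: PCR index -> set of acceptable values (hex).
Policy = Dict[int, Set[str]]


def violations(policy: Policy, quoted_pcrs: Dict[int, str]) -> List[str]:
    """List every PCR in the policy whose reported value is not acceptable."""
    problems = []
    for index, allowed in sorted(policy.items()):
        value = quoted_pcrs.get(index)
        if value is None:
            problems.append(f"PCR {index}: missing from quote")
        elif value not in allowed:
            problems.append(f"PCR {index}: unexpected value {value}")
    return problems


def admit(policy: Policy, quoted_pcrs: Dict[int, str]) -> bool:
    """Grant access only if every PCR named in the policy matches."""
    return not violations(policy, quoted_pcrs)


# Placeholder values; real PCR values are full SHA-1 digests.
policy: Policy = {
    0: {"fw-latest"},                    # firmware: latest version only
    4: {"grub-current", "grub-prev"},    # bootloader: current or previous
    9: {"kernel-current", "kernel-prev"},
}

up_to_date = {0: "fw-latest", 4: "grub-prev", 9: "kernel-prev"}
stale_firmware = {0: "fw-old", 4: "grub-current", 9: "kernel-current"}

print(admit(policy, up_to_date))
print(violations(policy, stale_firmware))
```

Returning the list of violations, rather than a bare yes or no, lets the remote system tell the admin which component caused a node to be refused.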
While measurement has typically only been used up to the end of the boot process, it can be taken further. rkt now has support to record the state of individual containers into the TPM. A tamper-evident log is kept of each of these measurements and can be provided to a remote system on request.
The remote system can then replay the log and ensure that it matches the PCR value. If the measurement is correct then we know that the log is an accurate representation of what was launched on the system. If the measurement is incorrect then we know that the log has been tampered with and can alert the cluster admin.
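The replay step can be sketched as follows. This is a minimal simulation under stated assumptions, not the rkt or Tectonic code: the log format (a list of description and digest pairs), the function names and the container names are hypothetical, the PCR is assumed to start at all zeros, and the reported PCR value is assumed to have arrived in a quote whose AIK signature was already checked.

```python
import hashlib
import hmac


def extend(pcr: bytes, digest: bytes) -> bytes:
    """PCR extend: SHA-1(old PCR value || digest)."""
    return hashlib.sha1(pcr + digest).digest()


def replay(log) -> bytes:
    """Recompute the PCR value implied by a log of (description, digest)."""
    pcr = bytes(20)  # assumption: the register starts at all zeros
    for _description, digest in log:
        pcr = extend(pcr, digest)
    return pcr


def log_is_consistent(log, quoted_pcr: bytes) -> bool:
    """True if replaying the log reproduces the PCR value from the TPM."""
    return hmac.compare_digest(replay(log), quoted_pcr)


# Hypothetical container launches, for illustration only.
images = [("container: web", b"web-image-bytes"),
          ("container: db", b"db-image-bytes")]
log = [(name, hashlib.sha1(data).digest()) for name, data in images]

# Stands in for the PCR value the TPM would report for this machine.
quoted_pcr = replay(log)

honest = log_is_consistent(log, quoted_pcr)
doctored = log_is_consistent(log[:1], quoted_pcr)  # last entry removed

print(honest, doctored)
```

The log itself is ordinary data that an attacker could edit, but the PCR value is held by the TPM and can only be extended, never set, so an edited log no longer replays to the reported value.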
In the event of an attempted attack or successful exploit of a system, the audit log provides critical insight into what happened. Since the audit log can’t be modified without detection, the attacker can’t clean up after themselves. This provides unprecedented transparency into firmware, bootloader, and OS attacks that previously went undetected.
The final component of trusted computing is the distribution of secrets. The TPM can be asked to generate encryption keys, which can then be certified in a similar way to the AIK. The TPM provides a copy of the public half of the encryption key, signed with the AIK. Since we've already verified that the AIK belongs to a trusted TPM, we can then verify that the encryption key also belongs to the TPM. The TPM will never release an unencrypted copy of the private half of the key, which means anything encrypted with the public half can only be decrypted by the TPM.
This means that we can distribute secrets by encrypting them in such a way that only the TPM can decrypt them, which makes it possible to solve the bootstrapping problem involved in secret distribution. If you buy a collection of servers and are provided with the EK certificates by the vendor or system integrator, you can then do the following:
No part of this process relies on having a trustworthy network connection or any initial configuration of the servers, which makes it ideally suited to the creation of distributed clusters. Servers can be delivered directly to a remote data center, racked and turned on and then become a secure part of your cluster without any additional manual intervention.
The bulk of this work (the measured boot patches for bootloaders, the Go bindings for libtspi and the associated TPM support code) will be straightforward to incorporate into upstream projects and other Linux distributions. We hope that this will make it easier for other projects to develop TPM-based features and explore exciting new avenues of improved user security.
The integration of these technologies into CoreOS allows Tectonic to provide an environment that enhances security and reduces admin workload. CoreOS is dedicated to improving the security of the Internet, and we believe that trusted computing is a meaningful step in that direction. Read more in the technical brief or try out the full solution in Tectonic Enterprise.