etcd is a distributed key-value store that provides a reliable way to store data across a cluster of machines. It’s open source and available on GitHub. etcd gracefully handles leader elections during network partitions and tolerates machine failure, including failure of the leader.
Your applications can read data from and write data to etcd. A simple use case is storing database connection details or feature flags in etcd as key-value pairs. These values can be watched, allowing your app to reconfigure itself when they change.
Advanced uses take advantage of etcd's consistency guarantees to implement database leader elections or distributed locking across a cluster of workers.
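One way to picture distributed locking is as an atomic "create only if absent" write. The sketch below assumes the etcd v2 HTTP API, where a `PUT` with `prevExist=false` succeeds only when the key does not exist yet, and where a failed precondition is assumed to come back as HTTP 412. The address `127.0.0.1:2379`, the key name, the owner string and the helper names are all hypothetical; the TTL is there so that a crashed worker does not hold the lock forever.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumption: the etcd v2 HTTP API is reachable here; adjust for your cluster.
ETCD_BASE = "http://127.0.0.1:2379"


def acquire_request(lock_key, owner, ttl_seconds):
    """Build a PUT that creates lock_key only if it does not already exist."""
    query = urllib.parse.urlencode({"prevExist": "false"})
    body = urllib.parse.urlencode({"value": owner, "ttl": ttl_seconds}).encode()
    url = f"{ETCD_BASE}/v2/keys/{lock_key.strip('/')}?{query}"
    return urllib.request.Request(url, data=body, method="PUT")


def try_acquire(lock_key, owner, ttl_seconds=30):
    """Return True if this worker now holds the lock, False if another does."""
    try:
        with urllib.request.urlopen(acquire_request(lock_key, owner, ttl_seconds)):
            return True
    except urllib.error.HTTPError as err:
        # Assumption: an existing key is reported as 412 Precondition Failed.
        if err.code == 412:
            return False
        raise
```

A worker would call `try_acquire("locks/db-leader", "worker-1")` and do the protected work only when it returns `True`. A production lock also needs to refresh the TTL and release the key, which this sketch leaves out.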
Read and write values with curl and other HTTP libraries
Store data in directories, similar to a file system
Watch a key or directory for changes and react to the new values
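The three features above map onto plain HTTP requests. The following is a minimal sketch using only the Python standard library, assuming the etcd v2 keys API under `/v2/keys/` and a hypothetical address of `127.0.0.1:2379`; the helper names are illustrative and not part of etcd.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the etcd v2 HTTP API is reachable here; adjust for your cluster.
ETCD_BASE = "http://127.0.0.1:2379"


def key_url(key, **params):
    """Build the v2 keys URL for a key or directory, with optional query params."""
    url = ETCD_BASE + "/v2/keys/" + urllib.parse.quote(key.strip("/"))
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def set_request(key, value):
    """PUT request that writes value to key (form-encoded body)."""
    body = urllib.parse.urlencode({"value": value}).encode()
    return urllib.request.Request(key_url(key), data=body, method="PUT")


def get_request(key):
    """GET request that reads a key, or lists a directory."""
    return urllib.request.Request(key_url(key), method="GET")


def watch_request(key, recursive=False):
    """GET request that blocks (long poll) until the key or directory changes."""
    params = {"wait": "true"}
    if recursive:
        params["recursive"] = "true"
    return urllib.request.Request(key_url(key, **params), method="GET")


def send(request):
    """Send a request to etcd and decode the JSON response."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

With a running etcd, `send(set_request("config/db_url", "postgres://db:5432"))` writes a value and `send(watch_request("config", recursive=True))` returns once anything under `config` changes. Directories are just key prefixes separated by `/`, which is why the same URL scheme covers both.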
etcd is written in Go, which has excellent cross-platform support, small binaries and a great community behind it. Communication between etcd machines is handled via the Raft consensus algorithm.
Latency from the etcd leader is the most important metric to track, and the built-in dashboard has a view dedicated to it. In our testing, severe latency introduces instability within the cluster because Raft is only as fast as the slowest machine in the majority. You can mitigate this issue by tuning the cluster, for example its heartbeat interval and election timeout. etcd has been pre-tuned on cloud providers with highly variable networks.
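To make "only as fast as the slowest machine in the majority" concrete, here is a deliberately simplified model, not etcd's actual implementation. It assumes the leader's own write is instantaneous, that each follower acknowledges after one round trip, and that a write commits once a majority of the cluster (leader included) has it. The function name and the sample latencies are made up for illustration.

```python
def commit_latency_ms(follower_rtts_ms):
    """Estimate commit latency from the leader's round-trip time to each follower.

    Simplified model: the leader counts toward the majority, so it must wait
    for the fastest (quorum - 1) followers; the slowest of those sets the pace.
    """
    cluster_size = len(follower_rtts_ms) + 1   # followers plus the leader
    quorum = cluster_size // 2 + 1             # smallest majority
    acks_needed = quorum - 1                   # the leader already has the entry
    if acks_needed == 0:
        return 0.0                             # single-node cluster
    return sorted(follower_rtts_ms)[acks_needed - 1]


# Five machines, two fast followers: the slow ones are outside the majority.
print(commit_latency_ms([2, 3, 50, 200]))    # prints 3
# Five machines, only one fast follower: a slow machine is now in the majority.
print(commit_latency_ms([2, 80, 90, 200]))   # prints 80
```

In this model, one or two slow machines do no harm as long as a fast majority exists. Once a slow machine is needed to reach the majority, every write pays its latency.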
etcd should not be exposed outside of the CoreOS cluster. The recommended way to secure your entire cluster (and etcd) is to use a physical firewall, EC2 Security Groups or a similar feature to block all traffic that is not explicitly allowed. Communication within the cluster can be secured with client certificates. Access control lists (ACLs) are supported to prevent certain users from reading or updating parts of the keyspace.
If you're running containers that are used for load balancing or caching, consider exposing only those containers instead of all containers.
Docker containers can read, write and listen to etcd over the docker0 network interface. With these three actions you can construct extremely sophisticated orchestration that runs whenever etcd values change.
A common example of listening for changes is to reconfigure an upstream proxy when a new container of an application is started.
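As a rough sketch of that proxy example, the code below turns an etcd v2 directory listing into an nginx-style `upstream` block. It assumes each running container has registered itself as a key under a hypothetical `/services/web` directory, with its `host:port` address as the value. The key layout, function names and addresses are illustrative only.

```python
def backends(listing):
    """Extract backend addresses from an etcd v2 directory listing (parsed JSON)."""
    nodes = listing.get("node", {}).get("nodes", [])
    # Skip subdirectories; sort so the rendered config is stable between runs.
    return sorted(n["value"] for n in nodes if not n.get("dir"))


def render_upstream(name, listing):
    """Render an nginx-style upstream block with one server line per backend."""
    lines = [f"upstream {name} {{"]
    lines += [f"    server {addr};" for addr in backends(listing)]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical response to GET /v2/keys/services/web after two containers register.
sample = {
    "action": "get",
    "node": {
        "key": "/services/web",
        "dir": True,
        "nodes": [
            {"key": "/services/web/web-2", "value": "10.0.0.6:8080"},
            {"key": "/services/web/web-1", "value": "10.0.0.5:8080"},
        ],
    },
}
print(render_upstream("web", sample))
```

A watcher process would block on a recursive watch of the directory, re-read the listing after each change, write the rendered block to the proxy's config and reload the proxy. Re-reading the whole directory, instead of patching the config from the single change event, keeps the proxy correct even if an event is missed.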
To keep service registration logic outside of your main codebase, "sidekick" units can be run next to the main systemd unit. Sidekicks will be scheduled by fleet onto the same machine as the main unit and will stop if the main unit stops for any reason.