The rise of Docker has been a great thing for Linux containers and namespaces. Docker's model of application building and deployment solves a lot of problems. However, one of the more common sticking points still being worked out is how dynamic, cross-machine, container-to-container networking and service discovery will work.
We would like to propose a solution to this, called the "software defined localhost".
Here is how it works, and it is pretty simple:
- Containers hardcode a localhost port in their configuration (for example, 127.0.0.1:3306 for MySQL)
- The container runtime presents a "jumper" (an open port) on the container's 127.0.0.1:3306, automatically and transparently connecting it to the configured remote container.
The net result is a system similar to synapse, but at a per-container level.
We’re using the term “jumper” to refer to the tool that does this style of network namespace proxying.
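To make the idea concrete, here is a minimal Python sketch of the forwarding half of a jumper: it listens on one address and relays every connection to a remote one. This is not nspipe's code, and it ignores namespaces entirely. The function names (`pump`, `handle`, `serve`, `start_jumper`) are made up for this illustration.

```python
import socket
import threading


def pump(src, dst):
    """Copy bytes from src to dst until src reaches end of stream."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # the connection was torn down from the other side
    finally:
        # Pass the end-of-stream signal on; ignore already-closed sockets.
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle(client, remote_addr):
    """Connect to the remote service and relay traffic in both directions."""
    try:
        upstream = socket.create_connection(remote_addr)
    except OSError:
        client.close()
        return
    with client, upstream:
        back = threading.Thread(target=pump, args=(upstream, client), daemon=True)
        back.start()
        pump(client, upstream)
        back.join()


def serve(listener, remote_addr):
    """Accept connections on an already-bound listener and forward each one."""
    while True:
        try:
            client, _ = listener.accept()
        except OSError:
            return  # the listener was closed
        threading.Thread(target=handle, args=(client, remote_addr), daemon=True).start()


def start_jumper(local_addr, remote_addr):
    """Bind local_addr, forward to remote_addr, and return the listening socket."""
    listener = socket.create_server(local_addr)
    threading.Thread(target=serve, args=(listener, remote_addr), daemon=True).start()
    return listener
```

Calling something like `start_jumper(("127.0.0.1", 3306), ("db.example.internal", 3306))` (the hostname is a placeholder) would give the application a local MySQL port without it knowing where the database really lives.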
For the sake of demonstration, we wrote a simple jumper called nspipe that will allow you to bind in one Linux network namespace, but do networking in another.
Terminal 1: Start a container and check for listening ports (currently none open)
    $ docker run -t -i polvi/ubuntu-netstat /bin/bash
    root@b202508907b0:/# netstat -lntup
    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    root@b202508907b0:/#
Terminal 2: Start nspipe and attach a port on localhost:80 for b202508907b0
    $ PID=$(./get-leader-pid b202508907b0)
    $ sudo ./nspipe -t $PID -l 127.0.0.1:80 -r www.google.com:80
    PROXY: targetPid:8231 targetAddr:localhost:80 remoteAddr:www.google.com:80
Terminal 1: Check if port is listening and test it
    root@b202508907b0:/# netstat -lntup
    Active Internet connections (only servers)
    tcp        0      0 127.0.0.1:80            0.0.0.0:*               LISTEN      -
    root@b202508907b0:/# curl localhost:80
    <HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8"> ...
Tada! We attached a port on localhost that forwards all traffic to google.com.
Note: ./get-leader-pid is a simple script that takes a Docker container ID and returns the PID of the first child process in the container's namespace. The setns system call takes a file descriptor for the target namespace, which has to be opened directly from the proc filesystem, and thus we need the PID.
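For the curious, the "bind in one namespace, do networking in another" trick can be sketched in a few lines of Python. This is only an illustration of the setns approach, not nspipe's actual implementation. It assumes Linux, root privileges, and Python 3.12 or newer (which added `os.setns`). The helper names `netns_path` and `listen_in_netns` are hypothetical.

```python
import os
import socket


def netns_path(pid):
    """Return the proc filesystem path of the network namespace pid lives in."""
    return f"/proc/{int(pid)}/ns/net"


def listen_in_netns(pid, addr):
    """Bind a TCP listener inside the network namespace of pid (needs root).

    The calling thread returns to its original namespace afterwards, while
    the returned socket stays bound in the target namespace.
    """
    own = os.open("/proc/thread-self/ns/net", os.O_RDONLY)
    target = os.open(netns_path(pid), os.O_RDONLY)
    try:
        os.setns(target, os.CLONE_NEWNET)      # enter the container's namespace
        try:
            return socket.create_server(addr)  # socket is created and bound here
        finally:
            os.setns(own, os.CLONE_NEWNET)     # always switch back
    finally:
        os.close(target)
        os.close(own)
```

A socket belongs to the network namespace it was created in. The listener therefore keeps accepting connections from inside the container, while any outbound connection opened after switching back uses the host's networking.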
A few more examples of this in action:
- An etcd powered jumper that round-robin load balances to any services that are registered with etcd: https://github.com/coreos/nsproxy
- A patched spiped (stunnel alternative) to securely encrypt all traffic that goes across the jumper, point to point. https://github.com/polvi/spiped
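The round-robin part of a load-balancing jumper is simple enough to sketch. The Python class below is a hypothetical illustration, not code from nsproxy. It deliberately leaves out the etcd side: something else is assumed to call `update()` whenever the set of registered services changes.

```python
import threading


class RoundRobin:
    """Hand out backends in rotation; the backend list can be replaced at any time."""

    def __init__(self, backends=()):
        self._lock = threading.Lock()
        self._backends = list(backends)
        self._next = 0

    def update(self, backends):
        """Replace the backend list, e.g. after the service registry changes."""
        with self._lock:
            self._backends = list(backends)
            self._next = 0

    def pick(self):
        """Return the next backend, or None when nothing is registered."""
        with self._lock:
            if not self._backends:
                return None
            backend = self._backends[self._next % len(self._backends)]
            self._next += 1
            return backend
```

A jumper would call `pick()` once per incoming connection and forward that connection to whichever backend comes back, so clients keep talking to the same localhost port while the backends change underneath them.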
Finally, it's possible to set up the available networking before the container is started. In this example we use systemd-nspawn, a lightweight container manager that is bundled with systemd (and CoreOS), but it could be swapped out for the Docker-based LXC tools.
    # create a fresh network namespace using iproute2
    $ sudo ip netns add foo
    # This sets up the jumper as a daemonized service via systemd
    $ sudo systemd-run nspipe -p /var/run/netns/foo -l 127.0.0.1:80 -r www.google.com:80
    $ sudo ip netns exec foo /usr/bin/systemd-nspawn -D busybox /bin/sh
    / # ifconfig -a
    lo        Link encap:Local Loopback
              inet addr:127.0.0.1  Mask:255.0.0.0
              inet6 addr: ::1/128 Scope:Host
              UP LOOPBACK RUNNING  MTU:65536  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0
              RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
    / # netstat -l
    Active Internet connections (only servers)
    tcp        0      0 127.0.0.1:80            0.0.0.0:*               LISTEN      -
A number of benefits are unlocked with this approach:
- Networking is fully abstracted from the container, and always consistent (localhost always has what you need/specify). No dynamic IPs.
- Application configuration can be static (just hardcode localhost:3306 for your DB, for example).
- The jumper could transparently do things like dynamic service discovery, encrypting all traffic, starting other containers, without the container needing to implement anything.
- The platform decides how services are connected: it could follow the ambassador pattern for chaining services together, connect directly point to point, load balance, etc.
- The container could have no networking at all (only a loopback, no NAT) and be given access only to the services it needs (better security/isolation/consistency).
- Convert the proxy to a load balancer, and you can start doing sophisticated things like rolling migration to a new service without needing to reconfigure existing clients.
- Combined with systemd socket activation, the service receiving the connection on the other side of the proxy could be activated on demand. For example, you could bind 3306 into a container, and the MySQL database container will be provisioned on first connection.
There are some downsides as well:
- All traffic needs to be run through a proxy. At high load, this probably will not work. This could be overcome by having your application talk directly to whatever service discovery tool you want to use, instead of having the runtime do it for you.
- Implementing this requires directly fiddling with the kernel namespaces. This would have to be done natively in LXC to avoid races, or after a container is started with a helper tool similar in nature to pipework (but this would race at start-up).
We'd love to continue the discussion on coreos-dev (https://groups.google.com/forum/#!forum/coreos-dev) and/or docker-user.