etcd restore operator


The etcd-restore-operator can restore an etcd cluster on Kubernetes from a backup.

The overall workflow is:

  • Create the etcd-restore-operator
  • Create an EtcdRestore Custom Resource (CR). This triggers a restore request that specifies:
    • the etcd cluster spec
    • how to access the backup
  • The etcd-restore-operator will restore a new cluster from the backup
  • The etcd-operator takes over the management of the restored cluster

Note: The etcd-restore-operator currently supports restoring only from backups saved on S3.


Note: This demo uses the default namespace.

Simulate disaster failure

  1. Make sure the example-etcd-cluster EtcdCluster CR exists:

     kubectl get etcdcluster example-etcd-cluster
  2. Kill the etcd pods to simulate disaster failure:

     kubectl delete pod -l app=etcd,etcd_cluster=example-etcd-cluster --force --grace-period=0
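
Before continuing, it can help to confirm that no etcd member pods are left. The helper below is a minimal sketch, not part of the operator: the function name count_etcd_pods is hypothetical, and it reuses the label selector from the delete command above.

```shell
# Count the etcd member pods that still exist for a cluster.
# Assumes the pods carry the same labels used by the delete command above.
count_etcd_pods() {
  # $1: EtcdCluster name
  kubectl get pods -l "app=etcd,etcd_cluster=$1" --no-headers 2>/dev/null \
    | wc -l | tr -d ' '
}

# Example:
#   count_etcd_pods example-etcd-cluster
```

A result of 0 means every member pod is gone and the cluster has to be restored from backup.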

Deploy the etcd-restore-operator

  1. Create a deployment of the etcd-restore-operator:

     kubectl create -f example/etcd-restore-operator/deployment.yaml
  2. Verify the following resources exist:

     $ kubectl get pods
     NAME                                     READY     STATUS    RESTARTS   AGE
     etcd-restore-operator-4203122180-npn3g   1/1       Running   0          7s
  3. Verify that the etcd-restore-operator creates the EtcdRestore CRD:

     $ kubectl get crd
     NAME                                       KIND
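
The CRD may take a few seconds to appear after the deployment starts. The following is a rough polling sketch under two assumptions: the function name wait_for_restore_crd is hypothetical, and the CRD name is matched loosely on "etcdrestore" because the full name is not spelled out here.

```shell
# Poll `kubectl get crd` until a CRD whose name mentions "etcdrestore" is listed.
# The match is deliberately loose; adjust the pattern to the real CRD name.
wait_for_restore_crd() {
  # $1: max attempts (default 30), $2: seconds between attempts (default 2)
  attempts=${1:-30}
  delay=${2:-2}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if kubectl get crd 2>/dev/null | grep -qi 'etcdrestore'; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "EtcdRestore CRD not found after $attempts attempts" >&2
  return 1
}

# Example:
#   wait_for_restore_crd 30 2 && echo "CRD is registered"
```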

Set up AWS Secret

Create a Kubernetes secret that contains AWS credentials and config. This is used by the etcd-restore-operator to retrieve the backup from S3.

  1. Verify that the local AWS config and credentials files exist ($AWS_DIR is the directory that contains them):

     $ cat $AWS_DIR/credentials
     aws_access_key_id = XXX
     aws_secret_access_key = XXX
     $ cat $AWS_DIR/config
     region = <region>
  2. Create the secret aws:

     kubectl create secret generic aws --from-file=$AWS_DIR/credentials --from-file=$AWS_DIR/config
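
A secret created from a missing or empty file is easy to overlook until the restore fails. As a small sketch (the function name check_aws_dir is hypothetical), the two files can be checked before the secret is created:

```shell
# Check that a directory holds non-empty "credentials" and "config" files.
check_aws_dir() {
  # $1: directory expected to contain the two AWS files
  for f in credentials config; do
    if [ ! -s "$1/$f" ]; then
      echo "missing or empty: $1/$f" >&2
      return 1
    fi
  done
}

# Example (AWS_DIR must already be set in your shell):
#   check_aws_dir "$AWS_DIR" && \
#     kubectl create secret generic aws \
#       --from-file="$AWS_DIR/credentials" --from-file="$AWS_DIR/config"
```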

Create EtcdRestore CR

Create the EtcdRestore CR:

Note: This example uses the Kubernetes secret "aws" and the S3 path "mybucket/etcd.backup".

sed -e 's|<full-s3-path>|mybucket/etcd.backup|g' \
    -e 's|<aws-secret>|aws|g' \
    example/etcd-restore-operator/restore_cr.yaml \
    | kubectl create -f -
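
To inspect the manifest before creating anything, the same substitution can be run without the final kubectl step. This is a sketch only: render_restore_cr is a hypothetical helper name, and it assumes the values contain no "|" or "&" characters, which sed would treat specially.

```shell
# Print the restore manifest with both placeholders filled in.
# Nothing is sent to the cluster; the result goes to stdout.
render_restore_cr() {
  # $1: template file, $2: S3 path (bucket/key), $3: name of the AWS secret
  sed -e "s|<full-s3-path>|$2|g" \
      -e "s|<aws-secret>|$3|g" \
      "$1"
}

# Example:
#   render_restore_cr example/etcd-restore-operator/restore_cr.yaml \
#     mybucket/etcd.backup aws
```

Once the output looks right, pipe it to kubectl create -f - as in the command above.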

Verify the CR status and restored cluster

  1. Check the status section of the EtcdRestore CR:

     $ kubectl get etcdrestore example-etcd-cluster -o yaml
     kind: EtcdRestore
       succeeded: true
  2. Verify the EtcdCluster CR for the restored cluster:

     $ kubectl get etcdcluster
     NAME                    KIND
  3. Verify that the etcd-operator scales the cluster to the desired size:

     $ kubectl get pods
     NAME                                     READY     STATUS    RESTARTS   AGE
     etcd-operator-2486363115-ltc17           1/1       Running   0          1h
     etcd-restore-operator-4203122180-npn3g   1/1       Running   0          30m
     example-etcd-cluster-0000                1/1       Running   0          8m
     example-etcd-cluster-0001                1/1       Running   0          8m
     example-etcd-cluster-0002                1/1       Running   0          8m
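
The restore is not instantaneous, so step 1 may need to be repeated. A minimal polling sketch follows; wait_for_restore is a hypothetical helper name, and the jsonpath .status.succeeded is an assumption inferred from the status output shown in step 1.

```shell
# Poll an EtcdRestore CR until its status reports success.
# Assumes the success flag lives at .status.succeeded (inferred, not confirmed).
wait_for_restore() {
  # $1: EtcdRestore name, $2: max attempts (default 60), $3: delay in seconds (default 5)
  attempts=${2:-60}
  delay=${3:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$(kubectl get etcdrestore "$1" \
      -o jsonpath='{.status.succeeded}' 2>/dev/null)
    if [ "$status" = "true" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "restore $1 did not report success after $attempts attempts" >&2
  return 1
}

# Example:
#   wait_for_restore example-etcd-cluster && kubectl get pods
```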


Delete the etcd-restore-operator deployment and service, and the EtcdRestore CR.

Note: Deleting the EtcdRestore CR won't delete the EtcdCluster CR.

kubectl delete etcdrestore example-etcd-cluster
kubectl delete -f example/etcd-restore-operator/deployment.yaml
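
To confirm that the restored cluster survives the cleanup, check that its EtcdCluster CR is still present. The helper below is a small sketch with a hypothetical name, etcdcluster_exists:

```shell
# Succeed if the named EtcdCluster CR still exists.
etcdcluster_exists() {
  # $1: EtcdCluster name
  kubectl get etcdcluster "$1" >/dev/null 2>&1
}

# Example:
#   etcdcluster_exists example-etcd-cluster && echo "EtcdCluster CR is still present"
```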