etcd restore operator


The etcd-restore-operator can restore an etcd cluster on Kubernetes from backup.

The overall workflow is:

  • Create the etcd-restore-operator
  • Create an EtcdRestore Custom Resource (CR), which triggers a restore request and specifies:
    • the etcd cluster spec
    • how to access the backup
  • The etcd-restore-operator restores a new cluster from the backup
  • The etcd-operator takes over the management of the restored cluster

Note: The etcd-restore-operator currently supports restoring only from backups saved on S3.


Note: This demo uses the default namespace.

Simulate disaster failure

  1. Make sure the example-etcd-cluster EtcdCluster CR exists:

    kubectl get etcdcluster example-etcd-cluster
  2. Kill etcd pods to simulate disaster failure

    kubectl delete pod -l app=etcd,etcd_cluster=example-etcd-cluster --force --grace-period=0
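
Before starting the restore, it can help to confirm that the forced delete actually removed every member pod. The following is a minimal sketch, not part of the operator tooling: the function names `remaining_etcd_pods` and `etcd_pods_gone` are hypothetical, and the label selector is copied from the delete command above. It assumes nothing recreates the pods in the meantime.

```shell
# List the pods that still carry the cluster's labels, one name per line.
# Hypothetical helper; the selector mirrors the "kubectl delete pod" step.
remaining_etcd_pods() {
    local cluster="$1"
    kubectl get pods -l "app=etcd,etcd_cluster=${cluster}" -o name
}

# Succeed only when no pod with those labels is left.
etcd_pods_gone() {
    [ -z "$(remaining_etcd_pods "$1")" ]
}

# Example (run manually):
#   etcd_pods_gone example-etcd-cluster && echo "all etcd pods are gone"
```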

Deploy the etcd-restore-operator

  1. Create a deployment of the etcd-restore-operator:

    kubectl create -f example/etcd-restore-operator/deployment.yaml
  2. Verify the following resources exist:

    $ kubectl get pods
    NAME                                     READY     STATUS    RESTARTS   AGE
    etcd-restore-operator-4203122180-npn3g   1/1       Running   0          7s
  3. Verify that the etcd-restore-operator creates the EtcdRestore CRD:

    $ kubectl get crd
    NAME                                       KIND
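
The CRD may take a few seconds to appear after the operator pod starts. A minimal polling sketch follows. The function name `wait_for_crd` and its default retry values are assumptions, and the CRD name is passed in as an argument because it is not shown in the output above; take it from the `kubectl get crd` listing.

```shell
# wait_for_crd NAME [ATTEMPTS] [INTERVAL_SECONDS]
# Poll until "kubectl get crd NAME" succeeds or the attempts run out.
# Hypothetical helper; defaults (30 attempts, 2 s apart) are arbitrary.
wait_for_crd() {
    local name="$1" attempts="${2:-30}" interval="${3:-2}"
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        if kubectl get crd "$name" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "timed out waiting for CRD $name" >&2
    return 1
}

# Example (run manually, substituting the name printed by "kubectl get crd"):
#   wait_for_crd <crd-name>
```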

Set up AWS Secret

Create a Kubernetes secret that contains AWS credentials and config. This is used by the etcd-restore-operator to retrieve the backup from S3.

  1. Verify that the local aws config and credentials files exist:

    $ cat $AWS_DIR/credentials
    [default]
    aws_access_key_id = XXX
    aws_secret_access_key = XXX
    $ cat $AWS_DIR/config
    [default]
    region = <region>
  2. Create the secret aws:

    kubectl create secret generic aws --from-file=$AWS_DIR/credentials --from-file=$AWS_DIR/config
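
The manual `cat` check in step 1 can also be scripted so that a missing file or key is caught before the secret is created. This is a sketch under stated assumptions: `check_aws_dir` is a hypothetical helper name, and it only checks that the two files are readable and contain the keys shown above. It does not validate the credentials themselves.

```shell
# check_aws_dir DIR
# Verify that DIR/credentials and DIR/config are readable and contain the
# keys shown in the walkthrough. Hypothetical helper; it does not contact AWS.
check_aws_dir() {
    local dir="$1" rc=0 f key
    for f in credentials config; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f" >&2
            rc=1
        fi
    done
    [ "$rc" -eq 0 ] || return 1

    for key in aws_access_key_id aws_secret_access_key; do
        grep -q "^[[:space:]]*${key}[[:space:]]*=" "$dir/credentials" || {
            echo "no ${key} in $dir/credentials" >&2
            rc=1
        }
    done
    grep -q '^[[:space:]]*region[[:space:]]*=' "$dir/config" || {
        echo "no region in $dir/config" >&2
        rc=1
    }
    return "$rc"
}

# Example (run manually):
#   check_aws_dir "$AWS_DIR" && \
#     kubectl create secret generic aws \
#       --from-file="$AWS_DIR/credentials" --from-file="$AWS_DIR/config"
```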

Create EtcdRestore CR

Create the EtcdRestore CR:

Note: This example uses the Kubernetes secret "aws" and the S3 path "mybucket/etcd.backup".

    sed -e 's|<full-s3-path>|mybucket/etcd.backup|g' \
        -e 's|<aws-secret>|aws|g' \
        example/etcd-restore-operator/restore_cr.yaml \
        | kubectl create -f -
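
The substitution above can be wrapped in a small function that refuses bad input instead of silently producing a broken manifest. The sketch below is an illustration only: `render_restore_cr` is a hypothetical name, and the guards reflect how `sed` treats `|` (the delimiter used here), `&`, and backslashes in a replacement string.

```shell
# render_restore_cr TEMPLATE S3_PATH AWS_SECRET
# Print TEMPLATE with <full-s3-path> and <aws-secret> filled in.
# Hypothetical helper; it only renders text and does not call kubectl.
render_restore_cr() {
    local template="$1" s3_path="$2" aws_secret="$3"

    if [ -z "$s3_path" ] || [ -z "$aws_secret" ]; then
        echo "error: S3 path and secret name must be non-empty" >&2
        return 1
    fi
    # '|' is the sed delimiter; '&' and backslash are special in a replacement.
    case "${s3_path}${aws_secret}" in
        *'|'*|*'&'*|*\\*)
            echo "error: values must not contain '|', '&' or a backslash" >&2
            return 1 ;;
    esac
    # Fail loudly if the template lacks either placeholder.
    grep -q '<full-s3-path>' "$template" || {
        echo "error: no <full-s3-path> placeholder in $template" >&2
        return 1
    }
    grep -q '<aws-secret>' "$template" || {
        echo "error: no <aws-secret> placeholder in $template" >&2
        return 1
    }

    sed -e "s|<full-s3-path>|${s3_path}|g" \
        -e "s|<aws-secret>|${aws_secret}|g" \
        "$template"
}

# Example (run manually):
#   render_restore_cr example/etcd-restore-operator/restore_cr.yaml \
#       mybucket/etcd.backup aws | kubectl create -f -
```

Rendering to standard output first also makes it easy to inspect the manifest before piping it to `kubectl create`.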

Verify the CR status and restored cluster

  1. Check the status section of the EtcdRestore CR:

    $ kubectl get etcdrestore example-etcd-cluster -o yaml
    kind: EtcdRestore
      succeeded: true
  2. Verify the EtcdCluster CR for the restored cluster:

    $ kubectl get etcdcluster
    NAME                    KIND
  3. Verify that the etcd-operator scales the cluster to the desired size:

    $ kubectl get pods
    NAME                                     READY     STATUS    RESTARTS   AGE
    etcd-operator-2486363115-ltc17           1/1       Running   0          1h
    etcd-restore-operator-4203122180-npn3g   1/1       Running   0          30m
    example-etcd-cluster-795649v9kq          1/1       Running   1          3m
    example-etcd-cluster-jtp447ggnq          1/1       Running   1          4m
    example-etcd-cluster-psw7sf2hhr          1/1       Running   1          4m
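
The first and third checks above can be scripted roughly as follows. This is a sketch, not an official tool: the function names are hypothetical, `restore_succeeded` simply looks for the `succeeded: true` line in the CR's YAML, and `running_members` assumes the default `kubectl get pods` column order (STATUS in the third column) and the same labels used when the pods were deleted earlier.

```shell
# restore_succeeded NAME
# Succeed if the EtcdRestore CR's YAML contains "succeeded: true".
restore_succeeded() {
    local name="$1"
    kubectl get etcdrestore "$name" -o yaml \
        | grep -q '^[[:space:]]*succeeded: true'
}

# running_members CLUSTER
# Print how many pods of the cluster report STATUS "Running".
running_members() {
    local cluster="$1"
    kubectl get pods -l "app=etcd,etcd_cluster=${cluster}" --no-headers \
        | awk '$3 == "Running" { n++ } END { print n + 0 }'
}

# Example (run manually):
#   restore_succeeded example-etcd-cluster && running_members example-etcd-cluster
```

Compare the printed count with the cluster size requested in the restore spec.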


Cleanup

Delete the EtcdRestore CR and the etcd-restore-operator deployment and service.

Note: Deleting the EtcdRestore CR won't delete the EtcdCluster CR.

    kubectl delete etcdrestore example-etcd-cluster
    kubectl delete -f example/etcd-restore-operator/deployment.yaml