Make a backup of etcd running on cluster3-master1 and save it on the master node at /tmp/etcd-backup.db.
Then create a Pod of your kind in the cluster.
Finally restore the backup, confirm the cluster is still working and that the created Pod is no longer with us.
Answer:
Etcd Backup
First we log into the master and try to create a snapshot of etcd:
➜ ssh cluster3-master1
➜ root@cluster3-master1:~# ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db
Error: rpc error: code = Unavailable desc = transport is closing
But it fails because we need to authenticate ourselves. For the necessary information we can check the etcd manifest:
➜ root@cluster3-master1:~# vim /etc/kubernetes/manifests/etcd.yaml
We only check the etcd.yaml for the necessary information; we don't change it.
# /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.100.31:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt                         # use
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://192.168.100.31:2380
    - --initial-cluster=cluster3-master1=https://192.168.100.31:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key                          # use
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.100.31:2379 # use
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.100.31:2380
    - --name=cluster3-master1
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt                  # use
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.3.15-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd                                                     # important
      type: DirectoryOrCreate
    name: etcd-data
status: {}
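Instead of reading through the whole manifest, a quick grep for the TLS-related flags gives the same information (just a convenience sketch; it assumes the default kubeadm manifest path used above):
➜ root@cluster3-master1:~# grep -E 'cert-file|key-file|trusted-ca-file|listen-client-urls' /etc/kubernetes/manifests/etcd.yaml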
But we also know that the api-server is connecting to etcd, so we can check how its manifest is configured:
➜ root@cluster3-master1:~# cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep etcd
- --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
- --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
- --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
- --etcd-servers=https://127.0.0.1:2379
We use the authentication information and pass it to etcdctl:
➜ root@cluster3-master1:~# ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key
Snapshot saved at /tmp/etcd-backup.db
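As a side note, the apiserver-etcd-client certificate pair from the kube-apiserver manifest is signed by the same etcd CA, so it would presumably have been accepted as well (an assumption, not part of the original solution; the server certificate used above is all the task needs):
➜ root@cluster3-master1:~# ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/apiserver-etcd-client.crt \
--key /etc/kubernetes/pki/apiserver-etcd-client.key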
NOTE: Don't use etcdctl snapshot status because it can alter the snapshot file and render it invalid.
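Since snapshot status is off the table, a plain file listing is enough to confirm the backup was written (just an optional sanity check):
➜ root@cluster3-master1:~# ls -lh /tmp/etcd-backup.db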
Etcd restore
Now create a Pod in the cluster and wait for it to be running:
➜ root@cluster3-master1:~# kubectl run test --image=nginx
pod/test created
➜ root@cluster3-master1:~# kubectl get pod -l run=test -w
NAME READY STATUS RESTARTS AGE
test 1/1 Running 0 60s
NOTE: If you didn't solve questions 18 or 20 and cluster3 doesn't have a ready worker node then the created pod might stay in a Pending state. This is still ok for this task.
Next we stop all controlplane components:
root@cluster3-master1:~# cd /etc/kubernetes/manifests/
root@cluster3-master1:/etc/kubernetes/manifests# mv * ..
root@cluster3-master1:/etc/kubernetes/manifests# watch crictl ps
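Before restoring, it helps to confirm that the kubelet has really stopped the static Pods; once the manifests are moved away, a filtered crictl listing should come back empty after a minute or so (a hedged check, assuming the crictl-based runtime used above):
root@cluster3-master1:/etc/kubernetes/manifests# crictl ps | grep -E 'etcd|kube-apiserver'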
Now we restore the snapshot into a specific directory:
➜ root@cluster3-master1:~# ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup.db \
--data-dir /var/lib/etcd-backup \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key
2020-09-04 16:50:19.650804 I | mvcc: restore compact to 9935
2020-09-04 16:50:19.659095 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
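The restore writes a member directory (containing the snap and wal data) under the new data dir; a quick listing verifies it exists (optional sanity check):
➜ root@cluster3-master1:~# ls /var/lib/etcd-backup/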
We could specify another host to make the backup from by using etcdctl --endpoints http://IP, but here we just use the default value, which is http://127.0.0.1:2379,http://127.0.0.1:4001.
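For illustration only, the earlier backup command with an explicit endpoint would look like this (a hypothetical variant; the address is the local entry from listen-client-urls in the etcd manifest):
➜ root@cluster3-master1:~# ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db \
--endpoints https://127.0.0.1:2379 \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key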
The restored files are located in the new folder /var/lib/etcd-backup; now we have to tell etcd to use that directory:
➜ root@cluster3-master1:~# vim /etc/kubernetes/etcd.yaml
# /etc/kubernetes/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
...
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd-backup                                              # change
      type: DirectoryOrCreate
    name: etcd-data
status: {}
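A quick grep confirms that only the etcd-data hostPath now points at the restored directory (a simple sanity check, assuming the file was edited as shown above):
➜ root@cluster3-master1:~# grep -n etcd-backup /etc/kubernetes/etcd.yaml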
Now we move all control plane YAML files back into the manifests directory. Give it some time (up to several minutes) for etcd to restart and for the api-server to become reachable again:
root@cluster3-master1:/etc/kubernetes/manifests# mv ../*.yaml .
root@cluster3-master1:/etc/kubernetes/manifests# watch crictl ps
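Once the kubelet has recreated the static Pods, the same filtered crictl listing should show etcd and kube-apiserver running again, and kubectl should respond (a hedged check, same tooling as above):
root@cluster3-master1:/etc/kubernetes/manifests# crictl ps | grep -E 'etcd|kube-apiserver'
root@cluster3-master1:/etc/kubernetes/manifests# kubectl -n kube-system get pod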
Then we check again for the Pod:
➜ root@cluster3-master1:~# kubectl get pod -l run=test
No resources found in default namespace.
Awesome, backup and restore worked as our pod is gone.