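# Personal Mock Exam

This is a CKA mock exam I took in January 2023, based on Kubernetes v1.26.

The mock exam consists of 25 questions (125 points in total), plus 2 extra questions and 3 preview questions. (The real exam has 15 to 20 questions worth 100 points in total.)

The mock exam is harder than the real certification, so if you can pass it, you are ready to take on the real exam.

The content of the mock exam follows.

# Pre-exam Setup

# Command line

Once you have access to your terminal, it may be wise to spend about a minute setting up your environment. You could set these: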

alias k=kubectl                         # will already be pre-configured

export do="--dry-run=client -o yaml"    # k create deploy nginx --image=nginx $do

export now="--force --grace-period 0"   # k delete pod x $now

# vim

The following settings are already configured in ~/.vimrc in the real environment, but it never hurts to be able to type them:

set tabstop=2

set expandtab

set shiftwidth=2

More setup suggestions are in the tips section.

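# Mock Exam Questions

# Question 1 | Contexts

Task weight: 1%

You have access to multiple cluster contexts through kubectl from your main terminal. Write all of those context names into /opt/course/1/contexts.

Next, write a command that displays the current context into /opt/course/1/context_default_kubectl.sh. The command should use kubectl.

Finally, write a second command that does the same thing into /opt/course/1/context_default_no_kubectl.sh, but without using kubectl.

# Answer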

k config get-contexts # copy manually

k config get-contexts -o name > /opt/course/1/contexts

The content should then look like this:

# /opt/course/1/contexts
k8s-c1-H
k8s-c2-AC
k8s-c3-CCC

Next, create the first command:

# /opt/course/1/context_default_kubectl.sh
kubectl config current-context

➜ sh /opt/course/1/context_default_kubectl.sh
k8s-c1-H

The second one:

# /opt/course/1/context_default_no_kubectl.sh
cat ~/.kube/config | grep current | sed -e "s/current-context: //"

➜ sh /opt/course/1/context_default_no_kubectl.sh
k8s-c1-H
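
The grep in the command above matches any line that contains the word current, which is fine for a typical kubeconfig. As a slightly stricter variant, the sketch below matches only the current-context key itself. It runs against a made-up, heavily simplified sample file (an assumption for illustration), not the real ~/.kube/config.

```shell
# Sketch: read current-context from a kubeconfig without kubectl.
# The kubeconfig below is a simplified sample, not the exam's real file.
kubeconfig="$(mktemp)"
cat > "$kubeconfig" <<'EOF'
apiVersion: v1
kind: Config
current-context: k8s-c1-H
contexts:
- name: k8s-c1-H
- name: k8s-c2-AC
- name: k8s-c3-CCC
EOF

# Match only the line whose first field is the key, then print its value.
current="$(awk '$1 == "current-context:" { print $2 }' "$kubeconfig")"
echo "$current"
rm -f "$kubeconfig"
```

For the exam file, the same awk expression could be pointed at ~/.kube/config instead of the sample.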

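# Question 2 | Schedule Pod on Controlplane Node

Task weight: 3%

Use context: kubectl config use-context k8s-c1-H

Create a single Pod of image httpd:2.4.41-alpine in Namespace default. The Pod should be named pod1 and the container should be named pod1-container. This Pod should only be scheduled on a controlplane node. Do not add new labels to any nodes.

# Answer

First we find the controlplane node(s) and their taints: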

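k get node # find the controlplane node

k describe node cluster1-controlplane1 | grep Taint -A1 # get the controlplane node's taints

k get node cluster1-controlplane1 --show-labels # get the controlplane node's labels

Next we create the Pod template:

# check the suggestions at the very top of this document so we can use $do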
k run pod1 --image=httpd:2.4.41-alpine $do > 2.yaml

vim 2.yaml

Make the necessary changes manually. Search the Kubernetes docs for tolerations and nodeSelector to find examples:

# 2.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: pod1
  name: pod1
spec:
  containers:
  - image: httpd:2.4.41-alpine
    name: pod1-container                       # change
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  tolerations:                                 # add
  - effect: NoSchedule                         # add
    key: node-role.kubernetes.io/control-plane # add
  nodeSelector:                                # add
    node-role.kubernetes.io/control-plane: ""  # add
status: {}

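It is important here to add the toleration for running on controlplane nodes, but also the nodeSelector to make sure the Pod only runs on controlplane nodes. If we only specify a toleration, the Pod can be scheduled on controlplane or worker nodes.

Now we create it:

k -f 2.yaml create

Let's check whether the Pod is scheduled: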

➜ k get pod pod1 -o wide
NAME   READY   STATUS    RESTARTS   ...    NODE                     NOMINATED NODE
pod1   1/1     Running   0          ...    cluster1-controlplane1   <none>

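# Question 3 | Scale down StatefulSet

Task weight: 1%

Use context: kubectl config use-context k8s-c1-H

There are two Pods named o3db-* in Namespace project-c13. C13 management asked you to scale the Pods down to one replica to save resources.

# Answer

Find the StatefulSet that manages the Pods: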

➜ k -n project-c13 get pod | grep o3db
o3db-0                                  1/1     Running   0          52s
o3db-1                                  1/1     Running   0          42s

➜ k -n project-c13 get deploy,ds,sts | grep o3db
statefulset.apps/o3db   2/2     2m56s

To fulfil the task we simply run:

➜ k -n project-c13 scale sts o3db --replicas 1
statefulset.apps/o3db scaled

➜ k -n project-c13 get sts o3db
NAME   READY   AGE
o3db   1/1     4m39s

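C13 management is happy again.

# Question 4 | Pod Ready if Service is reachable

Task weight: 4%

Use context: kubectl config use-context k8s-c1-H

Do the following in Namespace default. Create a single Pod named ready-if-service-ready of image nginx:1.16.1-alpine. Configure a LivenessProbe which simply executes the command true. Also configure a ReadinessProbe which checks whether the url http://service-am-i-ready:80 is reachable; you can use wget -T2 -O- http://service-am-i-ready:80 for this. Start the Pod and confirm it isn't ready because of the ReadinessProbe.

Create a second Pod named am-i-ready of image nginx:1.16.1-alpine with label id: cross-server-ready. The already existing Service service-am-i-ready should now have that second Pod as endpoint.

# Answer

It's a bit of an anti-pattern for one Pod to check another Pod for being ready using probes, hence the normally available readinessProbe.httpGet doesn't work for absolute remote urls. Still, the workaround requested in this task should show how probes and Pod<->Service communication work.

First we create the first Pod: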

k run ready-if-service-ready --image=nginx:1.16.1-alpine $do > 4_pod1.yaml

vim 4_pod1.yaml

Next perform the necessary additions manually:

# 4_pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: ready-if-service-ready
  name: ready-if-service-ready
spec:
  containers:
  - image: nginx:1.16.1-alpine
    name: ready-if-service-ready
    resources: {}
    livenessProbe:                                      # add from here
      exec:
        command:
        - 'true'
    readinessProbe:
      exec:
        command:
        - sh
        - -c
        - 'wget -T2 -O- http://service-am-i-ready:80'   # until here
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}
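
Both probes in the manifest above are exec probes, and an exec probe counts as successful when its command exits with status 0 and as failed otherwise. The sketch below runs locally, not inside a Pod, and only illustrates that exit-status rule. probe_result is a hypothetical helper, and false stands in for a wget call that cannot reach the Service (an assumption made for illustration).

```shell
# Sketch: an exec probe passes when its command exits with status 0.
# probe_result is a hypothetical helper, not part of Kubernetes.
probe_result() {
  if "$@" >/dev/null 2>&1; then
    echo "success"
  else
    echo "failure"
  fi
}

# The liveness probe runs `true`, which always exits 0.
liveness="$(probe_result true)"

# `false` stands in for wget failing to reach service-am-i-ready.
readiness="$(probe_result sh -c 'false')"

echo "liveness=$liveness readiness=$readiness"
```

This is why the Pod stays Running (liveness passes) but not Ready (readiness fails) until the Service has an endpoint.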

Then create the Pod and confirm it is in a non-ready state:

k -f 4_pod1.yaml create

➜ k get pod ready-if-service-ready
NAME                     READY   STATUS    RESTARTS   AGE
ready-if-service-ready   0/1     Running   0          7s

Now we create the second Pod:

k run am-i-ready --image=nginx:1.16.1-alpine --labels="id=cross-server-ready"

The already existing Service service-am-i-ready should now have an Endpoint:

k describe svc service-am-i-ready
k get ep # also possible

This makes our first Pod ready as well; just give it a minute for the ReadinessProbe to check again:

➜ k get pod ready-if-service-ready
NAME                     READY   STATUS    RESTARTS   AGE
ready-if-service-ready   1/1     Running   0          53s

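Look at these Pods working together!

# Question 5 | Kubectl sorting

Task weight: 1%

Use context: kubectl config use-context k8s-c1-H

There are various Pods in all Namespaces. Write a command into /opt/course/5/find_pods.sh which lists all Pods sorted by their AGE (metadata.creationTimestamp).

Write a second command into /opt/course/5/find_pods_uid.sh which lists all Pods sorted by field metadata.uid. Use kubectl sorting for both commands.

# Answer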

# /opt/course/5/find_pods.sh
kubectl get pod -A --sort-by=.metadata.creationTimestamp

For the second command:

# /opt/course/5/find_pods_uid.sh
kubectl get pod -A --sort-by=.metadata.uid

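# Question 6 | Storage, PV, PVC, Pod volume

Task weight: 8%

Use context: kubectl config use-context k8s-c1-H

Create a new PersistentVolume named safari-pv. It should have a capacity of 2Gi, accessMode ReadWriteOnce, hostPath /Volumes/Data and no storageClassName defined.

Next create a new PersistentVolumeClaim in Namespace project-tiger named safari-pvc. It should request 2Gi storage, accessMode ReadWriteOnce and should not define a storageClassName. The PVC should bind to the PV correctly.

Finally create a new Deployment named safari in Namespace project-tiger which mounts that volume at /tmp/safari-data. The Pods of that Deployment should be of image httpd:2.4.41-alpine.

# Answer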

vim 6_pv.yaml

# 6_pv.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
 name: safari-pv
spec:
 capacity:
  storage: 2Gi
 accessModes:
  - ReadWriteOnce
 hostPath:
  path: "/Volumes/Data"

Then create it:

k -f 6_pv.yaml create

# 6_pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: safari-pvc
  namespace: project-tiger
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
     storage: 2Gi

k -f 6_pvc.yaml create

Then check that both have the status Bound:

➜ k -n project-tiger get pv,pvc
NAME                         CAPACITY  ... STATUS   CLAIM                    ...
persistentvolume/safari-pv   2Gi       ... Bound    project-tiger/safari-pvc ...

NAME                               STATUS   VOLUME      CAPACITY ...
persistentvolumeclaim/safari-pvc   Bound    safari-pv   2Gi      ...

Next we create a Deployment and mount that volume:

k -n project-tiger create deploy safari \
  --image=httpd:2.4.41-alpine $do > 6_dep.yaml

vim 6_dep.yaml

# 6_dep.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: safari
  name: safari
  namespace: project-tiger
spec:
  replicas: 1
  selector:
    matchLabels:
      app: safari
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: safari
    spec:
      volumes:                                      # add
      - name: data                                  # add
        persistentVolumeClaim:                      # add
          claimName: safari-pvc                     # add
      containers:
      - image: httpd:2.4.41-alpine
        name: container
        volumeMounts:                               # add
        - name: data                                # add
          mountPath: /tmp/safari-data               # add

k -f 6_dep.yaml create

➜ k -n project-tiger describe pod safari-5cbf46d6d-mjhsb  | grep -A2 Mounts:   
    Mounts:
      /tmp/safari-data from data (rw) # there it is
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-n2sjj (ro)

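# Question 7 | Node and Pod Resource Usage

Task weight: 1%

Use context: kubectl config use-context k8s-c1-H

The metrics-server has been installed in the cluster. Your colleague would like to know the kubectl commands to:

show Nodes resource usage
show Pods and their containers resource usage

Please write the commands into /opt/course/7/node.sh and /opt/course/7/pod.sh.

# Answer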

# /opt/course/7/node.sh
kubectl top node

# /opt/course/7/pod.sh
kubectl top pod --containers=true

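# Question 8 | Get Controlplane Information

Task weight: 2%

Use context: kubectl config use-context k8s-c1-H

Ssh into the controlplane node with ssh cluster1-controlplane1. Check how the controlplane components kubelet, kube-apiserver, kube-scheduler, kube-controller-manager and etcd are started/installed on the controlplane node. Also find out the name of the DNS application and how it's started/installed on the controlplane node.

Write your findings into file /opt/course/8/controlplane-components.txt. The file should be structured like: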

# /opt/course/8/controlplane-components.txt
kubelet: [TYPE]
kube-apiserver: [TYPE]
kube-scheduler: [TYPE]
kube-controller-manager: [TYPE]
etcd: [TYPE]
dns: [TYPE] [NAME]

Choices of [TYPE] are: not-installed, process, static-pod, pod

# Answer

We could start by finding processes of the requested components, especially the kubelet at first:

➜ ssh cluster1-controlplane1

root@cluster1-controlplane1:~# ps aux | grep kubelet # shows kubelet process

We can have a look at the /etc/systemd/system directory to see which components are controlled via systemd:

➜ root@cluster1-controlplane1:~# find /etc/systemd/system/ | grep kube
/etc/systemd/system/kubelet.service.d
/etc/systemd/system/kubelet.service.d/10-kubeadm.conf
/etc/systemd/system/multi-user.target.wants/kubelet.service

➜ root@cluster1-controlplane1:~# find /etc/systemd/system/ | grep etcd

This shows that the kubelet is controlled via systemd, but there are no other services named kube or etcd. It seems this cluster was set up using kubeadm, so we check the default manifests directory:

➜ root@cluster1-controlplane1:~# find /etc/kubernetes/manifests/
/etc/kubernetes/manifests/
/etc/kubernetes/manifests/kube-controller-manager.yaml
/etc/kubernetes/manifests/etcd.yaml
/etc/kubernetes/manifests/kube-apiserver.yaml
/etc/kubernetes/manifests/kube-scheduler.yaml

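(The kubelet could also have a different manifests directory specified via the parameter --pod-manifest-path in its systemd startup config)

This means the main 4 controlplane services are set up as static Pods. Actually, let's check all Pods running in the kube-system Namespace on the controlplane node: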

➜ root@cluster1-controlplane1:~# kubectl -n kube-system get pod -o wide | grep controlplane1
coredns-5644d7b6d9-c4f68                            1/1     Running            ...   cluster1-controlplane1
coredns-5644d7b6d9-t84sc                            1/1     Running            ...   cluster1-controlplane1
etcd-cluster1-controlplane1                         1/1     Running            ...   cluster1-controlplane1
kube-apiserver-cluster1-controlplane1               1/1     Running            ...   cluster1-controlplane1
kube-controller-manager-cluster1-controlplane1      1/1     Running            ...   cluster1-controlplane1
kube-proxy-q955p                                    1/1     Running            ...   cluster1-controlplane1
kube-scheduler-cluster1-controlplane1               1/1     Running            ...   cluster1-controlplane1
weave-net-mwj47                                     2/2     Running            ...   cluster1-controlplane1

There we see the 4 static Pods, with -cluster1-controlplane1 as suffix.

We also see that the dns application seems to be coredns, but how is it controlled?

➜ root@cluster1-controlplane1$ kubectl -n kube-system get ds
NAME         DESIRED   CURRENT   ...   NODE SELECTOR            AGE
kube-proxy   3         3         ...   kubernetes.io/os=linux   155m
weave-net    3         3         ...   <none>                   155m

➜ root@cluster1-controlplane1$ kubectl -n kube-system get deploy
NAME      READY   UP-TO-DATE   AVAILABLE   AGE
coredns   2/2     2            2           155m

It seems coredns is controlled via a Deployment. We combine our findings in the requested file:

# /opt/course/8/controlplane-components.txt
kubelet: process
kube-apiserver: static-pod
kube-scheduler: static-pod
kube-controller-manager: static-pod
etcd: static-pod
dns: pod coredns
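
The manifest check used above can be condensed into a small helper. This is only a sketch: component_type is a hypothetical function, and it is run here against a temporary directory that mimics /etc/kubernetes/manifests rather than a real node. A result of unknown simply means further checks (processes, systemd units, Deployments) are still needed.

```shell
# Sketch: report "static-pod" when a manifest for the component exists
# in the given directory, otherwise "unknown" (further checks needed).
# component_type is a hypothetical helper name.
component_type() {
  manifest_dir="$1"
  component="$2"
  if [ -f "$manifest_dir/$component.yaml" ]; then
    echo "static-pod"
  else
    echo "unknown"
  fi
}

# Stand-in for /etc/kubernetes/manifests on a kubeadm controlplane node.
dir="$(mktemp -d)"
touch "$dir/etcd.yaml" "$dir/kube-apiserver.yaml" \
      "$dir/kube-controller-manager.yaml" "$dir/kube-scheduler.yaml"

etcd_type="$(component_type "$dir" etcd)"
kubelet_type="$(component_type "$dir" kubelet)"
echo "etcd: $etcd_type"
echo "kubelet: $kubelet_type"
rm -rf "$dir"
```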

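You should be comfortable investigating a running cluster, know the different ways a cluster and its services can be set up, and be able to troubleshoot and find error sources.

# Question 9 | Kill Scheduler, Manual Scheduling

Task weight: 5%

Use context: kubectl config use-context k8s-c2-AC

Ssh into the controlplane node with ssh cluster2-controlplane1. Temporarily stop the kube-scheduler, meaning in a way that you can start it again afterwards.

Create a single Pod named manual-schedule of image httpd:2.4-alpine, and confirm it's created but not scheduled on any node.

Now you're the scheduler and have all its power: manually schedule that Pod on node cluster2-controlplane1. Make sure it's running.

Start the kube-scheduler again and confirm it's running correctly by creating a second Pod named manual-schedule2 of image httpd:2.4-alpine, and check that it's running on cluster2-node1.

# Answer

# Stop the Scheduler

First we find the controlplane node: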

➜ k get node
NAME                     STATUS   ROLES           AGE   VERSION
cluster2-controlplane1   Ready    control-plane   26h   v1.26.0
cluster2-node1           Ready    <none>          26h   v1.26.0

Then we connect and check whether the scheduler is running:

➜ ssh cluster2-controlplane1

➜ root@cluster2-controlplane1:~# kubectl -n kube-system get pod | grep schedule
kube-scheduler-cluster2-controlplane1            1/1     Running   0          6s

Kill the scheduler (temporarily):

➜ root@cluster2-controlplane1:~# cd /etc/kubernetes/manifests/

➜ root@cluster2-controlplane1:~# mv kube-scheduler.yaml ..

And it should be stopped:

➜ root@cluster2-controlplane1:~# kubectl -n kube-system get pod | grep schedule

➜ root@cluster2-controlplane1:~#

# Create a Pod

Now we create the Pod:

k run manual-schedule --image=httpd:2.4-alpine

And confirm it has no node assigned:

➜ k get pod manual-schedule -o wide
NAME              READY   STATUS    ...   NODE     NOMINATED NODE
manual-schedule   0/1     Pending   ...   <none>   <none>

# Manually schedule the Pod

Let's play the scheduler now:

k get pod manual-schedule -o yaml > 9.yaml

# 9.yaml
...
spec:
  nodeName: cluster2-controlplane1        # add the controlplane node name
  containers:
  - image: httpd:2.4-alpine
    imagePullPolicy: IfNotPresent
    name: manual-schedule
    ...
...
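
The manual edit shown above can also be scripted. The following is a minimal sketch that adds spec.nodeName with awk; it works on a trimmed-down sample manifest (an assumption for illustration), not on real kubectl get -o yaml output, and it relies on the top-level spec: key appearing exactly once at column 0.

```shell
# Sketch: add spec.nodeName to a Pod manifest with awk.
# The manifest is a trimmed-down sample, not real `kubectl get -o yaml` output.
manifest="$(mktemp)"
cat > "$manifest" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: manual-schedule
spec:
  containers:
  - image: httpd:2.4-alpine
    name: manual-schedule
EOF

# Print every line; right after the top-level "spec:" add the nodeName field.
scheduled="$(awk '
  { print }
  /^spec:$/ { print "  nodeName: cluster2-controlplane1" }
' "$manifest")"

printf '%s\n' "$scheduled"
rm -f "$manifest"
```

In the exam, editing the file in vim is usually just as fast; the script mainly makes the single required change explicit.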

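The only thing a scheduler does is set the nodeName in a Pod declaration. How it finds the correct node to schedule on is a much more complicated matter that takes many variables into account.

As we cannot kubectl apply or kubectl edit in this case, we need to delete and re-create the Pod, or use replace:

k -f 9.yaml replace --force

How does it look?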

➜ k get pod manual-schedule -o wide
NAME              READY   STATUS    ...   NODE            
manual-schedule   1/1     Running   ...   cluster2-controlplane1

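It looks like our Pod is now running on the controlplane as requested, although no tolerations were specified. Only the scheduler takes taints, tolerations and affinity into account when finding the correct node name. That's why it's still possible to assign Pods manually and directly to a controlplane node and skip the scheduler.

# Start the scheduler again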

➜ ssh cluster2-controlplane1

➜ root@cluster2-controlplane1:~# cd /etc/kubernetes/manifests/

➜ root@cluster2-controlplane1:~# mv ../kube-scheduler.yaml .

Check that it's running:

➜ root@cluster2-controlplane1:~# kubectl -n kube-system get pod | grep schedule
kube-scheduler-cluster2-controlplane1            1/1     Running   0          16s

Schedule a second test Pod:

k run manual-schedule2 --image=httpd:2.4-alpine

➜ k get pod -o wide | grep schedule
manual-schedule    1/1     Running   ...   cluster2-controlplane1
manual-schedule2   1/1     Running   ...   cluster2-node1

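Back to normal.

# Question 10 | RBAC ServiceAccount Role RoleBinding

Task weight: 6%

Use context: kubectl config use-context k8s-c1-H

Create a new ServiceAccount processor in Namespace project-hamster. Create a Role and a RoleBinding, both named processor as well. These should allow the new SA to only create Secrets and ConfigMaps in that Namespace.

# Answer

# Let's talk a little about RBAC resources

A ClusterRole|Role defines a set of permissions and where it is available, in the whole cluster or just a single Namespace.

A ClusterRoleBinding|RoleBinding connects a set of permissions with an account and defines where it is applied, in the whole cluster or just a single Namespace.

Because of this there are 4 different RBAC combinations, 3 of which are valid:

Role + RoleBinding (available in single Namespace, applied in single Namespace)
ClusterRole + ClusterRoleBinding (available cluster-wide, applied cluster-wide)
ClusterRole + RoleBinding (available cluster-wide, applied in single Namespace)
Role + ClusterRoleBinding (NOT POSSIBLE: available in single Namespace, applied cluster-wide)

# To the solution

We first create the ServiceAccount and the Role: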

➜ k -n project-hamster create sa processor
serviceaccount/processor created

k -n project-hamster create role processor \
  --verb=create \
  --resource=secret \
  --resource=configmap

This will create a Role like:

# kubectl -n project-hamster create role processor --verb=create --resource=secret --resource=configmap
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: processor
  namespace: project-hamster
rules:
- apiGroups:
  - ""
  resources:
  - secrets
  - configmaps
  verbs:
  - create

Now we bind the Role to the ServiceAccount:

k -n project-hamster create rolebinding processor \
  --role processor \
  --serviceaccount project-hamster:processor

This will create a RoleBinding like:

# kubectl -n project-hamster create rolebinding processor --role processor --serviceaccount project-hamster:processor
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: processor
  namespace: project-hamster
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: processor
subjects:
- kind: ServiceAccount
  name: processor
  namespace: project-hamster

To test our RBAC setup we can use kubectl auth can-i:

➜ k -n project-hamster auth can-i create secret \
  --as system:serviceaccount:project-hamster:processor
yes

➜ k -n project-hamster auth can-i create configmap \
  --as system:serviceaccount:project-hamster:processor
yes

➜ k -n project-hamster auth can-i create pod \
  --as system:serviceaccount:project-hamster:processor
no

➜ k -n project-hamster auth can-i delete secret \
  --as system:serviceaccount:project-hamster:processor
no

➜ k -n project-hamster auth can-i get configmap \
  --as system:serviceaccount:project-hamster:processor
no
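
The five checks above repeat the same command with different verb and resource pairs, so they can be wrapped in a loop. The sketch below only wraps kubectl auth can-i with the same flags used above; check_sa is a hypothetical helper name, and the function is defined here without being called because it needs access to the cluster.

```shell
# Sketch: check several verb/resource pairs for one ServiceAccount.
# check_sa is a hypothetical helper; it only wraps `kubectl auth can-i`.
check_sa() {
  ns="$1"
  sa="$2"
  shift 2
  # Remaining arguments are "verb:resource" pairs.
  for pair in "$@"; do
    verb="${pair%%:*}"
    resource="${pair#*:}"
    # can-i exits non-zero for "no", so ignore the exit status here.
    answer="$(kubectl -n "$ns" auth can-i "$verb" "$resource" \
      --as "system:serviceaccount:$ns:$sa")" || true
    echo "$verb $resource: $answer"
  done
}

# Usage (needs access to the cluster):
# check_sa project-hamster processor create:secret create:configmap delete:secret
```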

# Question 11 | DaemonSet on all Nodes

Task weight: 4%

Use context: kubectl config use-context k8s-c1-H

Use Namespace project-tiger for the following. Create a DaemonSet named ds-important with image httpd:2.4-alpine and labels id=ds-important and uuid=18426a0b-5f59-4e10-923f-c0e078e82462. The Pods it creates should request 10 millicore cpu and 10 mebibyte memory. The Pods of that DaemonSet should run on all nodes, including controlplanes.

# Answer

As of now we aren't able to create a DaemonSet directly using kubectl, so we create a Deployment and change it:

k -n project-tiger create deployment --image=httpd:2.4-alpine ds-important $do > 11.yaml

vim 11.yaml

Then we adjust the yaml to:

# 11.yaml
apiVersion: apps/v1
kind: DaemonSet                                     # change from Deployment to Daemonset
metadata:
  creationTimestamp: null
  labels:                                           # add
    id: ds-important                                # add
    uuid: 18426a0b-5f59-4e10-923f-c0e078e82462      # add
  name: ds-important
  namespace: project-tiger                          # important
spec:
  #replicas: 1                                      # remove
  selector:
    matchLabels:
      id: ds-important                              # add
      uuid: 18426a0b-5f59-4e10-923f-c0e078e82462    # add
  #strategy: {}                                     # remove
  template:
    metadata:
      creationTimestamp: null
      labels:
        id: ds-important                            # add
        uuid: 18426a0b-5f59-4e10-923f-c0e078e82462  # add
    spec:
      containers:
      - image: httpd:2.4-alpine
        name: ds-important
        resources:
          requests:                                 # add
            cpu: 10m                                # add
            memory: 10Mi                            # add
      tolerations:                                  # add
      - effect: NoSchedule                          # add
        key: node-role.kubernetes.io/control-plane  # add
#status: {}                                         # remove
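
The resource requests in the manifest above use Kubernetes quantity suffixes: m for thousandths of a CPU core and Mi for mebibytes, which differs from the decimal suffix M. The arithmetic below is a plain shell sketch of what those values amount to; the variable names are made up for illustration.

```shell
# Sketch: what the requested quantities amount to.
mi_bytes=$((10 * 1024 * 1024))   # 10Mi = 10 mebibytes
m_bytes=$((10 * 1000 * 1000))    # 10M  = 10 megabytes (not what the manifest uses)
pods_per_core=$((1000 / 10))     # number of 10m CPU requests that add up to one core

echo "10Mi = $mi_bytes bytes"
echo "10M  = $m_bytes bytes"
echo "10m CPU requests per core: $pods_per_core"
```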

It was requested that the DaemonSet runs on all nodes, so we need to specify the toleration for this.

Let's confirm:

k -f 11.yaml create

➜ k -n project-tiger get ds
NAME           DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
ds-important   3         3         3       3            3           <none>          8s

➜ k -n project-tiger get pod -l id=ds-important -o wide
NAME                      READY   STATUS          NODE
ds-important-6pvgm        1/1     Running   ...   cluster1-node1
ds-important-lh5ts        1/1     Running   ...   cluster1-controlplane1
ds-important-qhjcq        1/1     Running   ...   cluster1-node2

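# Question 12 | Deployment on all Nodes

Task weight: 6%

Use context: kubectl config use-context k8s-c1-H

Use Namespace project-tiger for the following. Create a Deployment named deploy-important with label id=very-important (the Pods should also have this label) and 3 replicas. It should contain two containers, the first named container1 with image nginx:1.17.6-alpine and the second named container2 with image kubernetes/pause.

There should only ever be one Pod of that Deployment running on one worker node. We have two worker nodes: cluster1-node1 and cluster1-node2. Because the Deployment has three replicas, the result should be that one Pod is running on each of the two nodes. The third Pod won't be scheduled unless a new worker node is added.

In a way we simulate the behaviour of a DaemonSet here, but using a Deployment and a fixed number of replicas.

# Answer

There are two possible ways, one using podAntiAffinity and one using topologySpreadConstraint.

# PodAntiAffinity

The idea here is that we create an "inter-pod anti-affinity", which lets us say that a Pod should only be scheduled on a node where another Pod with a specific label (here the same label) is not already running.

Let's begin by creating the Deployment template: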

k -n project-tiger create deployment \
  --image=nginx:1.17.6-alpine deploy-important $do > 12.yaml

vim 12.yaml

# 12.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    id: very-important                  # change
  name: deploy-important
  namespace: project-tiger              # important
spec:
  replicas: 3                           # change
  selector:
    matchLabels:
      id: very-important                # change
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        id: very-important              # change
    spec:
      containers:
      - image: nginx:1.17.6-alpine
        name: container1                # change
        resources: {}
      - image: kubernetes/pause         # add
        name: container2                # add
      affinity:                                             # add
        podAntiAffinity:                                    # add
          requiredDuringSchedulingIgnoredDuringExecution:   # add
          - labelSelector:                                  # add
              matchExpressions:                             # add
              - key: id                                     # add
                operator: In                                # add
                values:                                     # add
                - very-important                            # add
            topologyKey: kubernetes.io/hostname             # add
status: {}

We specify a topologyKey, which is a pre-populated Kubernetes label; you can find it by describing a node.

# TopologySpreadConstraints

We can achieve the same with topologySpreadConstraints. It's best to try out both.

# 12.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    id: very-important                  # change
  name: deploy-important
  namespace: project-tiger              # important
spec:
  replicas: 3                           # change
  selector:
    matchLabels:
      id: very-important                # change
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        id: very-important              # change
    spec:
      containers:
      - image: nginx:1.17.6-alpine
        name: container1                # change
        resources: {}
      - image: kubernetes/pause         # add
        name: container2                # add
      topologySpreadConstraints:                 # add
      - maxSkew: 1                               # add
        topologyKey: kubernetes.io/hostname      # add
        whenUnsatisfiable: DoNotSchedule         # add
        labelSelector:                           # add
          matchLabels:                           # add
            id: very-important                   # add
status: {}

# Apply and Run

Let's run it, then check the Deployment status, which shows a 2/3 ready count:

k -f 12.yaml create

➜ k -n project-tiger get deploy -l id=very-important
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
deploy-important   2/3     3            2           2m35s

➜ k -n project-tiger get pod -o wide -l id=very-important
NAME                                READY   STATUS    ...   NODE             
deploy-important-58db9db6fc-9ljpw   2/2     Running   ...   cluster1-node1
deploy-important-58db9db6fc-lnxdb   0/2     Pending   ...   <none>          
deploy-important-58db9db6fc-p2rz8   2/2     Running   ...   cluster1-node2

If we kubectl describe the Pod deploy-important-58db9db6fc-lnxdb, it shows that the reason for not scheduling is our podAntiAffinity rule:

Warning  FailedScheduling  63s (x3 over 65s)  default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/control-plane: }, that the pod didn't tolerate, 2 node(s) didn't match pod affinity/anti-affinity, 2 node(s) didn't satisfy existing pods anti-affinity rules.

Or our topologySpreadConstraints:

Warning  FailedScheduling  16s   default-scheduler  0/3 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/control-plane: }, that the pod didn't tolerate, 2 node(s) didn't match pod topology spread constraints.

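# Question 13 | Multi Containers and Pod shared Volume

Task weight: 4%

Use context: kubectl config use-context k8s-c1-H

Create a Pod named multi-container-playground in Namespace default with three containers, named c1, c2 and c3. There should be a volume attached to that Pod and mounted into every container, but the volume shouldn't be persisted or shared with other Pods.

Container c1 should be of image nginx:1.17.6-alpine and have the name of the node where its Pod is running available as environment variable MY_NODE_NAME.

Container c2 should be of image busybox:1.31.1 and write the output of the date command every second to file date.log in the shared volume. You can use while true; do date >> /your/vol/path/date.log; sleep 1; done for this.

Container c3 should be of image busybox:1.31.1 and constantly send the content of file date.log from the shared volume to stdout. You can use tail -f /your/vol/path/date.log for this.

Check the logs of container c3 to confirm the setup is correct.

# Answer

First we create the Pod template: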

k run multi-container-playground --image=nginx:1.17.6-alpine $do > 13.yaml

vim 13.yaml

# 13.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: multi-container-playground
  name: multi-container-playground
spec:
  containers:
  - image: nginx:1.17.6-alpine
    name: c1                                                                      # change
    resources: {}
    env:                                                                          # add
    - name: MY_NODE_NAME                                                          # add
      valueFrom:                                                                  # add
        fieldRef:                                                                 # add
          fieldPath: spec.nodeName                                                # add
    volumeMounts:                                                                 # add
    - name: vol                                                                   # add
      mountPath: /vol                                                             # add
  - image: busybox:1.31.1                                                         # add
    name: c2                                                                      # add
    command: ["sh", "-c", "while true; do date >> /vol/date.log; sleep 1; done"]  # add
    volumeMounts:                                                                 # add
    - name: vol                                                                   # add
      mountPath: /vol                                                             # add
  - image: busybox:1.31.1                                                         # add
    name: c3                                                                      # add
    command: ["sh", "-c", "tail -f /vol/date.log"]                                # add
    volumeMounts:                                                                 # add
    - name: vol                                                                   # add
      mountPath: /vol                                                             # add
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  volumes:                                                                        # add
    - name: vol                                                                   # add
      emptyDir: {}                                                                # add
status: {}

Oh boy, lots of requested things. We check if everything is good with the Pod:

k -f 13.yaml create

➜ k get pod multi-container-playground
NAME                         READY   STATUS    RESTARTS   AGE
multi-container-playground   3/3     Running   0          95s

Good, then we check whether container c1 has the requested node name as an env variable:

➜ k exec multi-container-playground -c c1 -- env | grep MY
MY_NODE_NAME=cluster1-node2

And finally we check the logs:

➜ k logs multi-container-playground -c c3
Sat Dec  7 16:05:10 UTC 2077
Sat Dec  7 16:05:11 UTC 2077
Sat Dec  7 16:05:12 UTC 2077
Sat Dec  7 16:05:13 UTC 2077
Sat Dec  7 16:05:14 UTC 2077
Sat Dec  7 16:05:15 UTC 2077
Sat Dec  7 16:05:16 UTC 2077

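# Question 14 | Find out Cluster Information

Task weight: 2%

Use context: kubectl config use-context k8s-c1-H

You're asked to find out the following information about the cluster k8s-c1-H:

How many controlplane nodes are available?
How many worker nodes are available?
What is the Service CIDR?
Which Networking (or CNI Plugin) is configured and where is its config file?
Which suffix will static pods have that run on cluster1-node1?

Write your answers into file /opt/course/14/cluster-info, structured like this: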

# /opt/course/14/cluster-info
1: [ANSWER]
2: [ANSWER]
3: [ANSWER]
4: [ANSWER]
5: [ANSWER]

# Answer

# How many controlplane and worker nodes are available?

➜ k get node
NAME                    STATUS   ROLES          AGE   VERSION
cluster1-controlplane1  Ready    control-plane  27h   v1.26.0
cluster1-node1          Ready    <none>         27h   v1.26.0
cluster1-node2          Ready    <none>         27h   v1.26.0

We see one controlplane and two workers.

# What is the Service CIDR?

➜ ssh cluster1-controlplane1

➜ root@cluster1-controlplane1:~# cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep range
    - --service-cluster-ip-range=10.96.0.0/12

# Which Networking (or CNI Plugin) is configured and where is its config file?

➜ root@cluster1-controlplane1:~# find /etc/cni/net.d/
/etc/cni/net.d/
/etc/cni/net.d/10-weave.conflist

➜ root@cluster1-controlplane1:~# cat /etc/cni/net.d/10-weave.conflist
{
    "cniVersion": "0.3.0",
    "name": "weave",
...

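By default the kubelet looks into /etc/cni/net.d to discover the CNI plugins. This is the same on every controlplane and worker node.

# Which suffix will static pods have that run on cluster1-node1?

The suffix is the node hostname with a leading hyphen. It used to be -static in earlier Kubernetes versions.

# Result

The resulting /opt/course/14/cluster-info could look like: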

# /opt/course/14/cluster-info

# How many controlplane nodes are available?
1: 1

# How many worker nodes are available?
2: 2

# What is the Service CIDR?
3: 10.96.0.0/12

# Which Networking (or CNI Plugin) is configured and where is its config file?
4: Weave, /etc/cni/net.d/10-weave.conflist

# Which suffix will static pods have that run on cluster1-node1?
5: -cluster1-node1
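
Answers 3 and 5 can be derived mechanically. The sketch below extracts the Service CIDR from a short excerpt that stands in for /etc/kubernetes/manifests/kube-apiserver.yaml (the excerpt is an assumption for illustration, not the full file), works out how many addresses a /12 range contains, and builds the static Pod suffix from the node hostname.

```shell
# Sketch: derive answers 3 and 5 from sample data.
# The excerpt is a stand-in for /etc/kubernetes/manifests/kube-apiserver.yaml.
manifest="$(mktemp)"
cat > "$manifest" <<'EOF'
spec:
  containers:
  - command:
    - kube-apiserver
    - --service-cluster-ip-range=10.96.0.0/12
EOF

# Keep only the value after the "=" of the matching flag.
service_cidr="$(sed -n 's/.*--service-cluster-ip-range=//p' "$manifest")"

# A /12 prefix leaves 32 - 12 = 20 bits for addresses inside the range.
prefix="${service_cidr#*/}"
addresses=$((1 << (32 - prefix)))

# Static Pod names end with "-<node hostname>".
node="cluster1-node1"
suffix="-$node"

echo "$service_cidr ($addresses addresses), suffix $suffix"
rm -f "$manifest"
```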

# Question 15 | Cluster Event Logging

Task weight: 3%

Use context: kubectl config use-context k8s-c2-AC

Write a command into /opt/course/15/cluster_events.sh which shows the latest events in the whole cluster, ordered by time (metadata.creationTimestamp). Use kubectl for it.

Now kill the kube-proxy Pod running on node cluster2-node1 and write the events this caused into /opt/course/15/pod_kill.log.

Finally kill the containerd container of the kube-proxy Pod on node cluster2-node1 and write the events into /opt/course/15/container_kill.log.

Do you notice differences in the events both actions caused?

# Answer

# /opt/course/15/cluster_events.sh
kubectl get events -A --sort-by=.metadata.creationTimestamp

Now we kill the kube-proxy Pod:

k -n kube-system get pod -o wide | grep proxy # find pod running on cluster2-node1

k -n kube-system delete pod kube-proxy-z64cg

Now check the events:

sh /opt/course/15/cluster_events.sh

Write the events the killing caused into /opt/course/15/pod_kill.log:

# /opt/course/15/pod_kill.log
kube-system   9s          Normal    Killing           pod/kube-proxy-jsv7t   ...
kube-system   3s          Normal    SuccessfulCreate  daemonset/kube-proxy   ...
kube-system   <unknown>   Normal    Scheduled         pod/kube-proxy-m52sx   ...
default       2s          Normal    Starting          node/cluster2-node1  ...
kube-system   2s          Normal    Created           pod/kube-proxy-m52sx   ...
kube-system   2s          Normal    Pulled            pod/kube-proxy-m52sx   ...
kube-system   2s          Normal    Started           pod/kube-proxy-m52sx   ...

Finally we will try to provoke events by killing the container belonging to the kube-proxy Pod:

➜ ssh cluster2-node1

➜ root@cluster2-node1:~# crictl ps | grep kube-proxy
1e020b43c4423   36c4ebbc9d979   About an hour ago   Running   kube-proxy     ...

➜ root@cluster2-node1:~# crictl rm 1e020b43c4423
1e020b43c4423

➜ root@cluster2-node1:~# crictl ps | grep kube-proxy
0ae4245707910   36c4ebbc9d979   17 seconds ago      Running   kube-proxy     ...


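We removed the main container (1e020b43c4423), but also noticed that a new container (0ae4245707910) was created right away. Thanks, Kubernetes!

Now we check whether this caused events again and write them into the second file: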

sh /opt/course/15/cluster_events.sh

# /opt/course/15/container_kill.log
kube-system   13s         Normal    Created      pod/kube-proxy-m52sx    ...
kube-system   13s         Normal    Pulled       pod/kube-proxy-m52sx    ...
kube-system   13s         Normal    Started      pod/kube-proxy-m52sx    ...


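Comparing the events, we see that deleting the whole Pod takes more work and hence produces more events: for example, the DaemonSet comes into play to recreate the missing Pod. When we manually killed the Pod's main container, the Pod itself still existed and only its container had to be recreated, hence fewer events.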
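To make the difference concrete, the following sketch lists the event reasons that appear only when the whole Pod is deleted. It works on copies of the sample lines recorded above, with the age and trailing columns dropped; the variable and function names are made up for this illustration.

```shell
# Sample data copied from the two logs above: namespace, type, reason, object.
pod_kill='kube-system Normal Killing pod/kube-proxy-jsv7t
kube-system Normal SuccessfulCreate daemonset/kube-proxy
kube-system Normal Scheduled pod/kube-proxy-m52sx
default Normal Starting node/cluster2-node1
kube-system Normal Created pod/kube-proxy-m52sx
kube-system Normal Pulled pod/kube-proxy-m52sx
kube-system Normal Started pod/kube-proxy-m52sx'

container_kill='kube-system Normal Created pod/kube-proxy-m52sx
kube-system Normal Pulled pod/kube-proxy-m52sx
kube-system Normal Started pod/kube-proxy-m52sx'

# Print the unique event reasons (3rd column) of a log, one per line.
reasons() { printf '%s\n' "$1" | awk '{ print $3 }' | sort -u; }

# Collect the reasons that occur in the Pod deletion log
# but not in the container kill log.
only_pod_kill=$(
  for r in $(reasons "$pod_kill"); do
    reasons "$container_kill" | grep -qx "$r" || printf '%s\n' "$r"
  done
)
printf '%s\n' "$only_pod_kill"
```

With this sample data the extra reasons are the ones tied to Pod lifecycle and scheduling, which matches the explanation above.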

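# Question 16 | Namespaces and API Resources

Task weight: 2%

Use context: kubectl config use-context k8s-c1-H

Write the names of all namespaced Kubernetes resources (like Pod, Secret, ConfigMap ...) into /opt/course/16/resources.txt.

Find the project-* Namespace with the highest number of Roles defined in it and write its name and the number of Roles into /opt/course/16/crowded-namespace.txt.

# Answer

# Namespaces and namespaced resources

We can get a list of all resources like this: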

k api-resources    # shows all

k api-resources -h # help always good

k api-resources --namespaced -o name > /opt/course/16/resources.txt


This results in a file like:

# /opt/course/16/resources.txt
bindings
configmaps
endpoints
events
limitranges
persistentvolumeclaims
pods
podtemplates
replicationcontrollers
resourcequotas
secrets
serviceaccounts
services
controllerrevisions.apps
daemonsets.apps
deployments.apps
replicasets.apps
statefulsets.apps
localsubjectaccessreviews.authorization.k8s.io
horizontalpodautoscalers.autoscaling
cronjobs.batch
jobs.batch
leases.coordination.k8s.io
events.events.k8s.io
ingresses.extensions
ingresses.networking.k8s.io
networkpolicies.networking.k8s.io
poddisruptionbudgets.policy
rolebindings.rbac.authorization.k8s.io
roles.rbac.authorization.k8s.io


# Namespace with the most Roles

➜ k -n project-c13 get role --no-headers | wc -l
No resources found in project-c13 namespace.
0

➜ k -n project-c14 get role --no-headers | wc -l
300

➜ k -n project-hamster get role --no-headers | wc -l
No resources found in project-hamster namespace.
0

➜ k -n project-snake get role --no-headers | wc -l
No resources found in project-snake namespace.
0

➜ k -n project-tiger get role --no-headers | wc -l
No resources found in project-tiger namespace.
0


Finally, we write the name and the count into the file:

# /opt/course/16/crowded-namespace.txt
project-c14 with 300 resources
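As a rough sketch, the comparison could also be scripted instead of checking each Namespace by hand. The helper most_crowded is hypothetical; it only expects "namespace count" pairs on stdin, so here it is fed the counts observed above. The commented loop shows how those pairs might be produced with kubectl and has not been run against the exam cluster.

```shell
# Read "namespace count" pairs on stdin and print the pair
# with the highest count.
most_crowded() {
  sort -k2,2nr | head -n 1
}

# Sample input: the Role counts observed above.
printf '%s\n' \
  'project-c13 0' \
  'project-c14 300' \
  'project-hamster 0' \
  'project-snake 0' \
  'project-tiger 0' | most_crowded

# Against a live cluster the pairs could be generated like this (assumption):
# for ns in $(kubectl get ns -o name | grep '^namespace/project-' | cut -d/ -f2); do
#   echo "$ns $(kubectl -n "$ns" get role --no-headers 2>/dev/null | wc -l)"
# done | most_crowded
```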


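# Question 17 | Find the Container of a Pod and Check Its Info

Task weight: 3%

Use context: kubectl config use-context k8s-c1-H

In Namespace project-tiger, create a Pod named tigers-reunite of image httpd:2.4.41-alpine with labels pod=container and container=pod. Find out on which node the Pod is scheduled. SSH into that node and find the containerd container belonging to that Pod.

Using the command crictl:

1. Write the ID of the container and the info.runtimeType into /opt/course/17/pod-container.txt
2. Write the logs of the container into /opt/course/17/pod-container.log

# Answer

First we create the Pod: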

k -n project-tiger run tigers-reunite \
  --image=httpd:2.4.41-alpine \
  --labels "pod=container,container=pod"


Next we find out the node it is scheduled on:

k -n project-tiger get pod -o wide

# or fancy:
k -n project-tiger get pod tigers-reunite -o jsonpath="{.spec.nodeName}"


Then we ssh into that node and check the container info:

➜ ssh cluster1-node2

➜ root@cluster1-node2:~# crictl ps | grep tigers-reunite
b01edbe6f89ed    54b0995a63052    5 seconds ago    Running        tigers-reunite ...

➜ root@cluster1-node2:~# crictl inspect b01edbe6f89ed | grep runtimeType
    "runtimeType": "io.containerd.runc.v2",


Then we fill in the requested file (on the main terminal):

# /opt/course/17/pod-container.txt
b01edbe6f89ed io.containerd.runc.v2
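If you prefer not to copy the value by hand, it can be pulled out of the crictl inspect output with sed. The sketch below is a minimal example that runs on the single sample line shown above; runtime_type is a hypothetical helper name, and the commented usage line assumes the container ID from this walkthrough.

```shell
# Extract the value of the first "runtimeType" field found on stdin.
runtime_type() {
  sed -n 's/.*"runtimeType": *"\([^"]*\)".*/\1/p' | head -n 1
}

# Sample input: the line printed by `crictl inspect ... | grep runtimeType` above.
printf '%s\n' '    "runtimeType": "io.containerd.runc.v2",' | runtime_type

# Possible usage from the main terminal (assumption, not verified here):
# ssh cluster1-node2 'crictl inspect b01edbe6f89ed' | runtime_type
```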


Finally, we write the container logs into the second file:

ssh cluster1-node2 'crictl logs b01edbe6f89ed' &> /opt/course/17/pod-container.log

The &> in the above command redirects both standard output and standard error.
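The effect of redirecting both streams can be seen in a small experiment. This is a sketch under assumptions: emit is a made-up stand-in for crictl logs, and the portable spelling `> file 2>&1` is used because `&>` is a bash feature that plain POSIX sh does not interpret the same way.

```shell
# A command that writes one line to stdout and one to stderr,
# standing in for `crictl logs` in this illustration.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

out_only=$(mktemp)
both=$(mktemp)

emit > "$out_only" 2>/dev/null   # stdout only; stderr is discarded
emit > "$both" 2>&1              # portable equivalent of bash's `&> file`

wc -l < "$out_only"   # counts the single stdout line
wc -l < "$both"       # counts both lines
```

If part of a container's output arrives on standard error, a plain `>` would silently drop it from the log file, which is why redirecting both streams is the safer choice here.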

You could also simply run crictl logs on the node and copy the content manually, if it is not too much. The file should look like this:

# /opt/course/17/pod-container.log
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 10.44.0.37. Set the 'ServerName' directive globally to suppress this message
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using 10.44.0.37. Set the 'ServerName' directive globally to suppress this message
[Mon Sep 13 13:32:18.555280 2021] [mpm_event:notice] [pid 1:tid 139929534545224] AH00489: Apache/2.4.41 (Unix) configured -- resuming normal operations
[Mon Sep 13 13:32:18.555610 2021] [core:notice] [pid 1:tid 139929534545224] AH00094: Command line: 'httpd -D FOREGROUND'


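# Question 18 | Fix Kubelet

Task weight: 8%

Use context: kubectl config use-context k8s-c3-CCC

There seems to be an issue with the kubelet not running on cluster3-node1. Fix it and confirm that the cluster has node cluster3-node1 available in Ready state afterwards. You should then be able to schedule a Pod on cluster3-node1.

Write the reason for the issue into /opt/course/18/reason.txt.

# Answer

The procedure for tasks like this should be to check whether the kubelet is running, start it if it is not, then check its logs and correct any errors.

It is always helpful to check whether other clusters already have some of the components defined and running, so that you can copy and reuse existing config files. In this case, though, that is probably not necessary.

Check the node status: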

➜ k get node
NAME                     STATUS     ROLES           AGE   VERSION
cluster3-controlplane1   Ready      control-plane   14d   v1.26.0
cluster3-node1           NotReady   <none>          14d   v1.26.0


First we check whether the kubelet is running:

➜ ssh cluster3-node1

➜ root@cluster3-node1:~# ps aux | grep kubelet
root     29294  0.0  0.2  14856  1016 pts/0    S+   11:30   0:00 grep --color=auto kubelet


It is not, so we check whether it is configured as a service using systemd:

➜ root@cluster3-node1:~# service kubelet status
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: inactive (dead) since Sun 2019-12-08 11:30:06 UTC; 50min 52s ago
...


Yes, it is configured as a service, with its configuration at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf, but we see that it is inactive. Let's try to start it:

➜ root@cluster3-node1:~# service kubelet start

➜ root@cluster3-node1:~# service kubelet status
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Thu 2020-04-30 22:03:10 UTC; 3s ago
     Docs: https://kubernetes.io/docs/home/
  Process: 5989 ExecStart=/usr/local/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=203/EXEC)
 Main PID: 5989 (code=exited, status=203/EXEC)

Apr 30 22:03:10 cluster3-node1 systemd[5989]: kubelet.service: Failed at step EXEC spawning /usr/local/bin/kubelet: No such file or directory
Apr 30 22:03:10 cluster3-node1 systemd[1]: kubelet.service: Main process exited, code=exited, status=203/EXEC
Apr 30 22:03:10 cluster3-node1 systemd[1]: kubelet.service: Failed with result 'exit-code'.


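We see that it is trying to execute /usr/local/bin/kubelet with some parameters defined in its service config file. A good way to find errors and get more logs is to run the command manually (usually together with its parameters).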

➜ root@cluster3-node1:~# /usr/local/bin/kubelet
-bash: /usr/local/bin/kubelet: No such file or directory

➜ root@cluster3-node1:~# whereis kubelet
kubelet: /usr/bin/kubelet


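Another way would be to look at the extended logging of the service, for example with journalctl -u kubelet.

Well, there we have it: a wrong path is specified. Correct the path in the file /etc/systemd/system/kubelet.service.d/10-kubeadm.conf and run: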

vim /etc/systemd/system/kubelet.service.d/10-kubeadm.conf # fix

systemctl daemon-reload && systemctl restart kubelet

systemctl status kubelet  # should now show running
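The manual vim edit could also be done with sed. The sketch below runs on a temporary copy so it is safe to try anywhere; the drop-in content is a shortened, assumed sample built around the ExecStart line from the status output above, not the real file from the node.

```shell
# Create a stand-in for 10-kubeadm.conf (assumed, abbreviated content).
dropin=$(mktemp)
cat > "$dropin" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/local/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
EOF

# Replace the wrong binary path with the one reported by `whereis kubelet`.
fixed=$(mktemp)
sed 's#/usr/local/bin/kubelet#/usr/bin/kubelet#' "$dropin" > "$fixed"

# Show the corrected ExecStart line.
grep '^ExecStart=/' "$fixed"
```

On the real node the corrected content would have to replace the original drop-in file, followed by the daemon-reload and restart shown above.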


The node should also become available to the API server again; give it a bit of time, though:

➜ k get node
NAME                     STATUS   ROLES           AGE   VERSION
cluster3-controlplane1   Ready    control-plane   14d   v1.26.0
cluster3-node1           Ready    <none>          14d   v1.26.0


Finally, we write the reason into the file:

# /opt/course/18/reason.txt
wrong path to kubelet binary specified in service config


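# Question 19 | Create a Secret and Mount It into a Pod

Task weight: 3%

Note

This task can only be solved if Question 18 or 20 has been successfully implemented and the k8s-c3-CCC cluster has a functioning worker node.

Use context: kubectl config use-context k8s-c3-CCC

Do the following in a new Namespace secret. Create a Pod named secret-pod of image busybox:1.31.1 which should keep running for some time.

There is an existing Secret located at /opt/course/19/secret1.yaml; create it in the Namespace secret and mount it read-only into the Pod at /tmp/secret1.

Create a new Secret named secret2 in Namespace secret which should contain user=user1 and pass=1234. These entries should be available inside the Pod's container as environment variables APP_USER and APP_PASS.

Confirm everything is working.

# Answer

First we create the Namespace and the requested Secrets in it: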

k create ns secret

cp /opt/course/19/secret1.yaml 19_secret1.yaml

vim 19_secret1.yaml


# 19_secret1.yaml
apiVersion: v1
data:
  halt: IyEgL2Jpbi9zaAo...
kind: Secret
metadata:
  creationTimestamp: null
  name: secret1
  namespace: secret           # change


k -f 19_secret1.yaml create

Next we create the second Secret:

k -n secret create secret generic secret2 --from-literal=user=user1 --from-literal=pass=1234
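Kubernetes stores Secret values base64-encoded in the data field, so the literals above do not appear in plain text when the Secret is printed as YAML. The sketch below shows the encoding with the standard base64 tool; the helper names encode and decode are made up for this example.

```shell
# Encode a literal the way it would appear under .data in the Secret.
encode() { printf '%s' "$1" | base64; }

# Decode a value copied from the Secret's .data field.
decode() { printf '%s' "$1" | base64 -d; }

encode user1      # dXNlcjE=
encode 1234       # MTIzNA==
decode dXNlcjE=   # prints user1 without a trailing newline
echo
```

Against the cluster, a single value could be checked with `k -n secret get secret secret2 -o jsonpath='{.data.user}' | base64 -d`.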

Now we create the Pod template:

k -n secret run secret-pod --image=busybox:1.31.1 $do -- sh -c "sleep 5d" > 19.yaml

vim 19.yaml


# 19.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: secret-pod
  name: secret-pod
  namespace: secret                       # add
spec:
  containers:
  - args:
    - sh
    - -c
    - sleep 5d
    image: busybox:1.31.1
    name: secret-pod
    resources: {}
    env:                                  # add
    - name: APP_USER                      # add
      valueFrom:                          # add
        secretKeyRef:                     # add
          name: secret2                   # add
          key: user                       # add
    - name: APP_PASS                      # add
      valueFrom:                          # add
        secretKeyRef:                     # add
          name: secret2                   # add
          key: pass                       # add
    volumeMounts:                         # add
    - name: secret1                       # add
      mountPath: /tmp/secret1             # add
      readOnly: true                      # add
  dnsPolicy: ClusterFirst
  restartPolicy: Always
  volumes:                                # add
  - name: secret1                         # add
    secret:                               # add
      secretName: secret1                 # add
status: {}


In current K8s versions it may not be necessary to specify readOnly: true, because it is the default setting anyway.

k -f 19.yaml create

Finally, we check whether everything is correct:

➜ k -n secret exec secret-pod -- env | grep APP
APP_PASS=1234
APP_USER=user1


➜ k -n secret exec secret-pod -- find /tmp/secret1
/tmp/secret1
/tmp/secret1/..data
/tmp/secret1/halt
/tmp/secret1/..2019_12_08_12_15_39.463036797
/tmp/secret1/..2019_12_08_12_15_39.463036797/halt


➜ k -n secret exec secret-pod -- cat /tmp/secret1/halt
#! /bin/sh
### BEGIN INIT INFO
# Provides:          halt
# Required-Start:
# Required-Stop:
# Default-Start:
# Default-Stop:      0
# Short-Description: Execute the halt command.
# Description:
...


All is good.

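# Question 20 | Update Kubernetes Version and Join Cluster

Task weight: 10%

Use context: kubectl config use-context k8s-c3-CCC

Your coworker said node cluster3-node2 is running an older Kubernetes version and is not even part of the cluster. Update Kubernetes on that node to the exact version that is running on cluster3-controlplane1. Then add this node to the cluster. Use kubeadm for this.

# Answer

# Upgrade Kubernetes to the version of `cluster3-controlplane1`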

➜ k get node
NAME                     STATUS   ROLES           AGE   VERSION
cluster3-controlplane1   Ready    control-plane   22h   v1.26.0
cluster3-node1           Ready    <none>          22h   v1.26.0


The control plane node seems to be running Kubernetes 1.26.0, and cluster3-node2 is not yet part of the cluster.

➜ ssh cluster3-node2

➜ root@cluster3-node2:~# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:57:06Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

➜ root@cluster3-node2:~# kubectl version --short
Client Version: v1.25.5
Kustomize Version: v4.5.7

➜ root@cluster3-node2:~# kubelet --version
Kubernetes v1.25.5


Here kubeadm is already installed in the wanted version, so we do not need to install it. Hence we can run:

➜ root@cluster3-node2:~# kubeadm upgrade node
couldn't create a Kubernetes client from file "/etc/kubernetes/kubelet.conf": failed to load admin kubeconfig: open /etc/kubernetes/kubelet.conf: no such file or directory
To see the stack trace of this error execute with --v=5 or higher


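This is usually the proper command to upgrade a node. But this error means that the node was never even initialized, so there is nothing to update here. That will be done later using kubeadm join. For now we can continue with kubelet and kubectl: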

➜ root@cluster3-node2:~# apt update
...
Fetched 5,775 kB in 2s (2,313 kB/s)                               
Reading package lists... Done
Building dependency tree       
Reading state information... Done
90 packages can be upgraded. Run 'apt list --upgradable' to see them.

➜ root@cluster3-node2:~# apt show kubectl -a | grep 1.26
Version: 1.26.0-00

➜ root@cluster3-node2:~# apt install kubectl=1.26.0-00 kubelet=1.26.0-00
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages will be upgraded:
  kubectl kubelet
2 upgraded, 0 newly installed, 0 to remove and 135 not upgraded.
Need to get 30.5 MB of archives.
After this operation, 9,996 kB of additional disk space will be used.
Get:1 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubectl amd64 1.26.0-00 [10.1 MB]
Get:2 https://packages.cloud.google.com/apt kubernetes-xenial/main amd64 kubelet amd64 1.26.0-00 [20.5 MB]
Fetched 30.5 MB in 1s (29.7 MB/s)  
(Reading database ... 112508 files and directories currently installed.)
Preparing to unpack .../kubectl_1.26.0-00_amd64.deb ...
Unpacking kubectl (1.26.0-00) over (1.25.5-00) ...
Preparing to unpack .../kubelet_1.26.0-00_amd64.deb ...
Unpacking kubelet (1.26.0-00) over (1.25.5-00) ...
Setting up kubectl (1.26.0-00) ...
Setting up kubelet (1.26.0-00) ...

➜ root@cluster3-node2:~# kubelet --version
Kubernetes v1.26.0


Now kubeadm, kubectl and kubelet are all up to date. Restart the kubelet:

➜ root@cluster3-node2:~# service kubelet restart

➜ root@cluster3-node2:~# service kubelet status
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: activating (auto-restart) (Result: exit-code) since Wed 2022-12-21 16:29:26 UTC; 5s ago
       Docs: https://kubernetes.io/docs/home/
    Process: 32111 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
   Main PID: 32111 (code=exited, status=1/FAILURE)

Dec 21 16:29:26 cluster3-node2 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Dec 21 16:29:26 cluster3-node2 systemd[1]: kubelet.service: Failed with result 'exit-code'.


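These errors occur because we still need to run kubeadm join to join the node into the cluster. Let's do that in the next step.

# Add cluster3-node2 to the cluster

First we log into controlplane1 and generate a new TLS bootstrap token, also printing out the join command: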

➜ ssh cluster3-controlplane1

➜ root@cluster3-controlplane1:~# kubeadm token create --print-join-command
kubeadm join 192.168.100.31:6443 --token rbhrjh.4o93r31o18an6dll --discovery-token-ca-cert-hash sha256:d94524f9ab1eed84417414c7def5c1608f84dbf04437d9f5f73eb6255dafdb18

➜ root@cluster3-controlplane1:~# kubeadm token list
TOKEN                     TTL         EXPIRES                ...
44dz0t.2lgmone0i1o5z9fe   <forever>   <never>
4u477f.nmpq48xmpjt6weje   1h          2022-12-21T18:14:30Z
rbhrjh.4o93r31o18an6dll   23h         2022-12-22T16:29:58Z


We see that our token expires in 23 hours; we could adjust this by passing the --ttl argument.

Next we connect again to cluster3-node2 and simply execute the join command:

➜ ssh cluster3-node2

➜ root@cluster3-node2:~# kubeadm join 192.168.100.31:6443 --token rbhrjh.4o93r31o18an6dll --discovery-token-ca-cert-hash sha256:d94524f9ab1eed84417414c7def5c1608f84dbf04437d9f5f73eb6255dafdb18
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.


➜ root@cluster3-node2:~# service kubelet status
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Wed 2022-12-21 16:32:19 UTC; 1min 4s ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 32510 (kubelet)
      Tasks: 11 (limit: 462)
     Memory: 55.2M
     CGroup: /system.slice/kubelet.service
             └─32510 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runti>


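If you have trouble with kubeadm join, you might need to run kubeadm reset.

This looks good for us, though. Finally, we head back to the main terminal and check the node status: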

➜ k get node
NAME                     STATUS     ROLES           AGE   VERSION
cluster3-controlplane1   Ready      control-plane   22h   v1.26.0
cluster3-node1           Ready      <none>          22h   v1.26.0
cluster3-node2           NotReady   <none>          22h   v1.26.0


Give it a bit of time until the node is ready.

➜ k get node
NAME                     STATUS   ROLES           AGE   VERSION
cluster3-controlplane1   Ready    control-plane   22h   v1.26.0
cluster3-node1           Ready    <none>          22h   v1.26.0
cluster3-node2           Ready    <none>          22h   v1.26.0


We see that cluster3-node2 is now available and up to date.

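# Question 21 | Create a Static Pod and Service

Task weight: 2%

Use context: kubectl config use-context k8s-c3-CCC

Create a static Pod named my-static-pod in Namespace default on cluster3-controlplane1. It should be of image nginx:1.16-alpine and have resource requests for 10m CPU and 20Mi memory.

Then create a NodePort Service named static-pod-service which exposes that static Pod on port 80, and check whether it has Endpoints and whether it is reachable through the internal IP address of cluster3-controlplane1. You can connect to the internal node IPs from your main terminal.

# Answer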

➜ ssh cluster3-controlplane1

➜ root@cluster3-controlplane1:~# cd /etc/kubernetes/manifests/

➜ root@cluster3-controlplane1:~# kubectl run my-static-pod \
    --image=nginx:1.16-alpine \
    -o yaml --dry-run=client > my-static-pod.yaml

Then edit my-static-pod.yaml to add the requested resource requests:

# /etc/kubernetes/manifests/my-static-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: my-static-pod
  name: my-static-pod
spec:
  containers:
  - image: nginx:1.16-alpine
    name: my-static-pod
    resources:
      requests:
        cpu: 10m
        memory: 20Mi
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}


And make sure it is running:

➜ k get pod -A | grep my-static
NAMESPACE     NAME                                   READY   STATUS   ...   AGE
default       my-static-pod-cluster3-controlplane1   1/1     Running  ...   22s


Now we expose the static Pod:

k expose pod my-static-pod-cluster3-controlplane1 \
  --name static-pod-service \
  --type=NodePort \
  --port 80


This generates a Service like:

# kubectl expose pod my-static-pod-cluster3-controlplane1 --name static-pod-service --type=NodePort --port 80
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: null
  labels:
    run: my-static-pod
  name: static-pod-service
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    run: my-static-pod
  type: NodePort
status:
  loadBalancer: {}


Then run and test:

➜ k get svc,ep -l run=my-static-pod
NAME                         TYPE       CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
service/static-pod-service   NodePort   10.99.168.252   <none>        80:30352/TCP   30s

NAME                           ENDPOINTS      AGE
endpoints/static-pod-service   10.32.0.4:80   30s


Looks good.

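# Question 22 | Check How Long Certificates Are Valid

Task weight: 2%

Use context: kubectl config use-context k8s-c2-AC

Check how long the kube-apiserver server certificate is valid on cluster2-controlplane1. Do this with openssl or cfssl. Write the expiration date into /opt/course/22/expiration.

Also run the correct kubeadm command to list the expiration dates and confirm both methods show the same date.

Write the correct kubeadm command that would renew the apiserver server certificate into /opt/course/22/kubeadm-renew-certs.sh.

# Answer

First, let's find that certificate: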

➜ ssh cluster2-controlplane1

➜ root@cluster2-controlplane1:~# find /etc/kubernetes/pki | grep apiserver
/etc/kubernetes/pki/apiserver.crt
/etc/kubernetes/pki/apiserver-etcd-client.crt
/etc/kubernetes/pki/apiserver-etcd-client.key
/etc/kubernetes/pki/apiserver-kubelet-client.crt
/etc/kubernetes/pki/apiserver.key
/etc/kubernetes/pki/apiserver-kubelet-client.key


Next we use openssl to find out the expiration date:

➜ root@cluster2-controlplane1:~# openssl x509  -noout -text -in /etc/kubernetes/pki/apiserver.crt | grep Validity -A2
        Validity
            Not Before: Dec 20 18:05:20 2022 GMT
            Not After : Dec 20 18:05:20 2023 GMT


There we have it, so we write it into the required location on our main terminal:

# /opt/course/22/expiration
Dec 20 18:05:20 2023 GMT
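The date can also be extracted without copying it by hand. The sketch below filters the Validity block shown above with sed; not_after is a hypothetical helper name, and the input is the sample output from this walkthrough rather than a live certificate. As an alternative, `openssl x509 -noout -enddate -in <cert>` prints the same date in the form notAfter=... directly.

```shell
# Print only the expiration date from `openssl x509 -text` output on stdin.
not_after() {
  sed -n 's/^ *Not After *: *//p'
}

# Sample input: the Validity block printed above.
cat <<'EOF' | not_after
        Validity
            Not Before: Dec 20 18:05:20 2022 GMT
            Not After : Dec 20 18:05:20 2023 GMT
EOF
```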


We also use the kubeadm feature to get the expiration:

➜ root@cluster2-controlplane1:~# kubeadm certs check-expiration | grep apiserver
apiserver                Jan 14, 2022 18:49 UTC   363d        ca               no      
apiserver-etcd-client    Jan 14, 2022 18:49 UTC   363d        etcd-ca          no      
apiserver-kubelet-client Jan 14, 2022 18:49 UTC   363d        ca               no


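Looks good. In a live cluster both methods report the same expiration date; the two sample outputs above show different dates, presumably because they were captured at different times. Finally, we write the command that would renew the apiserver certificate into the requested location: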

# /opt/course/22/kubeadm-renew-certs.sh
kubeadm certs renew apiserver


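# Question 23 | Kubelet Client/Server Certificate Info

Task weight: 2%

Use context: kubectl config use-context k8s-c2-AC

Node cluster2-node1 has been added to the cluster using kubeadm and TLS bootstrapping.

Find the "Issuer" and "Extended Key Usage" values of the cluster2-node1:

1. kubelet client certificate, the one used for outgoing connections to the kube-apiserver.
2. kubelet server certificate, the one used for incoming connections from the kube-apiserver.

Write the information into file /opt/course/23/certificate-info.txt.

Compare the "Issuer" and "Extended Key Usage" fields of both certificates and make sense of them.

# Answer

To find the correct kubelet certificate directory, we can look up the default value of the kubelet's --cert-dir parameter. For this, search for "kubelet" in the Kubernetes docs: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet. We can check whether another certificate directory has been configured using ps aux or in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf.

First we check the kubelet client certificate: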

➜ ssh cluster2-node1

➜ root@cluster2-node1:~# openssl x509  -noout -text -in /var/lib/kubelet/pki/kubelet-client-current.pem | grep Issuer
        Issuer: CN = kubernetes
        
➜ root@cluster2-node1:~# openssl x509  -noout -text -in /var/lib/kubelet/pki/kubelet-client-current.pem | grep "Extended Key Usage" -A1
            X509v3 Extended Key Usage: 
                TLS Web Client Authentication


Next we check the kubelet server certificate:

➜ root@cluster2-node1:~# openssl x509  -noout -text -in /var/lib/kubelet/pki/kubelet.crt | grep Issuer
          Issuer: CN = cluster2-node1-ca@1588186506

➜ root@cluster2-node1:~# openssl x509  -noout -text -in /var/lib/kubelet/pki/kubelet.crt | grep "Extended Key Usage" -A1
            X509v3 Extended Key Usage: 
                TLS Web Server Authentication


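We see that the server certificate was generated on the worker node itself, while the client certificate was issued by the Kubernetes API. The "Extended Key Usage" also shows whether a certificate is meant for client or server authentication.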

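# Question 24 | NetworkPolicy

Task weight: 9%

Use context: kubectl config use-context k8s-c1-H

There was a security incident where an intruder was able to access the whole cluster from a single hacked backend Pod.

To prevent this, create a NetworkPolicy called np-backend in Namespace project-snake. It should allow the backend-* Pods only to:

- connect to db1-* Pods on port 1111
- connect to db2-* Pods on port 2222

Use the app label of Pods in your policy.

After implementation, connections from backend-* Pods to vault-* Pods on port 3333 should, for example, no longer work.

# Answer

# Correct implementation

First we look at the existing Pods and their labels: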

➜ k -n project-snake get pod
NAME        READY   STATUS    RESTARTS   AGE
backend-0   1/1     Running   0          8s
db1-0       1/1     Running   0          8s
db2-0       1/1     Running   0          10s
vault-0     1/1     Running   0          10s

➜ k -n project-snake get pod -L app
NAME        READY   STATUS    RESTARTS   AGE     APP
backend-0   1/1     Running   0          3m15s   backend
db1-0       1/1     Running   0          3m15s   db1
db2-0       1/1     Running   0          3m17s   db2
vault-0     1/1     Running   0          3m17s   vault


We test the current connection situation and see that nothing is restricted:

➜ k -n project-snake get pod -o wide
NAME        READY   STATUS    RESTARTS   AGE     IP          ...
backend-0   1/1     Running   0          4m14s   10.44.0.24  ...
db1-0       1/1     Running   0          4m14s   10.44.0.25  ...
db2-0       1/1     Running   0          4m16s   10.44.0.23  ...
vault-0     1/1     Running   0          4m16s   10.44.0.22  ...

➜ k -n project-snake exec backend-0 -- curl -s 10.44.0.25:1111
database one

➜ k -n project-snake exec backend-0 -- curl -s 10.44.0.23:2222
database two

➜ k -n project-snake exec backend-0 -- curl -s 10.44.0.22:3333
vault secret storage


Now we create the NP by copying and changing an example from the k8s docs:

vim 24_np.yaml

# 24_np.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: np-backend
  namespace: project-snake
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Egress                    # policy is only about Egress
  egress:
    -                           # first rule
      to:                           # first condition "to"
      - podSelector:
          matchLabels:
            app: db1
      ports:                        # second condition "port"
      - protocol: TCP
        port: 1111
    -                           # second rule
      to:                           # first condition "to"
      - podSelector:
          matchLabels:
            app: db2
      ports:                        # second condition "port"
      - protocol: TCP
        port: 2222


# Wrong example

Now let's look at a wrong example:

# WRONG
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: np-backend
  namespace: project-snake
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Egress
  egress:
    -                           # first rule
      to:                           # first condition "to"
      - podSelector:                    # first "to" possibility
          matchLabels:
            app: db1
      - podSelector:                    # second "to" possibility
          matchLabels:
            app: db2
      ports:                        # second condition "ports"
      - protocol: TCP                   # first "ports" possibility
        port: 1111
      - protocol: TCP                   # second "ports" possibility
        port: 2222


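This NP has a single rule, which reads as: allow outgoing traffic if (the destination Pod has label app=db1 OR app=db2) AND (the destination port is 1111 OR 2222). If this wrong NP were used, backend-* Pods could still connect to db2-* Pods on port 1111, for example, which should be forbidden.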
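The difference between the two policies comes down to how the entries of one egress rule combine: every listed destination is paired with every listed port. The sketch below enumerates the allowed pairs for both variants. It is a simplified model for illustration only; rule_allows is a hypothetical helper and does not talk to Kubernetes.

```shell
# Print every destination:port pair that a single egress rule allows.
# $1 = space-separated "to" selectors, $2 = space-separated ports.
rule_allows() {
  for dest in $1; do
    for port in $2; do
      printf '%s:%s\n' "$dest" "$port"
    done
  done
}

echo "correct policy (two rules):"
rule_allows "db1" "1111"
rule_allows "db2" "2222"

echo "wrong policy (one rule):"
rule_allows "db1 db2" "1111 2222"
```

The wrong variant additionally prints db1:2222 and db2:1111, which are exactly the connections the task wants to forbid.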

# Create the NetworkPolicy

We create the correct NP:

k -f 24_np.yaml create

And test again:

➜ k -n project-snake exec backend-0 -- curl -s 10.44.0.25:1111
database one

➜ k -n project-snake exec backend-0 -- curl -s 10.44.0.23:2222
database two

➜ k -n project-snake exec backend-0 -- curl -s 10.44.0.22:3333
^C


It is also helpful to use kubectl describe on the NP to see how k8s has interpreted the policy.

Great, this looks more secure. Task done.

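# Question 25 | Etcd Snapshot Save and Restore

Task weight: 8%

Use context: kubectl config use-context k8s-c3-CCC

Make a backup of etcd running on cluster3-controlplane1 and save it on the control plane node at /tmp/etcd-backup.db.

Then create a Pod of your choice in the cluster.

Finally restore the backup, confirm the cluster is still working and that the created Pod is no longer there.

# Answer

# Etcd backup

First we log into the control plane and try to create a snapshot of etcd: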

➜ ssh cluster3-controlplane1

➜ root@cluster3-controlplane1:~# ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db
Error:  rpc error: code = Unavailable desc = transport is closing


But it fails because we need to authenticate ourselves. For the necessary information we can check the etcd manifest:

➜ root@cluster3-controlplane1:~# vim /etc/kubernetes/manifests/etcd.yaml

We only check etcd.yaml for the necessary information; we do not change it.

# /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.100.31:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt                           # use
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://192.168.100.31:2380
    - --initial-cluster=cluster3-controlplane1=https://192.168.100.31:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key                            # use
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.100.31:2379   # use
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.100.31:2380
    - --name=cluster3-controlplane1
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt                         # use
    image: k8s.gcr.io/etcd:3.3.15-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 15
      timeoutSeconds: 15
    name: etcd
    resources: {}
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd                                                     # important
      type: DirectoryOrCreate
    name: etcd-data
status: {}


But we also know that the API server connects to etcd, so we can check how its manifest is configured:

➜ root@cluster3-controlplane1:~# cat /etc/kubernetes/manifests/kube-apiserver.yaml | grep etcd
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://127.0.0.1:2379


We use the authentication information and pass it to etcdctl:

➜ root@cluster3-controlplane1:~# ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key

Snapshot saved at /tmp/etcd-backup.db
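The mapping from etcd manifest flags to etcdctl options can be sketched as a small script. It only parses a few sample lines copied from the manifest above and prints the resulting command instead of running it; flag_value and the variable names are hypothetical.

```shell
# Sample lines copied from /etc/kubernetes/manifests/etcd.yaml above.
etcd_manifest='    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt'

# Return the value of one etcd flag ($2) from manifest text ($1).
# The pattern is anchored on "- --", so cert-file does not match peer-cert-file.
flag_value() {
  printf '%s\n' "$1" | sed -n "s/^ *- --$2=//p" | head -n 1
}

cacert=$(flag_value "$etcd_manifest" trusted-ca-file)
cert=$(flag_value "$etcd_manifest" cert-file)
key=$(flag_value "$etcd_manifest" key-file)

# Print the snapshot command that corresponds to these values.
echo "ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db --cacert $cacert --cert $cert --key $key"
```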


Note

Do not use snapshot status, because it can alter the snapshot file and render it invalid.

# Etcd restore

Now create a Pod in the cluster and wait for it to be running:

➜ root@cluster3-controlplane1:~# kubectl run test --image=nginx
pod/test created

➜ root@cluster3-controlplane1:~# kubectl get pod -l run=test -w
NAME   READY   STATUS    RESTARTS   AGE
test   1/1     Running   0          60s


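Note

If you did not solve Question 18 or 20 and cluster3 does not have a ready worker node, the created Pod might stay in Pending state. This is still fine for this task.

Next we stop all control plane components: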

root@cluster3-controlplane1:~# cd /etc/kubernetes/manifests/

root@cluster3-controlplane1:/etc/kubernetes/manifests# mv * ..

root@cluster3-controlplane1:/etc/kubernetes/manifests# watch crictl ps


Now we restore the snapshot into a specific directory:

➜ root@cluster3-controlplane1:~# ETCDCTL_API=3 etcdctl snapshot restore /tmp/etcd-backup.db \
--data-dir /var/lib/etcd-backup \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key

2020-09-04 16:50:19.650804 I | mvcc: restore compact to 9935
2020-09-04 16:50:19.659095 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32


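We could specify another host to back up from with etcdctl --endpoints http://IP, but here we just use the default value, which is http://127.0.0.1:2379,http://127.0.0.1:4001.

The restored files are located in the new folder /var/lib/etcd-backup; now we have to tell etcd to use that directory: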

➜ root@cluster3-controlplane1:~# vim /etc/kubernetes/etcd.yaml

# /etc/kubernetes/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
...
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-cluster-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd-backup                # change
      type: DirectoryOrCreate
    name: etcd-data
status: {}

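Now we move all control plane yaml files back into the manifests directory. Give it some time (up to a few minutes) for etcd to restart and for the api-server to become reachable again: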

root@cluster3-controlplane1:/etc/kubernetes/manifests# mv ../*.yaml .

root@cluster3-controlplane1:/etc/kubernetes/manifests# watch crictl ps

Then we check the Pod again:

➜ root@cluster3-controlplane1:~# kubectl get pod -l run=test
No resources found in default namespace.

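Awesome, backup and restore worked, since our Pod is gone.

# Extra questions

# Extra question 1 | Find Pods first to be terminated

Switch cluster: kubectl config use-context k8s-c1-H

Check all available Pods in the Namespace project-c13 and find the names of those that would probably be terminated first if the nodes ran out of resources (cpu or memory) to schedule all Pods. Write the Pod names into /opt/course/e1/pods-not-stable.txt.

# Answer

When the available cpu or memory resources on a node reach their limit, Kubernetes looks for Pods that use more resources than they requested. These are the first candidates for termination. If some of a Pod's containers have no resource requests/limits set, those containers are by default considered to use more than requested.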

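Kubernetes assigns Quality of Service classes to Pods based on the defined resource requests and limits. Read more here: https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod

Hence we should look for Pods without resource requests defined. We can do this with a manual approach: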

k -n project-c13 describe pod | less -p Requests # describe all pods and highlight Requests

Or we do:

k -n project-c13 describe pod | egrep "^(Name:|    Requests:)" -A1

We see that the Pods of the Deployment c13-3cc-runner-heavy, as well as the o3db Pods, don't specify any resource requests. Hence our answer would be:

# /opt/course/e1/pods-not-stable.txt
c13-3cc-runner-heavy-65588d7d6-djtv9
c13-3cc-runner-heavy-65588d7d6-v8kf5
c13-3cc-runner-heavy-65588d7d6-wwpb4
o3db-0
o3db-1 # may not exist if it was already removed in a previous question

To automate this process, you could use jsonpath like this:

➜ k -n project-c13 get pod \
  -o jsonpath="{range .items[*]} {.metadata.name}{.spec.containers[*].resources}{'\n'}"

 c13-2x3-api-86784557bd-cgs8gmap[requests:map[cpu:50m memory:20Mi]]
 c13-2x3-api-86784557bd-lnxvjmap[requests:map[cpu:50m memory:20Mi]]
 c13-2x3-api-86784557bd-mnp77map[requests:map[cpu:50m memory:20Mi]]
 c13-2x3-web-769c989898-6hbgtmap[requests:map[cpu:50m memory:10Mi]]
 c13-2x3-web-769c989898-g57nqmap[requests:map[cpu:50m memory:10Mi]]
 c13-2x3-web-769c989898-hfd5vmap[requests:map[cpu:50m memory:10Mi]]
 c13-2x3-web-769c989898-jfx64map[requests:map[cpu:50m memory:10Mi]]
 c13-2x3-web-769c989898-r89mgmap[requests:map[cpu:50m memory:10Mi]]
 c13-2x3-web-769c989898-wtgxlmap[requests:map[cpu:50m memory:10Mi]]
 c13-3cc-runner-98c8b5469-dzqhrmap[requests:map[cpu:30m memory:10Mi]]
 c13-3cc-runner-98c8b5469-hbtdvmap[requests:map[cpu:30m memory:10Mi]]
 c13-3cc-runner-98c8b5469-n9lswmap[requests:map[cpu:30m memory:10Mi]]
 c13-3cc-runner-heavy-65588d7d6-djtv9map[]
 c13-3cc-runner-heavy-65588d7d6-v8kf5map[]
 c13-3cc-runner-heavy-65588d7d6-wwpb4map[]
 c13-3cc-web-675456bcd-glpq6map[requests:map[cpu:50m memory:10Mi]]
 c13-3cc-web-675456bcd-knlpxmap[requests:map[cpu:50m memory:10Mi]]
 c13-3cc-web-675456bcd-nfhp9map[requests:map[cpu:50m memory:10Mi]]
 c13-3cc-web-675456bcd-twn7mmap[requests:map[cpu:50m memory:10Mi]]
 o3db-0{}
 o3db-1{}

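This lists all Pod names together with their requests/limits, so we see the Pods that define none: the three c13-3cc-runner-heavy Pods and the two o3db Pods.

Or we look at the Quality of Service classes: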

➜ k get pods -n project-c13 \
  -o jsonpath="{range .items[*]}{.metadata.name} {.status.qosClass}{'\n'}"

c13-2x3-api-86784557bd-cgs8g Burstable
c13-2x3-api-86784557bd-lnxvj Burstable
c13-2x3-api-86784557bd-mnp77 Burstable
c13-2x3-web-769c989898-6hbgt Burstable
c13-2x3-web-769c989898-g57nq Burstable
c13-2x3-web-769c989898-hfd5v Burstable
c13-2x3-web-769c989898-jfx64 Burstable
c13-2x3-web-769c989898-r89mg Burstable
c13-2x3-web-769c989898-wtgxl Burstable
c13-3cc-runner-98c8b5469-dzqhr Burstable
c13-3cc-runner-98c8b5469-hbtdv Burstable
c13-3cc-runner-98c8b5469-n9lsw Burstable
c13-3cc-runner-heavy-65588d7d6-djtv9 BestEffort
c13-3cc-runner-heavy-65588d7d6-v8kf5 BestEffort
c13-3cc-runner-heavy-65588d7d6-wwpb4 BestEffort
c13-3cc-web-675456bcd-glpq6 Burstable
c13-3cc-web-675456bcd-knlpx Burstable
c13-3cc-web-675456bcd-nfhp9 Burstable
c13-3cc-web-675456bcd-twn7m Burstable
o3db-0 BestEffort
o3db-1 BestEffort

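Here we see five BestEffort Pods (the three c13-3cc-runner-heavy Pods and the two o3db Pods). These Pods have no memory or cpu limits/requests defined.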
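
The list above can be narrowed down to just the BestEffort names with a small filter. This is a sketch under the assumption that the input has the "name qosClass" shape produced by the jsonpath query; best_effort_pods is a hypothetical helper name.

```shell
# best_effort_pods is a hypothetical helper. It expects "name qosClass"
# lines on stdin and prints the names whose class is BestEffort.
best_effort_pods() {
  awk '$2 == "BestEffort" { print $1 }'
}

# Usage, feeding it the jsonpath query from the text:
#   k get pods -n project-c13 \
#     -o jsonpath="{range .items[*]}{.metadata.name} {.status.qosClass}{'\n'}" \
#     | best_effort_pods > /opt/course/e1/pods-not-stable.txt
```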

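A good practice is to always set resource requests and limits. If you don't know which values your containers should have, you can find out using metric tools like Prometheus. You can also use kubectl top pod, or even kubectl exec into the container and use top or similar tools.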
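
To make the relation between requests, limits and the resulting class concrete, here is a simplified sketch of the QoS rules for a Pod with a single container. qos_class is a hypothetical helper name; it compares the values as plain strings, so it assumes requests and limits are written in the same notation.

```shell
# qos_class is a hypothetical helper that mirrors the QoS rules for a Pod
# with a single container. Pass "" for a value that is not set.
# Values are compared as strings, so 100m and 0.1 would count as different.
qos_class() {
  cpu_req=$1 mem_req=$2 cpu_lim=$3 mem_lim=$4
  # No requests and no limits at all: BestEffort
  if [ -z "$cpu_req$mem_req$cpu_lim$mem_lim" ]; then
    echo BestEffort
    return
  fi
  # Kubernetes defaults a missing request to the limit of the same resource
  [ -n "$cpu_req" ] || cpu_req=$cpu_lim
  [ -n "$mem_req" ] || mem_req=$mem_lim
  # Both limits set and equal to the requests: Guaranteed
  if [ -n "$cpu_lim" ] && [ -n "$mem_lim" ] &&
     [ "$cpu_req" = "$cpu_lim" ] && [ "$mem_req" = "$mem_lim" ]; then
    echo Guaranteed
  else
    echo Burstable
  fi
}

# Usage: qos_class <cpu request> <memory request> <cpu limit> <memory limit>
#   qos_class 50m 10Mi "" ""
```

Under node pressure, BestEffort Pods are the first candidates for termination and Guaranteed Pods the last.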

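# Extra question 2 | Curl manually contact API

Switch cluster: kubectl config use-context k8s-c1-H

There is an existing ServiceAccount secret-reader in the Namespace project-hamster. Create a Pod named tmp-api-contact with the image curlimages/curl:7.65.3 that uses this ServiceAccount. Make sure the container keeps running.

Exec into the Pod and use curl to manually query the Kubernetes Api of that cluster for all available Secrets. You can ignore insecure https connections. Write the command for this into the file /opt/course/e4/list-secrets.sh.

# Answer

https://kubernetes.io/docs/tasks/run-application/access-api-from-pod

It is important to understand how the Kubernetes API works. For this it helps to connect to the API manually, for example using curl. You can find this information quickly by searching the Kubernetes docs for "curl api".

First, we create the Pod: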

k run tmp-api-contact \
  --image=curlimages/curl:7.65.3 $do \
  --command > e2.yaml -- sh -c 'sleep 1d'

vim e2.yaml

Add the ServiceAccount name and the Namespace:

# e2.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: tmp-api-contact
  name: tmp-api-contact
  namespace: project-hamster          # add
spec:
  serviceAccountName: secret-reader   # add
  containers:
  - command:
    - sh
    - -c
    - sleep 1d
    image: curlimages/curl:7.65.3
    name: tmp-api-contact
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

Then create it and exec into it:

k -f e2.yaml create

k -n project-hamster exec tmp-api-contact -it -- sh

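Once inside the container, we can try to connect to the api using curl. The api is usually available via the Service named kubernetes in the Namespace default (you should know how dns resolution works across Namespaces). Alternatively, we can find the endpoint IP in the environment variables by running env.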
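
The environment variable route can be sketched as follows. KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT are the variables Kubernetes injects into containers for the kubernetes Service; api_base_url is a hypothetical helper name.

```shell
# api_base_url is a hypothetical helper. It builds the API address from the
# KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT environment variables.
api_base_url() {
  echo "https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}"
}

# Usage inside the Pod:
#   curl -k "$(api_base_url)/api/v1/secrets"
```

This avoids relying on dns, which can help when name resolution inside the Pod is the thing being debugged.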

So now we can do:

curl https://kubernetes.default
curl -k https://kubernetes.default # ignore insecure, as allowed in the task description
curl -k https://kubernetes.default/api/v1/secrets # should show Forbidden 403

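The last command shows 403 Forbidden, because we didn't pass any authorization information. The Kubernetes Api Server thinks we are connecting as system:anonymous. We want to change this and connect using the Pod's ServiceAccount named secret-reader.

We find the token in the mounted folder at /var/run/secrets/kubernetes.io/serviceaccount, so we do: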

➜ TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
➜ curl -k https://kubernetes.default/api/v1/secrets -H "Authorization: Bearer ${TOKEN}"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0{
  "kind": "SecretList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/secrets",
    "resourceVersion": "10697"
  },
  "items": [
    {
      "metadata": {
        "name": "default-token-5zjbd",
        "namespace": "default",
        "selfLink": "/api/v1/namespaces/default/secrets/default-token-5zjbd",
        "uid": "315dbfd9-d235-482b-8bfc-c6167e7c1461",
        "resourceVersion": "342",
...

Now we're able to list all Secrets, since our Pod runs under the ServiceAccount secret-reader.

To use an encrypted https connection that verifies the server certificate, we can run:

CACERT=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
curl --cacert ${CACERT} https://kubernetes.default/api/v1/secrets -H "Authorization: Bearer ${TOKEN}"

For troubleshooting, we could also check whether the ServiceAccount is actually allowed to read Secrets using:

➜ k auth can-i get secret --as system:serviceaccount:project-hamster:secret-reader
yes

Finally, write the command into the requested location:

# /opt/course/e4/list-secrets.sh
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
curl -k https://kubernetes.default/api/v1/secrets -H "Authorization: Bearer ${TOKEN}"

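# Preview questions

# Preview question 1

Switch cluster: kubectl config use-context k8s-c2-AC

The cluster admin asked you to find out the following information about etcd running on cluster2-controlplane1:

Server private key location
Server certificate expiration date
Is client certificate authentication enabled

Write this information into /opt/course/p1/etcd-info.txt

Finally, you're asked to save an etcd snapshot at /etc/etcd-snapshot.db on cluster2-controlplane1 and display its status.

# Answer

# Find out etcd information

Let's check the nodes: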

➜ k get node
NAME                     STATUS   ROLES           AGE    VERSION
cluster2-controlplane1   Ready    control-plane   89m   v1.23.1
cluster2-node1           Ready    <none>          87m   v1.23.1

➜ ssh cluster2-controlplane1

First, we check how etcd is set up in this cluster:

➜ root@cluster2-controlplane1:~# kubectl -n kube-system get pod
NAME                                                READY   STATUS    RESTARTS   AGE
coredns-66bff467f8-k8f48                            1/1     Running   0          26h
coredns-66bff467f8-rn8tr                            1/1     Running   0          26h
etcd-cluster2-controlplane1                         1/1     Running   0          26h
kube-apiserver-cluster2-controlplane1               1/1     Running   0          26h
kube-controller-manager-cluster2-controlplane1      1/1     Running   0          26h
kube-proxy-qthfg                                    1/1     Running   0          25h
kube-proxy-z55lp                                    1/1     Running   0          26h
kube-scheduler-cluster2-controlplane1               1/1     Running   1          26h
weave-net-cqdvt                                     2/2     Running   0          26h
weave-net-dxzgh                                     2/2     Running   1          25h

We see it runs as a Pod, more specifically as a static Pod. So we check the default kubelet directory for static manifests:

➜ root@cluster2-controlplane1:~# find /etc/kubernetes/manifests/
/etc/kubernetes/manifests/
/etc/kubernetes/manifests/kube-controller-manager.yaml
/etc/kubernetes/manifests/kube-apiserver.yaml
/etc/kubernetes/manifests/etcd.yaml
/etc/kubernetes/manifests/kube-scheduler.yaml

➜ root@cluster2-controlplane1:~# vim /etc/kubernetes/manifests/etcd.yaml

So we look at the yaml and the parameters with which etcd is started:

# /etc/kubernetes/manifests/etcd.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://192.168.102.11:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt              # server certificate
    - --client-cert-auth=true                                      # enabled
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://192.168.102.11:2380
    - --initial-cluster=cluster2-controlplane1=https://192.168.102.11:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key               # server private key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.102.11:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.102.11:2380
    - --name=cluster2-controlplane1
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
...

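We see that client authentication is enabled, and we also have the requested path to the server private key. Now let's find out the expiration date of the server certificate: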

➜ root@cluster2-controlplane1:~# openssl x509  -noout -text -in /etc/kubernetes/pki/etcd/server.crt | grep Validity -A2
        Validity
            Not Before: Sep 13 13:01:31 2021 GMT
            Not After : Sep 13 13:01:31 2022 GMT

There we have it. Let's write the information into the requested file:

# /opt/course/p1/etcd-info.txt
Server private key location: /etc/kubernetes/pki/etcd/server.key
Server certificate expiration date: Sep 13 13:01:31 2022 GMT
Is client certificate authentication enabled: yes
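
The expiration date can also be extracted without reading through the full certificate text. A minimal sketch: openssl x509 -noout -enddate prints a single notAfter=... line, and not_after is a hypothetical helper name that strips the prefix.

```shell
# not_after is a hypothetical helper. It reads the output of
# "openssl x509 -noout -enddate" on stdin and prints only the date.
not_after() {
  sed -n 's/^notAfter=//p'
}

# Usage on the control plane node:
#   openssl x509 -noout -enddate -in /etc/kubernetes/pki/etcd/server.crt | not_after
```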

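# Create etcd snapshot

First we try:

ETCDCTL_API=3 etcdctl snapshot save /etc/etcd-snapshot.db

We get the endpoint from the yaml as well. But we need to specify more parameters, all of which we can find in the yaml declaration above: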

ETCDCTL_API=3 etcdctl snapshot save /etc/etcd-snapshot.db \
--cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt \
--key /etc/kubernetes/pki/etcd/server.key

This worked. Now we can output the status of the backup file:

➜ root@cluster2-controlplane1:~# ETCDCTL_API=3 etcdctl snapshot status /etc/etcd-snapshot.db
4d4e953, 7213, 1291, 2.7 MB

The status shows:

Hash: 4d4e953
Revision: 7213
Total Keys: 1291
Total Size: 2.7 MB
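
The mapping from the comma-separated status line to these four fields can be sketched with awk. label_snapshot_status is a hypothetical helper name, and the sketch assumes the one-line output format shown above.

```shell
# label_snapshot_status is a hypothetical helper. It reads the one-line
# "hash, revision, total keys, total size" output on stdin and labels it.
label_snapshot_status() {
  awk -F', ' '{
    printf "Hash: %s\nRevision: %s\nTotal Keys: %s\nTotal Size: %s\n", $1, $2, $3, $4
  }'
}

# Usage:
#   ETCDCTL_API=3 etcdctl snapshot status /etc/etcd-snapshot.db | label_snapshot_status
```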

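# Preview question 2

Switch cluster: kubectl config use-context k8s-c1-H

You're asked to confirm that kube-proxy is running correctly on all nodes. For this, perform the following in the Namespace project-hamster:

Create a new Pod named p2-pod with two containers, one of image nginx:1.21.3-alpine and one of image busybox:1.31. Make sure the busybox container keeps running for some time.

Create a new Service named p2-service which exposes that Pod internally in the cluster on port 3000->80.

Find the kube-proxy container on all nodes cluster1-controlplane1, cluster1-node1 and cluster1-node2 and make sure that it's using iptables. Use the command crictl for this.

Write the iptables rules of all nodes belonging to the created Service p2-service into the file /opt/course/p2/iptables.txt.

Finally delete the Service and confirm that the iptables rules are gone from all nodes.

# Answer

# Create the Pod

First, we create the Pod: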

# check out export statement on top which allows us to use $do
k run p2-pod --image=nginx:1.21.3-alpine $do > p2.yaml

vim p2.yaml

# p2.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: p2-pod
  name: p2-pod
  namespace: project-hamster             # add
spec:
  containers:
  - image: nginx:1.21.3-alpine
    name: p2-pod
  - image: busybox:1.31                  # add
    name: c2                             # add
    command: ["sh", "-c", "sleep 1d"]    # add
    resources: {}
  dnsPolicy: ClusterFirst
  restartPolicy: Always
status: {}

k -f p2.yaml create

# Create the Service

Next we create the Service:

k -n project-hamster expose pod p2-pod --name p2-service --port 3000 --target-port 80

This will create a yaml like:

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2020-04-30T20:58:14Z"
  labels:
    run: p2-pod
  managedFields:
...
    operation: Update
    time: "2020-04-30T20:58:14Z"
  name: p2-service
  namespace: project-hamster
  resourceVersion: "11071"
  selfLink: /api/v1/namespaces/project-hamster/services/p2-service
  uid: 2a1c0842-7fb6-4e94-8cdb-1602a3b1e7d2
spec:
  clusterIP: 10.97.45.18
  ports:
  - port: 3000
    protocol: TCP
    targetPort: 80
  selector:
    run: p2-pod
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

We should confirm that the Pod and the Service are connected, so the Service should have Endpoints:

k -n project-hamster get pod,svc,ep

# Confirm kube-proxy is running and is using iptables

First, we get the nodes in the cluster:

➜ k get node
NAME                     STATUS   ROLES           AGE   VERSION
cluster1-controlplane1   Ready    control-plane   98m   v1.23.1
cluster1-node1           Ready    <none>          96m   v1.23.1
cluster1-node2           Ready    <none>          95m   v1.23.1

The idea here is to log in to every node, find the kube-proxy container and check its logs:

➜ ssh cluster1-controlplane1

➜ root@cluster1-controlplane1:~# crictl ps | grep kube-proxy
27b6a18c0f89c       36c4ebbc9d979       3 hours ago         Running             kube-proxy

➜ root@cluster1-controlplane1:~# crictl logs 27b6a18c0f89c
...
I0913 12:53:03.096620       1 server_others.go:212] Using iptables Proxier.
...

This should be repeated on every node and result in the same output, Using iptables Proxier.
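
The per-node check can be reduced to a yes/no test on the log text. A minimal sketch: uses_iptables_proxier is a hypothetical helper name, and it assumes the log wording shown above, which may differ between kube-proxy versions.

```shell
# uses_iptables_proxier is a hypothetical helper. It reads kube-proxy logs
# on stdin and succeeds if the "Using iptables Proxier" line is present.
uses_iptables_proxier() {
  grep -q 'Using iptables Proxier'
}

# Usage on a node, with the container ID found via "crictl ps":
#   crictl logs <container-id> 2>&1 | uses_iptables_proxier && echo iptables
```

The 2>&1 in the usage line is there because container logs written to stderr would otherwise bypass the pipe.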

# Check kube-proxy is creating iptables rules

Now we check the iptables rules on every node, first manually:

➜ ssh cluster1-controlplane1 iptables-save | grep p2-service
-A KUBE-SEP-6U447UXLLQIKP7BB -s 10.44.0.20/32 -m comment --comment "project-hamster/p2-service:" -j KUBE-MARK-MASQ
-A KUBE-SEP-6U447UXLLQIKP7BB -p tcp -m comment --comment "project-hamster/p2-service:" -m tcp -j DNAT --to-destination 10.44.0.20:80
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.97.45.18/32 -p tcp -m comment --comment "project-hamster/p2-service: cluster IP" -m tcp --dport 3000 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.97.45.18/32 -p tcp -m comment --comment "project-hamster/p2-service: cluster IP" -m tcp --dport 3000 -j KUBE-SVC-2A6FNMCK6FDH7PJH
-A KUBE-SVC-2A6FNMCK6FDH7PJH -m comment --comment "project-hamster/p2-service:" -j KUBE-SEP-6U447UXLLQIKP7BB

➜ ssh cluster1-node1 iptables-save | grep p2-service
-A KUBE-SEP-6U447UXLLQIKP7BB -s 10.44.0.20/32 -m comment --comment "project-hamster/p2-service:" -j KUBE-MARK-MASQ
-A KUBE-SEP-6U447UXLLQIKP7BB -p tcp -m comment --comment "project-hamster/p2-service:" -m tcp -j DNAT --to-destination 10.44.0.20:80
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.97.45.18/32 -p tcp -m comment --comment "project-hamster/p2-service: cluster IP" -m tcp --dport 3000 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.97.45.18/32 -p tcp -m comment --comment "project-hamster/p2-service: cluster IP" -m tcp --dport 3000 -j KUBE-SVC-2A6FNMCK6FDH7PJH
-A KUBE-SVC-2A6FNMCK6FDH7PJH -m comment --comment "project-hamster/p2-service:" -j KUBE-SEP-6U447UXLLQIKP7BB

➜ ssh cluster1-node2 iptables-save | grep p2-service
-A KUBE-SEP-6U447UXLLQIKP7BB -s 10.44.0.20/32 -m comment --comment "project-hamster/p2-service:" -j KUBE-MARK-MASQ
-A KUBE-SEP-6U447UXLLQIKP7BB -p tcp -m comment --comment "project-hamster/p2-service:" -m tcp -j DNAT --to-destination 10.44.0.20:80
-A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.97.45.18/32 -p tcp -m comment --comment "project-hamster/p2-service: cluster IP" -m tcp --dport 3000 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.97.45.18/32 -p tcp -m comment --comment "project-hamster/p2-service: cluster IP" -m tcp --dport 3000 -j KUBE-SVC-2A6FNMCK6FDH7PJH
-A KUBE-SVC-2A6FNMCK6FDH7PJH -m comment --comment "project-hamster/p2-service:" -j KUBE-SEP-6U447UXLLQIKP7BB

Great. Now let's write these rules into the requested file:

➜ ssh cluster1-controlplane1 iptables-save | grep p2-service >> /opt/course/p2/iptables.txt
➜ ssh cluster1-node1 iptables-save | grep p2-service >> /opt/course/p2/iptables.txt
➜ ssh cluster1-node2 iptables-save | grep p2-service >> /opt/course/p2/iptables.txt

# Delete the Service and confirm iptables rules are gone

Delete the Service:

k -n project-hamster delete svc p2-service

And confirm that the iptables rules are gone:

➜ ssh cluster1-controlplane1 iptables-save | grep p2-service
➜ ssh cluster1-node1 iptables-save | grep p2-service
➜ ssh cluster1-node2 iptables-save | grep p2-service

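Done.

Kubernetes Services are implemented using iptables rules (with the default config) on all nodes. Every time a Service is created, altered or deleted, or the Endpoints of a Service change, the kube-proxy on every node, which watches the kube-apiserver, updates the iptables rules according to the current state.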
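
To see where a Service's rules finally send traffic, the DNAT destination can be pulled out of the iptables-save output. This is a sketch that assumes the rule format shown in this answer; dnat_targets is a hypothetical helper name.

```shell
# dnat_targets is a hypothetical helper. It reads iptables-save output on
# stdin and prints the DNAT destinations of rules mentioning the Service.
dnat_targets() {
  # $1: Service reference as used in rule comments, e.g. project-hamster/p2-service
  grep -- "$1" | sed -n 's/.*--to-destination \([0-9.:]*\).*/\1/p'
}

# Usage:
#   ssh cluster1-node1 iptables-save | dnat_targets project-hamster/p2-service
```

For p2-service the printed destination is the Pod IP with the target port, which is how port 3000 on the Service ends up at port 80 in the Pod.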

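# Preview question 3

Switch cluster: kubectl config use-context k8s-c2-AC

Create a Pod named check-ip in the Namespace default using the image httpd:2.4.41-alpine. Expose it on port 80 as a ClusterIP Service named check-ip-service. Remember/output the IP of that Service.

Change the Service CIDR of the cluster to 11.96.0.0/12.

Then create a second Service named check-ip-service2 pointing to the same Pod to check whether your settings took effect. Finally, check whether the IP of the first Service has changed.

# Answer

Let's create the Pod and expose it: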

k run check-ip --image=httpd:2.4.41-alpine

k expose pod check-ip --name check-ip-service --port 80

And check the Pod and Service ips:

➜ k get svc,ep -l run=check-ip
NAME                       TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
service/check-ip-service   ClusterIP   10.104.3.45   <none>        80/TCP    8s

NAME                         ENDPOINTS      AGE
endpoints/check-ip-service   10.44.0.3:80   7s

Now we change the Service CIDR on the kube-apiserver:

➜ ssh cluster2-controlplane1

➜ root@cluster2-controlplane1:~# vim /etc/kubernetes/manifests/kube-apiserver.yaml

# /etc/kubernetes/manifests/kube-apiserver.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=192.168.100.21
...
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-cluster-ip-range=11.96.0.0/12             # change
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
...

Give the kube-apiserver and the controller manager a moment to restart.

Wait for the api to be up again:

➜ root@cluster2-controlplane1:~# kubectl -n kube-system get pod | grep api
kube-apiserver-cluster2-controlplane1            1/1     Running   0              49s

Now we do the same for the controller manager:

➜ root@cluster2-controlplane1:~# vim /etc/kubernetes/manifests/kube-controller-manager.yaml

# /etc/kubernetes/manifests/kube-controller-manager.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-controller-manager
    tier: control-plane
  name: kube-controller-manager
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-cidr=10.244.0.0/16
    - --cluster-name=kubernetes
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --controllers=*,bootstrapsigner,tokencleaner
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --leader-elect=true
    - --node-cidr-mask-size=24
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=11.96.0.0/12         # change
    - --use-service-account-credentials=true

Give the control plane components a moment to restart.

We can use crictl to check whether a component was restarted, here the scheduler:

➜ root@cluster2-controlplane1:~# crictl ps | grep scheduler
3d258934b9fd6    aca5ededae9c8    About a minute ago   Running    kube-scheduler ...

Check our existing Pod and Service again:

➜ k get pod,svc -l run=check-ip
NAME           READY   STATUS    RESTARTS   AGE
pod/check-ip   1/1     Running   0          21m

NAME                       TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/check-ip-service   ClusterIP   10.99.32.177   <none>        80/TCP    21m

Nothing changed so far. Now we create another Service like before:

k expose pod check-ip --name check-ip-service2 --port 80

And check again:

➜ k get svc,ep -l run=check-ip
NAME                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
service/check-ip-service    ClusterIP   10.109.222.111   <none>        80/TCP    8m
service/check-ip-service2   ClusterIP   11.111.108.194   <none>        80/TCP    6m32s

NAME                          ENDPOINTS      AGE
endpoints/check-ip-service    10.44.0.1:80   8m
endpoints/check-ip-service2   10.44.0.1:80   6m13s

There we go, the new Service got an ip from the newly specified range assigned. We also see that both Services have our Pod as endpoint.
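
Whether an address really falls inside the new range can be verified with a bit of integer arithmetic. The following is an IPv4-only sketch in plain shell; ip_to_int and in_cidr are hypothetical helper names.

```shell
# ip_to_int and in_cidr are hypothetical helpers for IPv4 only.
ip_to_int() (
  # Runs in a subshell so the IFS change does not leak to the caller.
  # Splits the dotted address into four octets and packs them into one integer.
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

in_cidr() {
  # $1: IPv4 address, $2: CIDR such as 11.96.0.0/12
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  # Network mask with the top "bits" bits set
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  # The address is in range when its network part equals that of the CIDR
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Usage with the addresses from the output above:
#   in_cidr 11.111.108.194 11.96.0.0/12 && echo "in new range"
#   in_cidr 10.109.222.111 11.96.0.0/12 || echo "still from the old range"
```

With a /12 prefix only the upper four bits of the second octet belong to the network part, which is why 11.111.x.x matches 11.96.0.0/12.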
