Taints &amp; Tolerations and Affinity

I. Taint and Toleration Concepts

Official docs: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/taint-and-toleration/
Design idea: a Taint is placed on a class of nodes so that Pods which do not tolerate that taint cannot be scheduled onto them. A Toleration lets a Pod tolerate the taints configured on a node, so that Pods which need special hardware or configuration can be scheduled onto the tainted, specially configured nodes.

  • Taint acts on nodes.
  • Toleration acts on Pods.

1. Taint Configuration

Create a taint (a node can have multiple taints):
kubectl taint nodes NODE_NAME TAINT_KEY=TAINT_VALUE:EFFECT
For example: kubectl taint nodes k8s-node01 ssd=true:PreferNoSchedule

View: kubectl describe node k8s-node01 | grep Taint (note the capital T)

NoSchedule: new Pods are not scheduled onto the node; Pods already running on it are unaffected.
NoExecute: new Pods are not scheduled onto the node, and Pods already on it that do not tolerate the taint are evicted immediately (or after their tolerationSeconds).
PreferNoSchedule: the scheduler tries to avoid placing Pods on the node, but will still use it if no better node exists.
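
A quick way to see the difference between the three effects in practice (a sketch; k8s-node02 and the gpu key are placeholders, not part of the example above):
	kubectl taint nodes k8s-node02 gpu=true:PreferNoSchedule   # scheduler avoids the node but may still use it
	kubectl taint nodes k8s-node02 gpu=true:NoSchedule         # new Pods without a toleration never land here; running Pods stay
	kubectl taint nodes k8s-node02 gpu=true:NoExecute          # additionally evicts running Pods that do not tolerate it
	kubectl taint nodes k8s-node02 gpu:PreferNoSchedule- gpu:NoSchedule- gpu:NoExecute-   # clean up all three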

2. Toleration Configuration

Mode 1, exact match: every field of the taint must match
tolerations:
- key: "taintKey"
  operator: "Equal"
  value: "taintValue"
  effect: "NoSchedule"

Mode 2, partial match: match the key and the NoSchedule effect, regardless of value
tolerations:
- key: "taintKey"
  operator: "Exists"
  effect: "NoSchedule"

Mode 3, broad match: match any taint with this key, regardless of value and effect (not recommended for built-in taint keys)
tolerations:
- key: "taintKey"
  operator: "Exists"

Mode 4, match everything (not recommended):
tolerations:
- operator: "Exists"

Eviction-delay configuration: for NoExecute taints, tolerationSeconds controls how long a tolerating Pod may remain on the node after the taint is added (the tolerations injected by default use 300 seconds; in the example below the Pod is evicted after 3600 seconds):
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600
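
Putting it together, a minimal complete Pod carrying the toleration above (a sketch; it assumes some node was tainted with key1=value1:NoExecute, and the Pod name is hypothetical):
apiVersion: v1
kind: Pod
metadata:
  name: toleration-demo   # hypothetical name
spec:
  containers:
  - name: nginx
    image: nginx
  tolerations:
  - key: "key1"
    operator: "Equal"
    value: "value1"
    effect: "NoExecute"
    tolerationSeconds: 3600   # evicted 3600s after the taint appears on its node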

Example:
1. One node (say node01) has pure-SSD disks, and only Pods that need high-performance storage should be scheduled onto it.
Taint and label the node:
[root@k8s-master01 ~]# kubectl get po -A -owide | grep node01  # see which Pods run on node01
[root@k8s-master01 ~]# kubectl taint nodes k8s-node01 ssd:PreferNoSchedule-  # remove the earlier PreferNoSchedule taint
[root@k8s-master01 ~]# kubectl taint nodes k8s-node01 ssd=true:NoExecute  # this evicts Pods that do not tolerate the taint
[root@k8s-master01 ~]# kubectl taint nodes k8s-node01 ssd=true:NoSchedule  # taint node01
[root@k8s-master01 ~]# kubectl label node k8s-node01 ssd=true  # label node01 with ssd=true
[root@k8s-master01 ~]# kubectl get node -l ssd  # list the nodes that carry the ssd label
[root@k8s-master01 ~]# kubectl describe node k8s-node01 | grep Taint  # check node01's taints

Configure the Pod (the toleration means the Pod can be scheduled onto node01, not that it necessarily will be; the nodeSelector below additionally pins it to ssd-labeled nodes):

[root@k8s-master01 ~]# vim  tolerations.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    ssd: "true"
  tolerations:
  - key: "ssd"
    operator: "Exists"

Because the NoExecute taint was applied, only the DaemonSet Pods that tolerate it (calico-node and kube-proxy) remain on node01. Then create the Pod, and it is scheduled onto node01:

[root@k8s-master01 ~]# kubectl get pod -A -owide | grep node01 
kube-system        calico-node-hrj82          1/1     Running   15         6d13h   192.168.0.103     k8s-node01   
kube-system        kube-proxy-mrl9j           1/1     Running   7          6d12h   192.168.0.103     k8s-node01    

[root@k8s-master01 ~]# kubectl create -f tolerations.yaml  # create the Pod and check the result
pod/nginx created

[root@k8s-master01 ~]# kubectl get pod -A -owide | grep node01   # the Pod is now running on node01
default       nginx      1/1     Running     0        50s     172.161.125.33    k8s-node01    

Delete the Pod, edit the YAML to comment out the toleration, and deploy again: this time the Pod does not get scheduled. Use describe to find out why.

[root@k8s-master01 ~]# vim tolerations.yaml  # edit
....
  nodeSelector:
    ssd: "true"
  #tolerations:
  #- key: "ssd"
  #  operator: "Exists"

[root@k8s-master01 ~]# kubectl delete -f tolerations.yaml  # delete the old Pod
pod "nginx" deleted

[root@k8s-master01 ~]# kubectl create -f tolerations.yaml  # recreate without the toleration
pod/nginx created

[root@k8s-master01 ~]# kubectl get -f tolerations.yaml  # nginx is stuck in Pending
NAME    READY   STATUS    RESTARTS   AGE
nginx   0/1     Pending   0          89s

[root@k8s-master01 ~]# kubectl describe po nginx   # one node has a taint the Pod does not tolerate; the other four do not match the Pod's node affinity
...
  Warning  FailedScheduling  84s   default-scheduler  0/5 nodes are available: 
1 node(s) had taint {ssd: true}, that the pod didn't tolerate, 4 node(s) didn't match Pod's node affinity.

3. Built-in Taints

node.kubernetes.io/not-ready: the node is not ready (the node's Ready condition is False).
node.kubernetes.io/unreachable: the node controller cannot reach the node (the node's Ready condition is Unknown).
node.kubernetes.io/out-of-disk: the node has run out of disk space.
node.kubernetes.io/memory-pressure: the node is under memory pressure.
node.kubernetes.io/disk-pressure: the node is under disk pressure.
node.kubernetes.io/network-unavailable: the node's network is unavailable.
node.kubernetes.io/unschedulable: the node is unschedulable (e.g. cordoned).
node.cloudprovider.kubernetes.io/uninitialized: when the kubelet is started with an external cloud provider, this taint marks the node as unusable; once a controller in cloud-controller-manager initializes the node, the kubelet removes it.

Tolerate an unhealthy node for 6000 seconds before eviction (the default injected toleration uses 300 seconds):
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 6000
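
The 300-second default exists because the DefaultTolerationSeconds admission controller injects not-ready and unreachable tolerations with tolerationSeconds: 300 into every Pod that does not set them itself; this can be confirmed on any running Pod (the Pod name here is just an example):
	kubectl get pod nginx -o yaml | grep -A 8 tolerations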

4. Common taint Commands

Create a taint (a node can have multiple taints):
	kubectl taint nodes NODE_NAME TAINT_KEY=TAINT_VALUE:EFFECT
For example:
	kubectl taint nodes k8s-node01 ssd=true:PreferNoSchedule
View a node's taints:
	kubectl get node k8s-node01 -o go-template --template {{.spec.taints}}
	kubectl describe node k8s-node01 | grep Taints -A 10
Delete a taint (works like deleting a label):
	By key: kubectl taint nodes k8s-node01 ssd-
	By key + effect: kubectl taint nodes k8s-node01 ssd:PreferNoSchedule-
Modify a taint (same key and effect):
	kubectl taint nodes k8s-node01 ssd=true:PreferNoSchedule --overwrite
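
Equivalently, jsonpath can be used instead of the go-template:
	kubectl get node k8s-node01 -o jsonpath='{.spec.taints}'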

II. Affinity


Affinity types:
· NodeAffinity: node affinity/anti-affinity
· PodAffinity: Pod affinity
· PodAntiAffinity: Pod anti-affinity

1. Affinity Scenarios

An application deployed across 4 nodes: if one node fails, the other 3 keep the service available.

An application deployed across two zones: if one zone fails (a fiber cut, for instance), the other zone keeps the service available.

Try to spread the different applications of one project across different nodes, so that a node outage affects as little as possible.

2. Node Affinity Configuration

[root@k8s-master01 ~]# vim with-node-affinity.yaml

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:    # aligned with containers
    nodeAffinity:   # node affinity
      requiredDuringSchedulingIgnoredDuringExecution:  # hard (required) affinity; can be combined with the soft affinity below, in which case the required terms must be satisfied and the preferred terms only influence ranking
        nodeSelectorTerms:   # node selector terms; multiple matchExpressions entries may be listed (satisfying any one is enough)
        - matchExpressions:   # multiple key/value expressions may be listed (all must be satisfied)
          - key: kubernetes.io/e2e-az-name
            operator: In   # label matching operator (see below)
            values:   # multiple values allowed (any one matches)
            - e2e-az1
            - az-2
      preferredDuringSchedulingIgnoredDuringExecution:  # soft (preferred) affinity; may coexist with the hard affinity above
      - weight: 1    # weight of this preference; higher weight means higher priority, range 1-100
        preference:  # the preference term, a sibling of weight; multiple entries may be listed; matchExpressions works the same as in hard affinity
          matchExpressions:
          - key: another-node-label-key
            operator: In   # label matching operator (see below)
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: nginx

operator: label matching operators

In: the label's value is in the given values list (key = value)
NotIn: the label's value is not in the given values list (key != value)
Exists: the node has a label with this key, whatever its value; the values field must not be set
DoesNotExist: the node has no label with this key; the values field must not be set
Gt: the label's value is greater than the given value (compared as integers)
Lt: the label's value is less than the given value (compared as integers)
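
A small sketch combining these operators; the disktype and cpu-count label keys are hypothetical and would have to exist on your nodes (Gt/Lt parse the label value as an integer):
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: disktype      # node must carry this label, any value
              operator: Exists
            - key: cpu-count     # the label value, parsed as an integer, must be > 16
              operator: Gt
              values:
              - "16"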

3. Pod Affinity Configuration

[root@k8s-master01 ~]# vim with-pod-affinity.yaml

apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity: 
    podAffinity:  # pod affinity
      requiredDuringSchedulingIgnoredDuringExecution: # hard (required) affinity
      - labelSelector:  # Pod label selector; multiple terms may be configured
          matchExpressions:   # multiple key/value expressions may be listed (all must be satisfied)
          - key: security
            operator: In    # label matching operator
            values:       # multiple values allowed (any one matches)
            - S1
        topologyKey: failure-domain.beta.kubernetes.io/zone   # the topology-domain key, i.e. a node label key; nodes whose label has the same key and value form one domain; typically used to mark rooms or zones

    podAntiAffinity:  # pod anti-affinity
      preferredDuringSchedulingIgnoredDuringExecution:  # soft (preferred) affinity
      - weight: 100     # weight; higher weight means higher priority, range 1-100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          namespaces:    # namespaces whose Pods are matched; omitted or empty means the Pod's own namespace
          - default
          topologyKey: failure-domain.beta.kubernetes.io/zone
  containers:
  - name: with-pod-affinity
    image: nginx
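
For a topologyKey to have any effect, the corresponding node label must actually exist; a quick check (-L prints the label as a column). Note that on newer clusters the topology.kubernetes.io/zone label replaces the deprecated failure-domain.beta.kubernetes.io/zone key:
[root@k8s-master01 ~]# kubectl get node -L failure-domain.beta.kubernetes.io/zone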

4. Running One Application on Different Hosts

In the example below there are 5 replicas with required (hard) anti-affinity. If the cluster has 3 schedulable nodes, one Pod starts on each of the 3 nodes and the remaining 2 stay Pending, because no Pod carrying the app=must-be-diff-nodes label may share a node with another such Pod.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: must-be-diff-nodes
  name: must-be-diff-nodes
  namespace: kube-public
spec:
  replicas: 5   # replica count
  selector:
    matchLabels:
      app: must-be-diff-nodes
  template:
    metadata:
      labels:
        app: must-be-diff-nodes
    spec:
      affinity:
        podAntiAffinity:  # anti-affinity
          requiredDuringSchedulingIgnoredDuringExecution:  # required (hard)
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - must-be-diff-nodes   # the label to repel
            topologyKey: kubernetes.io/hostname 
      containers:
      - image: nginx
        imagePullPolicy: IfNotPresent
        name: must-be-diff-nodes
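
A quick way to confirm the spread (with 3 schedulable nodes, 3 Pods should be Running on distinct nodes and the remaining 2 Pending):
[root@k8s-master01 ~]# kubectl get pod -n kube-public -l app=must-be-diff-nodes -owide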

4.1 Pinning One Application's Replicas to Dedicated Nodes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      nodeSelector:
        app: store
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2-alpine
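
Because of the nodeSelector above, these Pods can only run on nodes labeled app=store, so those labels must be applied first (which nodes to pick is up to you; three are needed for three replicas under the hostname anti-affinity):
[root@k8s-master01 ~]# kubectl label node k8s-node01 k8s-node02 k8s-master03 app=store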

4.2 Co-locating an Application With Its Cache in the Same Domain (Best Effort)

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:
    matchLabels:
      app: web-store
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-store
            topologyKey: "kubernetes.io/hostname"  
        podAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - store
              topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx:1.16-alpine
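
With the rules above, each web-store Pod lands on a different host and, where possible, on a host that already runs a store Pod; this can be verified by listing both applications side by side:
[root@k8s-master01 ~]# kubectl get pod -owide -l 'app in (store, web-store)'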

5. Preferring High-spec Servers

In the example below, the Pod prefers nodes that are labeled ssd=true and do not carry the GPU=true label (soft affinity, weight 100); failing that, it may also be placed on nodes labeled type=physical (weight 10).

[root@k8s-master01 ~]# vim nodeAffinitySSD.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prefer-ssd
  name: prefer-ssd
  namespace: kube-public
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prefer-ssd
  template:
    metadata:
      labels:
        app: prefer-ssd
    spec:
      affinity:
        nodeAffinity:   # node affinity
          preferredDuringSchedulingIgnoredDuringExecution:   # soft (preferred) affinity; use requiredDuringSchedulingIgnoredDuringExecution instead to force scheduling onto matching nodes
          - preference:
              matchExpressions:
              - key: ssd       # the ssd label
                operator: In   # must match
                values:
                - "true"
              - key: GPU     # the GPU label
                operator: NotIn    # must not match
                values:
                - "true"
            weight: 100    # weight
          - preference:
              matchExpressions:
              - key: type  # the type=physical label
                operator: In
                values:
                - physical
            weight: 10    # weight
      containers:
      - env:
        - name: TZ
          value: Asia/Shanghai
        - name: LANG
          value: C.UTF-8
        image: nginx
        imagePullPolicy: IfNotPresent
        name: prefer-ssd

Label the nodes

[root@k8s-master01 ~]# kubectl get node --show-labels  # view existing node labels
Give master01 and node01 the ssd=true label, and master01 additionally the GPU=true label:
[root@k8s-master01 ~]# kubectl label node k8s-master01 ssd=true
[root@k8s-master01 ~]# kubectl label node k8s-master01 GPU=true
[root@k8s-master01 ~]# kubectl label node k8s-node01 ssd=true
Give node02 the type=physical label:
[root@k8s-master01 ~]# kubectl label node k8s-node02 type=physical

Create the application

[root@k8s-master01 ~]# kubectl create -f nodeAffinitySSD.yaml   
[root@k8s-master01 ~]# kubectl get pod -n kube-public  -owide   # the Pod landed on node01 (master01 also has ssd=true, but its GPU=true label fails the NotIn rule)
NAME                         READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
prefer-ssd-dcb88b7d9-wdd54   1/1     Running   0          18s   172.161.125.41   k8s-node01              

Now remove node01's ssd label and recreate the Deployment; the Pod is then scheduled onto node02, which carries the type=physical label.

[root@k8s-master01 ~]# kubectl label node k8s-node01 ssd-   # remove node01's ssd label
node/k8s-node01 labeled
[root@k8s-master01 ~]# kubectl delete -f nodeAffinitySSD.yaml  # delete
deployment.apps "prefer-ssd" deleted
[root@k8s-master01 ~]# kubectl create -f nodeAffinitySSD.yaml  # recreate
deployment.apps/prefer-ssd created
[root@k8s-master01 ~]# kubectl get pod -n kube-public  -owide  # check
NAME                         READY   STATUS    RESTARTS   AGE   IP               NODE         NOMINATED NODE   READINESS GATES
prefer-ssd-dcb88b7d9-58rfw   1/1     Running   0          87s   172.171.14.212   k8s-node02              

6. Topology Domains (topologyKey)

topologyKey: a topology domain. It partitions hosts into zones based on node labels: nodes whose label has the same key and the same value belong to the same domain, while different values mean different domains.
Give nodes in the same zone one label value and nodes in other zones different values, so that a single zone failure cannot leave every Pod of a service stranded in the failed zone.

6.1 Deploying One Application Across Multiple Zones

Define three logical region labels and spread the application's Pods across them:
master01, master02: region=daxing
master03, node01: region=chaoyang
node02: region=xxx

[root@k8s-master01 ~]# kubectl label node k8s-master01 k8s-master02 region=daxing
[root@k8s-master01 ~]# kubectl label node k8s-node01 k8s-master03 region=chaoyang
[root@k8s-master01 ~]# kubectl label node k8s-node02  region=xxx

Create a YAML with topologyKey set to region: each Pod will then be placed in a different region. Because this is required (hard) Pod anti-affinity, if the replica count exceeds the number of regions, the surplus Pods stay Pending and never start.

[root@k8s-master01 ~]# vim must-be-diff-zone.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: must-be-diff-zone
  name: must-be-diff-zone
  namespace: kube-public
spec:
  replicas: 3
  selector:
    matchLabels:
      app: must-be-diff-zone
  template:
    metadata:
      labels:
        app: must-be-diff-zone
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution: # required (hard) anti-affinity
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - must-be-diff-zone
            topologyKey: region   # the region label defined above
      containers:
      - image: nginx
        imagePullPolicy: IfNotPresent
        name: must-be-diff-zone

Create and verify: the 3 Pods run in three different regions (node02, master02 and node01 each carry a different region label).

[root@k8s-master01 ~]# kubectl create -f must-be-diff-zone.yaml
[root@k8s-master01 ~]# kubectl get pod -n kube-public  -owide
NAME                                 READY   STATUS    RESTARTS   AGE     IP               NODE           NOMINATED NODE   READINESS GATES
must-be-diff-zone-755966bd8b-42fft   1/1     Running   0          2m22s   172.171.14.213   k8s-node02                
must-be-diff-zone-755966bd8b-fx6cs   1/1     Running   0          2m22s   172.169.92.68    k8s-master02              
must-be-diff-zone-755966bd8b-k5d7q   1/1     Running   0          2m22s   172.161.125.42   k8s-node01                
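
Cleanup for this example (the trailing dash removes a label, just like removing a taint):
[root@k8s-master01 ~]# kubectl delete -f must-be-diff-zone.yaml
[root@k8s-master01 ~]# kubectl label node k8s-master01 k8s-master02 k8s-master03 k8s-node01 k8s-node02 region-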