I. Architecture Overview
II. Installing Prometheus
Installation methods: 1. binary install; 2. container install; 3. Helm install; 4. Prometheus Operator; 5. kube-prometheus stack
1. Installing with Helm
1. Install Helm
# Helm 3 download: https://github.com/helm/helm/releases
# After downloading, extract the archive; the binary is ready to use
# Add the bitnami chart repository
helm repo add bitnami https://charts.bitnami.com/bitnami
# Update the repositories
helm repo update
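To confirm the repository was registered, you can list the configured repos (output illustrative):

helm repo list
NAME      URL
bitnami   https://charts.bitnami.com/bitnami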
2. Install Prometheus
# Install the prometheus-operator chart
helm install prometheus bitnami/prometheus-operator \
  --set prometheus.service.type=NodePort \
  --set prometheus.service.nodePort=30090
# Access Prometheus
curl http://192.168.0.236:30090
3. Install Grafana
# Install grafana
helm install grafana bitnami/grafana \
  --set persistence.enabled=false \
  --set service.type=NodePort
# Get the grafana username and password
echo "User: admin" && echo "Password: $(kubectl get secret grafana-admin -n default -o jsonpath="{.data.GF_SECURITY_ADMIN_PASSWORD}" | base64 --decode)"
Check the Services and Pods:
[root@k8s-master01 ~]# kubectl get svc
NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
alertmanager-operated                     ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   16m
grafana                                   NodePort    10.111.218.22   <none>        3000:32069/TCP               14m
kubernetes                                ClusterIP   10.96.0.1       <none>        443/TCP                      26d
prometheus-kube-state-metrics             ClusterIP   10.100.139.57   <none>        8080/TCP                     17m
prometheus-node-exporter                  ClusterIP   10.109.82.168   <none>        9100/TCP                     17m
prometheus-operated                       ClusterIP   None            <none>        9090/TCP                     16m
prometheus-prometheus-oper-alertmanager   ClusterIP   10.108.28.115   <none>        9093/TCP                     17m
prometheus-prometheus-oper-operator       ClusterIP   10.109.75.165   <none>        8080/TCP                     17m
prometheus-prometheus-oper-prometheus     NodePort    10.104.90.86    <none>        9090:30090/TCP               17m
[root@k8s-master01 ~]# kubectl get pod
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-prometheus-oper-alertmanager-0   2/2     Running   0          29m
grafana-f4c5594cf-zsng2                                  1/1     Running   0          27m
prometheus-kube-state-metrics-bbd98b855-g8mqb            1/1     Running   0          30m
prometheus-node-exporter-727f8                           1/1     Running   0          30m
prometheus-node-exporter-f2gbt                           1/1     Running   0          30m
prometheus-node-exporter-fzzpd                           1/1     Running   0          30m
prometheus-node-exporter-mdjdv                           1/1     Running   0          30m
prometheus-node-exporter-wf8kq                           1/1     Running   0          30m
prometheus-prometheus-oper-operator-7c98ddcd74-92t5b     1/1     Running   0          30m
prometheus-prometheus-prometheus-oper-prometheus-0       3/3     Running   1          29m
# Access grafana
curl http://192.168.0.236:32069
4. Log in to Grafana, add Prometheus as a data source, and import dashboards
Dashboard templates: https://grafana.com/grafana/dashboards/
Pick one of these template IDs: 13105, 8919.
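The data source can also be added without the web UI, through Grafana's HTTP API. A sketch, assuming the NodePort address from above and that $PASSWORD holds the admin password retrieved earlier; the in-cluster Prometheus Service name is taken from the kubectl get svc output above:

curl -X POST "http://admin:$PASSWORD@192.168.0.236:32069/api/datasources" \
  -H "Content-Type: application/json" \
  -d '{"name": "Prometheus", "type": "prometheus", "access": "proxy", "url": "http://prometheus-prometheus-oper-prometheus.default:9090"}'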
Finally, check the result:
III. ServiceMonitor Configuration Explained
1. The ServiceMonitor concept
In short: a ServiceMonitor is how scrape targets are configured.
A Prometheus installed from binaries or as a plain container manages its scrape jobs in prometheus.yml, which becomes hard to maintain once there are many jobs. The Prometheus Operator therefore introduced the ServiceMonitor kind, so scrape jobs can be managed as Kubernetes resources. Both the Helm and the Prometheus Operator installation methods manage scrape jobs through ServiceMonitors.
In the example below, a ServiceMonitor registers the node-exporter scrape job; when it discovers targets, the Operator renders the scrape information into the Prometheus configuration and loads it into Prometheus.
[root@k8s-master01 ~]# kubectl get servicemonitor
NAME                                      AGE
prometheus-kube-state-metrics             17h
prometheus-node-exporter                  17h
prometheus-prometheus-oper-alertmanager   17h
prometheus-prometheus-oper-apiserver      17h
prometheus-prometheus-oper-kube-proxy     17h
prometheus-prometheus-oper-kubelet        17h
prometheus-prometheus-oper-operator       17h
prometheus-prometheus-oper-prometheus     17h
[root@k8s-master01 ~]# kubectl get servicemonitor prometheus-node-exporter -oyaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: prometheus
    meta.helm.sh/release-namespace: default
  creationTimestamp: "2021-12-18T09:23:43Z"
  generation: 1
  labels:
    app.kubernetes.io/instance: prometheus
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: node-exporter
    app.kubernetes.io/version: 1.0.1
    helm.sh/chart: node-exporter-1.1.0
  managedFields:
  - apiVersion: monitoring.coreos.com/v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:meta.helm.sh/release-name: {}
          f:meta.helm.sh/release-namespace: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/instance: {}
          f:app.kubernetes.io/managed-by: {}
          f:app.kubernetes.io/name: {}
          f:app.kubernetes.io/version: {}
          f:helm.sh/chart: {}
      f:spec:
        .: {}
        f:endpoints: {}
        f:jobLabel: {}
        f:namespaceSelector:
          .: {}
          f:matchNames: {}
        f:selector:
          .: {}
          f:matchLabels:
            .: {}
            f:app.kubernetes.io/instance: {}
            f:app.kubernetes.io/name: {}
    manager: helm
    operation: Update
    time: "2021-12-18T09:23:43Z"
  name: prometheus-node-exporter
  namespace: default
  resourceVersion: "6180"
  uid: 2d35c5bc-d591-4c89-9891-e51823eee553
spec:
  endpoints:
  - port: metrics
  jobLabel: jobLabel
  namespaceSelector:
    matchNames:
    - default
  selector:                        # targets are discovered via this selector
    matchLabels:
      app.kubernetes.io/instance: prometheus
      app.kubernetes.io/name: node-exporter
[root@k8s-master01 ~]# kubectl get svc -l app.kubernetes.io/instance=prometheus   # the Services matched by the selector above
NAME                                      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
prometheus-kube-state-metrics             ClusterIP   10.100.139.57   <none>        8080/TCP         17h
prometheus-node-exporter                  ClusterIP   10.109.82.168   <none>        9100/TCP         17h
prometheus-prometheus-oper-alertmanager   ClusterIP   10.108.28.115   <none>        9093/TCP         17h
prometheus-prometheus-oper-operator       ClusterIP   10.109.75.165   <none>        8080/TCP         17h
prometheus-prometheus-oper-prometheus     NodePort    10.104.90.86    <none>        9090:30090/TCP   17h
[root@k8s-master01 ~]# kubectl get ep prometheus-node-exporter   # the Endpoints object discovered several hosts
NAME                       ENDPOINTS                                                               AGE
prometheus-node-exporter   192.168.0.100:9100,192.168.0.101:9100,192.168.0.102:9100 + 2 more...   17h
2. ServiceMonitor configuration breakdown
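The fields used throughout this article are summarized in the minimal annotated sketch below; the names my-app and web are placeholders, not resources from this cluster:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: default              # must be in a namespace the Prometheus resource watches
  labels:
    app: my-app
spec:
  jobLabel: app                   # take the Prometheus job name from this Service label
  endpoints:
  - port: web                     # matches Service.spec.ports[].name, not a port number
    path: /metrics                # scrape path (defaults to /metrics)
    interval: 30s                 # scrape interval
    scheme: http                  # http or https
  selector:
    matchLabels:
      app: my-app                 # discover Services carrying this label
  namespaceSelector:
    matchNames:
    - default                     # namespaces in which to look for matching Services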
3. Monitoring flows for cloud-native and non-cloud-native applications
Cloud-native: the application itself exposes its monitoring data.
Non-cloud-native: an exporter collects and exposes the data on the application's behalf.
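The difference shows up on the wire; both patterns appear later in this article. A rough contrast, using this cluster's addresses (adjust to yours):

# Cloud-native: the application serves its own /metrics (etcd, section IV)
curl -sk --cert /etc/kubernetes/pki/etcd/ca.crt --key /etc/kubernetes/pki/etcd/ca.key https://127.0.0.1:2379/metrics
# Non-cloud-native: a standalone exporter exposes metrics on the app's behalf (mysqld-exporter, section V)
curl -s http://mysql-exporter.default:9104/metrics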
IV. Cloud-Native Monitoring
1. Monitoring etcd
Access test: a plain HTTP request to port 2379 on the local machine fails, because etcd requires client certificates. On this cluster the etcd certificates sit under /etc/kubernetes/pki/etcd/ (e.g. ca.crt), so pass the certificate paths on the command line; getting data back means it works.
[root@k8s-master01 etcd]# curl -s --cert /etc/kubernetes/pki/etcd/ca.crt \
  --key /etc/kubernetes/pki/etcd/ca.key https://127.0.0.1:2379/metrics -k | tail -1
promhttp_metric_handler_requests_total{code="503"} 0
(1) Creating the etcd Service
Create the Service and Endpoints for etcd:
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    app: etcd-prom
  name: etcd-prom
  namespace: kube-system
subsets:
- addresses:
  - ip: 192.168.0.100   # host IPs of the master nodes running etcd
  - ip: 192.168.0.101
  - ip: 192.168.0.102
  ports:
  - name: https-metrics
    port: 2379          # etcd port
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: etcd-prom
  name: etcd-prom
  namespace: kube-system
spec:
  ports:
  - name: https-metrics
    port: 2379
    protocol: TCP
    targetPort: 2379
  type: ClusterIP

[root@k8s-master01 ~]# kubectl create -f etcd-svc_ep.yaml
endpoints/etcd-prom created
service/etcd-prom created
[root@k8s-master01 ~]# kubectl get svc -n kube-system etcd-prom
NAME        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
etcd-prom   ClusterIP   10.111.99.24   <none>        2379/TCP   31s
[root@k8s-master01 ~]# kubectl get ep -n kube-system etcd-prom
NAME        ENDPOINTS                                                   AGE
etcd-prom   192.168.0.100:2379,192.168.0.101:2379,192.168.0.102:2379   100s
Now run the same test against the etcd Service's ClusterIP:
[root@k8s-master01 ~]# curl -s --cert /etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key /etc/kubernetes/pki/apiserver-etcd-client.key \
  https://10.111.99.24:2379/metrics -k | tail -1
promhttp_metric_handler_requests_total{code="503"} 0
Because etcd requires certificates for access, the certificates must be mounted into the Prometheus Pod.
The Secret contains one Kubernetes CA certificate and two etcd client certificate files. After creating it, edit the Prometheus resource and reference the Secret there:
[root@k8s-master01 pki]# kubectl create secret generic etcd-ssl \
  --from-file=/etc/kubernetes/pki/ca.crt \
  --from-file=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --from-file=/etc/kubernetes/pki/apiserver-etcd-client.key
secret/etcd-ssl created
[root@k8s-master01 pki]# kubectl get prometheus
NAME                                    VERSION   REPLICAS   AGE
prometheus-prometheus-oper-prometheus             1          21h
[root@k8s-master01 pki]# kubectl edit prometheus prometheus-prometheus-oper-prometheus
...
# add under spec:
  secrets:
  - etcd-ssl
...
prometheus.monitoring.coreos.com/prometheus-prometheus-oper-prometheus edited
Verify that the certificates are mounted:
[root@k8s-master01 pki]# kubectl exec prometheus-prometheus-prometheus-oper-prometheus-0 \
  -c prometheus -- ls /etc/prometheus/secrets/etcd-ssl/   # verify the mount
apiserver-etcd-client.crt  apiserver-etcd-client.key  ca.crt
(2) Creating the etcd ServiceMonitor
Create the ServiceMonitor for etcd:
[root@k8s-master01 ~]# vim servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd
  namespace: default             # same namespace as Prometheus
  labels:
    app: etcd
spec:
  jobLabel: k8s-app
  endpoints:
  - interval: 30s                # scrape interval
    port: https-metrics          # matches Service.spec.ports.name
    scheme: https                # protocol
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-ssl/ca.crt   # certificate paths inside the Pod
      certFile: /etc/prometheus/secrets/etcd-ssl/apiserver-etcd-client.crt
      keyFile: /etc/prometheus/secrets/etcd-ssl/apiserver-etcd-client.key
      insecureSkipVerify: true   # skip certificate verification
  selector:
    matchLabels:
      app: etcd-prom             # must match the Service's labels, checked below
  namespaceSelector:
    matchNames:
    - kube-system

Check the Service carrying the app=etcd-prom label:
[root@k8s-master01 pki]# kubectl get svc -n kube-system -l app=etcd-prom
NAME        TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
etcd-prom   ClusterIP   10.111.99.24   <none>        2379/TCP   3h18m
Finally, create the ServiceMonitor:
[root@k8s-master01 ~]# kubectl create -f servicemonitor.yaml
servicemonitor.monitoring.coreos.com/etcd created
[root@k8s-master01 ~]# kubectl get servicemonitor etcd
NAME   AGE
etcd   28s
You can then verify in the Prometheus web UI or in Grafana (etcd Grafana template ID: 3070).
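Besides the web UI, you can confirm the new targets from the Prometheus HTTP API (/api/v1/targets is a standard endpoint); a sketch assuming the NodePort address from earlier and that jq is installed, with roughly this output:

[root@k8s-master01 ~]# curl -s http://192.168.0.236:30090/api/v1/targets | jq -r '.data.activeTargets[].scrapeUrl' | grep 2379
https://192.168.0.100:2379/metrics
https://192.168.0.101:2379/metrics
https://192.168.0.102:2379/metrics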
V. Non-Cloud-Native Monitoring with Exporters
1. Monitoring MySQL
(1) Deploy MySQL and set up an authorized account
Create a simple MySQL Deployment, set the password, then expose the port via a Service:
[root@k8s-master01 ~]# kubectl create deploy mysql --image=registry.cn-beijing.aliyuncs.com/dotbalo/mysql:5.7.23
[root@k8s-master01 ~]# kubectl set env deploy/mysql MYSQL_ROOT_PASSWORD=mysql
[root@k8s-master01 ~]# kubectl expose deploy mysql --port 3306
Verify:
[root@k8s-master01 ~]# kubectl get svc | grep mysql
mysql   ClusterIP   10.98.1.2   <none>   3306/TCP   86s
[root@k8s-master01 ~]# kubectl get po -l app=mysql
NAME                     READY   STATUS    RESTARTS   AGE
mysql-69d6f69557-g8455   1/1     Running   0          5m31s
[root@k8s-master01 ~]# telnet 10.98.1.2 3306
Trying 10.98.1.2...
Connected to 10.98.1.2.
Escape character is '^]'.
J
Exec into the container, log in to MySQL, create an account (username and password: exporter), and grant it the privileges needed to read monitoring data:
[root@k8s-master01 ~]# kubectl exec -it mysql-69d6f69557-g8455 -- bash
root@mysql-69d6f69557-g8455:/# mysql -uroot -pmysql
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.23 MySQL Community Server (GPL)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE USER 'exporter'@'%' IDENTIFIED BY 'exporter' WITH
    -> MAX_USER_CONNECTIONS 3;
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO
    -> 'exporter'@'%';
Query OK, 0 rows affected (0.00 sec)
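Optionally, confirm the grants took effect before wiring up the exporter (output abbreviated):

mysql> SHOW GRANTS FOR 'exporter'@'%';
GRANT SELECT, PROCESS, REPLICATION CLIENT ON *.* TO 'exporter'@'%'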
(2) Configure the exporter
Create the mysql-exporter Deployment and its Service:
[root@k8s-master01 ~]# vim mysql-exporter.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mysql-exporter
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: mysql-exporter
  template:
    metadata:
      labels:
        k8s-app: mysql-exporter
    spec:
      containers:
      - name: mysql-exporter
        image: registry.cn-beijing.aliyuncs.com/dotbalo/mysqld-exporter   # find the upstream image yourself and sync it to a private registry
        env:
        - name: DATA_SOURCE_NAME
          value: "exporter:exporter@(mysql.default:3306)/"   # exporter:exporter is user:password; mysql.default:3306 is the Service address (an external address also works, e.g. 1.1.1.1:3306)
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 9104
---
apiVersion: v1
kind: Service
metadata:
  name: mysql-exporter
  namespace: default
  labels:
    k8s-app: mysql-exporter
spec:
  type: ClusterIP
  selector:
    k8s-app: mysql-exporter
  ports:
  - name: api
    port: 9104
    protocol: TCP
Create:
[root@k8s-master01 ~]# kubectl create -f mysql-exporter.yaml
[root@k8s-master01 ~]# kubectl get -f mysql-exporter.yaml
NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql-exporter   1/1     1            1           5m20s

NAME                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/mysql-exporter   ClusterIP   10.106.213.178   <none>        9104/TCP   5m20s
Verify that data is returned:
[root@k8s-master01 ~]# curl 10.106.213.178:9104/metrics | tail -1
promhttp_metric_handler_requests_total{code="503"} 0
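A more targeted check is mysqld_exporter's mysql_up gauge, which reports 1 when the exporter can reach MySQL:

[root@k8s-master01 ~]# curl -s 10.106.213.178:9104/metrics | grep '^mysql_up'
mysql_up 1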
(3) ServiceMonitor and Grafana configuration
Configure the ServiceMonitor:
[root@k8s-master01 ~]# vim mysql-sm.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mysql-exporter
  namespace: default
  labels:
    k8s-app: mysql-exporter
    namespace: default
spec:
  jobLabel: k8s-app
  endpoints:
  - port: api
    interval: 30s
    scheme: http
  selector:
    matchLabels:
      k8s-app: mysql-exporter
  namespaceSelector:
    matchNames:
    - default
Create:
[root@k8s-master01 ~]# kubectl create -f mysql-sm.yaml
servicemonitor.monitoring.coreos.com/mysql-exporter created
Verify:
Dashboard template ID 6239: https://grafana.com/grafana/dashboards/6239
(4) Finding exporters
Exporters can be found on GitHub; they need to pair with a dashboard, so it is best to pick exporters that come with a ready-made dashboard.
Screenshots to be added later...
VI. Blackbox Monitoring and Static Configuration
1. Blackbox monitoring: blackbox-exporter
Blackbox monitoring: observes problems as they manifest from the outside, e.g. site latency, port response time, access speed.
Whitebox monitoring: a program's internal metrics, e.g. CPU, memory, connection counts.
Install blackbox-exporter with Helm:
[root@k8s-master01 ~]# helm search repo blackbox
NAME                                    CHART VERSION   APP VERSION     DESCRIPTION
stable/prometheus-blackbox-exporter     4.3.1           0.16.0          DEPRECATED Prometheus Blackbox Exporter
[root@k8s-master01 ~]# helm pull stable/prometheus-blackbox-exporter --untar   # --untar extracts the chart directory
[root@k8s-master01 ~]# cd prometheus-blackbox-exporter/
[root@k8s-master01 prometheus-blackbox-exporter]# helm install blackbox .
[root@k8s-master01 prometheus-blackbox-exporter]# kubectl get svc | grep black   # check the created Service
blackbox-prometheus-blackbox-exporter   ClusterIP   10.104.44.217   <none>   9115/TCP   17m
[root@k8s-master01 ~]# curl -s "http://10.104.44.217:9115/probe?target=www.baidu.com&module=http_2xx" | tail -1   # verify
probe_success 1
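The exporter's default blackbox.yml ships with more modules than http_2xx; tcp_connect, for instance, checks whether a TCP port accepts connections. A sketch probing the master's SSH port (target chosen purely for illustration):

[root@k8s-master01 ~]# curl -s "http://10.104.44.217:9115/probe?target=192.168.0.236:22&module=tcp_connect" | tail -1
probe_success 1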
2. Prometheus static configuration: monitoring external sites
First create an empty file, then create a Secret from it; that Secret can then serve as Prometheus's additional static configuration:
[root@k8s-master01 ~]# touch prometheus-additional.yaml
[root@k8s-master01 ~]# kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml
secret/additional-configs created
After the Secret is created, edit the Prometheus resource:
[root@k8s-master01 ~]# kubectl edit prometheus prometheus-prometheus-oper-prometheus
# add under spec:
  additionalScrapeConfigs:
    key: prometheus-additional.yaml
    name: additional-configs
    optional: true
After adding this configuration, save and exit; it takes effect without restarting the Prometheus Pod. Next, put some static scrape configuration in prometheus-additional.yaml; blackbox monitoring is used here as the example:
[root@k8s-master01 ~]# vim prometheus-additional.yaml
- job_name: 'blackbox'
  metrics_path: /probe
  params:
    module: [http_2xx]  # Look for a HTTP 200 response.
  static_configs:
  - targets:
    - http://gaoxin.kubeasy.com    # Target to probe with http.
    - https://www.baidu.com        # Target to probe with https.
    - https://liequ.wtoipcdn.com
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: instance
  - target_label: __address__
    replacement: blackbox-prometheus-blackbox-exporter:9115  # The blackbox exporter's real hostname:port.
➢ targets: the endpoints to probe; adjust to your environment
➢ params: which module to probe with
➢ replacement: the Blackbox Exporter's address
As you can see, this content is identical to a traditional Prometheus configuration; you only need to add the corresponding job. Then update the Secret from the file:
[root@k8s-master01 ~]# kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml --dry-run=client -oyaml | kubectl replace -f -
secret/additional-configs replaced
Then verify in Grafana (dashboard template ID 13659: https://grafana.com/grafana/dashboards/13659).
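The probe results can also be pulled straight from the Prometheus HTTP API; a sketch using the NodePort address from earlier, with jq assumed to be installed:

curl -s 'http://192.168.0.236:30090/api/v1/query?query=probe_success' | jq '.data.result[] | {instance: .metric.instance, value: .value[1]}'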
3. Monitoring Windows (external) hosts with Prometheus
The exporter for monitoring Linux hosts is https://github.com/prometheus/node_exporter; the one for Windows hosts is https://github.com/prometheus-community/windows_exporter.
First, download the exporter onto the Windows host (MSI download: https://github.com/prometheus-community/windows_exporter/releases):
After downloading, double-click the MSI to install; the corresponding process then appears in Task Manager. The host is scraped via a static job, as sketched below.
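Since the Windows host sits outside the cluster, it goes through the same additionalScrapeConfigs mechanism from section 2. A sketch to append to prometheus-additional.yaml; 192.168.0.200 is a hypothetical Windows host IP, and 9182 is windows_exporter's default listening port:

- job_name: 'windows-exporter'
  static_configs:
  - targets:
    - '192.168.0.200:9182'   # hypothetical host IP; windows_exporter listens on 9182 by default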
Then import dashboard template ID 12566: https://grafana.com/grafana/dashboards/12566