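A Practical Guide to the Kubernetes Resource Metrics and Custom Metrics APIs
1. Background
Early Kubernetes clusters relied on Heapster to collect resource metrics for monitoring. Starting with Kubernetes 1.8, Heapster was gradually phased out in favor of an API-based resource metrics architecture, which consists of two main pipelines:
- Core metrics pipeline: made up of the kubelet, metrics-server, and the APIService registered with the API server. It provides basic metrics such as cumulative CPU usage, real-time memory usage, Pod resource usage, and container disk usage.
- Monitoring pipeline: collects all kinds of metrics from the system and serves them to end users, storage backends, and the HPA (Horizontal Pod Autoscaler). It carries both core and non-core metrics. Kubernetes cannot interpret non-core metrics natively, so another component must convert them.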
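2. Deploying metrics-server
2.1 Download and installation
Download the YAML files that metrics-server needs from the matching release tag (for example v1.10.0) of the official Kubernetes repository: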
mkdir metrics-server && cd metrics-server
for file in auth-delegator.yaml auth-reader.yaml metrics-apiservice.yaml metrics-server-deployment.yaml metrics-server-service.yaml resource-reader.yaml; do
wget https://raw.githubusercontent.com/kubernetes/kubernetes/v1.10.0/cluster/addons/metrics-server/$file
done
Check the image references and replace any that cannot be pulled with a reachable mirror:
grep image: ./*
# If an image is unavailable, pull it from a mirror manually and update the manifest
docker pull registry.cn-hangzhou.aliyuncs.com/k8s-kernelsky/metrics-server-amd64:v0.2.1
docker pull registry.cn-hangzhou.aliyuncs.com/criss/addon-resizer:1.8.1
# Update the image references in metrics-server-deployment.yaml
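The manual image edit above can also be scripted. The sketch below defines a hypothetical helper, `replace_image` (not part of the upstream manifests), that swaps one image reference for another in a manifest. The `k8s.gcr.io` source names in the usage comment are an assumption and should be checked against the `grep image:` output first.

```shell
# Hypothetical helper: replace one image reference with another in a manifest.
# Usage: replace_image FILE OLD_IMAGE NEW_IMAGE
replace_image() {
  # '|' is the sed delimiter because image names contain '/'.
  # Writing to a temp file avoids relying on the non-portable 'sed -i'.
  sed "s|image: $2|image: $3|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# Example call (the source image name is an assumption; confirm it with grep):
# replace_image metrics-server-deployment.yaml \
#   k8s.gcr.io/metrics-server-amd64:v0.2.1 \
#   registry.cn-hangzhou.aliyuncs.com/k8s-kernelsky/metrics-server-amd64:v0.2.1
```

Because the two mirror images live under different repository paths, each image needs its own call.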
Apply all of the YAML files and check the Pod status:
kubectl apply -f .
kubectl get pod -n kube-system
2.2 Verifying the deployment
# Check that the metrics API group is registered
kubectl api-versions | grep metrics
# Start a local proxy and query the API
kubectl proxy --port=8080 &
curl http://localhost:8080/apis/metrics.k8s.io/v1beta1
curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/pods
curl http://localhost:8080/apis/metrics.k8s.io/v1beta1/nodes
# View resource usage with kubectl top
kubectl top node
kubectl top pods
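Once `kubectl top node` returns data, its output can be filtered for quick checks. The following is a minimal sketch, assuming the usual column layout (NAME, CPU(cores), CPU%, MEMORY(bytes), MEMORY%). The function name `nodes_over_cpu` is made up for this example.

```shell
# Hypothetical helper: read `kubectl top node` output on stdin and print
# the names of nodes whose CPU% column exceeds the given threshold.
# Usage: kubectl top node | nodes_over_cpu 80
nodes_over_cpu() {
  awk -v t="$1" 'NR > 1 {
    pct = $3              # third column is CPU%, e.g. "95%"
    sub(/%/, "", pct)     # strip the percent sign before comparing
    if (pct + 0 > t) print $1
  }'
}
```

Rows whose CPU% is not numeric evaluate to 0 and are therefore never reported.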
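2.3 Notes
When deploying metrics-server on newer Kubernetes versions (v1.11 and later), note the following:
- Data collection port change: the kubelet port changes from the default 10255 (HTTP) to 10250 (HTTPS), so the startup argument must be updated: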
--source=kubernetes.summary_api:https://kubernetes.default?kubeletHttps=true&kubeletPort=10250&insecure=true
- Additional RBAC permission: add the nodes/stats resource to the ClusterRole in resource-reader.yaml:
rules:
- apiGroups: [""]
  resources: ["pods", "nodes", "nodes/stats", "namespaces"]
- For Kubernetes v1.12.3, adjust the metrics-server startup command:
command:
- /metrics-server
- --metric-resolution=30s
- --kubelet-port=10250
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP
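If metrics-server still reports no data after these changes, it can help to confirm that a kubelet's Summary API (the source named in the `--source` flag above) is reachable at all. One way is to request it through the API server's node proxy. This is a minimal sketch, assuming the current user is allowed to access `nodes/proxy`. The function name is hypothetical and the node name is a placeholder.

```shell
# Hypothetical helper: fetch the kubelet Summary API for one node
# through the API server's node proxy.
# Usage: node_summary NODE_NAME
node_summary() {
  kubectl get --raw "/api/v1/nodes/$1/proxy/stats/summary"
}
```

A JSON response indicates that the kubelet is serving stats; an authorization error points back at the RBAC rules instead.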
3. Deploying Prometheus
3.1 Obtaining the YAML files
cd /mnt
git clone https://github.com/kubernetes/kubernetes.git
cd kubernetes
git checkout v1.11.0
mkdir -p /root/manifests
# Copy the Prometheus addon manifests out of the checked-out tree
cp -r cluster/addons/prometheus /root/manifests/
cd /root/manifests/prometheus
3.2 Modifying the configuration
Change the default namespace from kube-system to a dedicated namespace, k8s-monitor:
sed -i 's/namespace: kube-system/namespace: k8s-monitor/g' ./*
Create the required PVs. Note that storageClassName must match the value in the StatefulSet definition:
cat << EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolume
metadata:
  name: alertmanager
spec:
  capacity:
    storage: 5Gi
  accessModes: ["ReadWriteOnce", "ReadWriteMany"]
  persistentVolumeReclaimPolicy: Recycle
  nfs:
    path: /data/volumes/v1
    server: 172.16.150.158
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: standard
spec:
  capacity:
    storage: 25Gi
  accessModes: ["ReadWriteOnce"]
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: standard
  nfs:
    path: /data/volumes/v2
    server: 172.16.150.158
EOF
kubectl create namespace k8s-monitor
Sort the component files into one directory per component:
mkdir node-exporter kube-state-metrics alertmanager prometheus
mv node-exporter-* node-exporter
mv alertmanager-* alertmanager
mv kube-state-metrics-* kube-state-metrics
mv prometheus-* prometheus
3.3 Installing node-exporter
kubectl apply -f node-exporter/
kubectl get pod -n k8s-monitor
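3.4 Installing Prometheus
# Apply the PVs from section 3.2 if they were saved to pv.yaml instead of being applied inline
kubectl apply -f pv.yaml
kubectl get pv
# Edit prometheus-service.yaml: set the type to NodePort to allow external access
  type: NodePort
  ports:
    - name: http
      port: 9090
      nodePort: 30090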
kubectl apply -f prometheus/
kubectl get pod -n k8s-monitor
kubectl get svc -n k8s-monitor
Open the Prometheus UI at http://<node IP>:30090
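Besides the web UI, the same NodePort serves the Prometheus HTTP API, which is convenient for a scripted smoke test. The sketch below wraps the instant-query endpoint (`/api/v1/query`). The function name `prom_query` is made up for this example, and the base URL is whatever node address and port the Service exposes.

```shell
# Hypothetical helper: run an instant PromQL query against Prometheus.
# Usage: prom_query BASE_URL PROMQL
prom_query() {
  # -G sends the data as a URL query string; --data-urlencode escapes
  # characters such as '{', '}' and '"' that appear in PromQL.
  curl -sG "$1/api/v1/query" --data-urlencode "query=$2"
}

# Example call (replace the address with a real node IP):
# prom_query http://<node IP>:30090 up
```

The `up` query is a reasonable first check because it lists every scrape target and whether the last scrape succeeded.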
3.5 Deploying kube-state-metrics
# Check the image references
grep image: ./*
# Pull a replacement image from a mirror and update the manifest
docker pull registry.cn-hangzhou.aliyuncs.com/ccgg/addon-resizer:1.7
kubectl apply -f kube-state-metrics-deployment.yaml
kubectl get pod -n k8s-monitor
3.6 Deploying k8s-prometheus-adapter
Generate an HTTPS serving certificate and create a Secret from it:
cd /etc/kubernetes/pki
(umask 077; openssl genrsa -out serving.key 2048)
openssl req -new -key serving.key -out serving.csr -subj "/CN=serving"
openssl x509 -req -in serving.csr -CA ./ca.crt -CAkey ./ca.key -CAcreateserial -out serving.crt -days 3650
kubectl create secret generic cm-adapter-serving-certs \
--from-file=serving.crt=./serving.crt \
--from-file=serving.key=./serving.key \
-n k8s-monitor
Clone the adapter project and deploy it:
git clone https://github.com/DirectXMan12/k8s-prometheus-adapter.git
cd k8s-prometheus-adapter/deploy/manifests
# Change the namespace (except in the rolebinding file)
sed -i 's/namespace: custom-metrics/namespace: k8s-monitor/g' ./*
kubectl apply -f ./
kubectl get pod -n k8s-monitor
kubectl get svc -n k8s-monitor
kubectl api-versions | grep custom
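Once `custom.metrics.k8s.io` shows up in the API versions, individual metrics can be read directly from the aggregated API. This is a minimal sketch, assuming the adapter serves the `v1beta1` version of the API. The function name is hypothetical, and the namespace and metric name are placeholders for whatever the adapter actually exposes.

```shell
# Hypothetical helper: read one custom metric for all Pods in a namespace.
# Usage: pod_metric NAMESPACE METRIC_NAME
pod_metric() {
  # The quoted '*' is passed to the API literally and selects every Pod.
  kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/$1/pods/*/$2"
}

# List everything the adapter currently exposes:
# kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
```

An empty resource list from the second command usually means the adapter has not discovered any series in Prometheus yet.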
4. Grafana Visualization
4.1 Deploying Grafana
wget https://raw.githubusercontent.com/kubernetes-retired/heapster/master/deploy/kube-config/influxdb/grafana.yaml
# Edit grafana.yaml: comment out the InfluxDB environment variables, change the namespace, and set the Service type to NodePort
kubectl apply -f grafana.yaml
kubectl get svc -n k8s-monitor
kubectl get pod -n k8s-monitor
4.2 Configuring the data source
After logging in to Grafana, configure a Prometheus data source:
- URL: http://prometheus.k8s-monitor.svc:9090
- Access: Browser
4.3 Importing a dashboard template
Download a Kubernetes dashboard template from the Grafana website (for example "Kubernetes Cluster (Prometheus)") and import it from the JSON file.
5. Autoscaling with the HPA
5.1 HPA with the v1 API
Create an HPA for the test Deployment (myapp-deploy), then put it under load:
kubectl autoscale deployment myapp-deploy --min=1 --max=8 --cpu-percent=60
# Load test
ab -c 1000 -n 5000000 http://172.16.150.213:32222/index.html
kubectl describe hpa myapp-deploy
kubectl get hpa
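To interpret the replica counts that `kubectl get hpa` reports during the load test, it helps to know the documented scaling rule: the desired replica count is the ceiling of currentReplicas × currentMetricValue / targetMetricValue, clamped to the configured minimum and maximum. The sketch below reproduces only that arithmetic. The function name is made up, and the controller's tolerance band and stabilization behavior are deliberately left out.

```shell
# Hypothetical helper: estimate the replica count the HPA will ask for.
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_VALUE TARGET_VALUE MIN MAX
desired_replicas() {
  # Integer ceiling of (current replicas * current value / target value).
  n=$(( ($1 * $2 + $3 - 1) / $3 ))
  if [ "$n" -lt "$4" ]; then n=$4; fi   # never below minReplicas
  if [ "$n" -gt "$5" ]; then n=$5; fi   # never above maxReplicas
  echo "$n"
}

# With the settings above (--cpu-percent=60 --min=1 --max=8):
# desired_replicas 2 90 60 1 8    # 2 Pods at 90% average CPU
```

For example, 2 Pods averaging 90% CPU against a 60% target give ceil(2 × 90 / 60) = 3 replicas.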
5.2 HPA with the v2beta1 API
cat << EOF > hpa-demo.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa-v2
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deploy
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 55
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 100Mi
EOF
# Remove the v1 HPA from section 5.1 before creating the v2beta1 one
kubectl delete hpa myapp-deploy
kubectl apply -f hpa-demo.yaml
kubectl get hpa
kubectl describe hpa myapp-hpa-v2
5.3 HPA with custom metrics
Run an image that exposes a custom metric (http_requests in this example), then create the HPA:
cat << EOF > hpa-custom.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa-v2
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp-deploy
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: http_requests
      targetAverageValue: 800m
EOF
kubectl apply -f hpa-custom.yaml
kubectl describe hpa myapp-hpa-v2
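The `800m` target uses Kubernetes quantity notation, where the `m` suffix means thousandths, so the HPA aims for an average of 0.8 per Pod. If the adapter exposes http_requests as a per-second rate (an assumption about its rule configuration), that is 0.8 requests per second per Pod. The hypothetical helper below converts such values for easier reading. It handles only plain numbers and the `m` suffix.

```shell
# Hypothetical helper: convert a milli-quantity such as "800m" to a decimal.
# Values without the 'm' suffix are printed unchanged.
# Usage: milli_to_decimal 800m
milli_to_decimal() {
  case $1 in
    *m) awk -v v="${1%m}" 'BEGIN { printf "%g\n", v / 1000 }' ;;
    *)  echo "$1" ;;
  esac
}
```

The same notation appears in the `kubectl describe hpa` output, where current values are also shown in milli-units.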
