Solving the Problem of the API Server Being Unable to Connect to metrics-server
I. Introduction
metrics-server is an important component of the Kubernetes ecosystem. Its job is to collect resource metrics from the cluster's nodes and Pods and expose them for monitoring systems to scrape. Many Kubernetes features depend on metrics-server: the kubectl top nodes/pods
commands, the HPA (which reads resource-utilization figures from it), Istio's service components, and so on.
So when metrics-server misbehaves, every dependent component is affected. A typical example of such a failure:
Running the kubectl top nodes
command fails.
The error message is as follows:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
The root cause is that the Kubernetes apiserver cannot reach metrics-server. To verify this, run the following command:
kubectl describe apiservice v1beta1.metrics.k8s.io
The output is as follows:
It shows that the call to https://10.244.1.11:4443/apis/metrics.k8s.io/v1beta1
is failing; the address being dialed is the one registered for the metrics-server Service, which can be inspected with:
kubectl get svc -n kube-system metrics-server
As we know, under normal circumstances a Kubernetes master node cannot reach Service clusterIPs.
The node nodes, however, can, and the main reason is kube-proxy. So if we want the masters to reach a Service's clusterIP, kube-proxy has to be deployed on the masters as well; more precisely, each master must also join the cluster as a node. And to keep scheduling from affecting the masters, they additionally need to be tainted. That is the main idea behind this fix.
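The diagnosis above can be scripted. The sketch below parses a captured sample of the apiservice status (the sample text is illustrative, condensed from the typical failure condition); on a live cluster you would pipe the real `kubectl describe apiservice v1beta1.metrics.k8s.io` output instead:

```shell
#!/bin/sh
# Sample of a failing apiservice status; stands in for the live output of
# `kubectl describe apiservice v1beta1.metrics.k8s.io`.
sample='Status:  False
Reason:  FailedDiscoveryCheck
Message: failing or missing response from https://10.244.1.11:4443/apis/metrics.k8s.io/v1beta1'

# A FailedDiscoveryCheck condition means the apiserver got no usable response
# from the backend, i.e. metrics-server is unreachable from the apiserver.
if printf '%s\n' "$sample" | grep -q 'FailedDiscoveryCheck'; then
  echo 'apiserver cannot reach metrics-server'
fi
```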
II. Environment
1. Component Versions
Component | Version |
---|---|
OS | CentOS Linux release 7.9.2009 (Core) |
Docker | v20.10.7 |
Kubernetes | v1.18.18 |
kubectl | v1.18.18 |
etcd | v3.4.14 |
2. Kubernetes Cluster Nodes
Role | Hostname | IP |
---|---|---|
master node (kubectl) | t-k8sM-001 | 192.168.3.171 |
master node | t-k8sM-002 | 192.168.3.172 |
master node | t-k8sM-003 | 192.168.3.173 |
node | t-k8sN-001 | 192.168.3.174 |
node | t-k8sN-002 | 192.168.3.175 |
Nginx Proxy | t-k8s-nginx | 192.168.3.201 |
III. Deploying Docker
#!/bin/bash
echo "Starting the one-click Docker installation!"
# Remove any leftover Docker versions
yum remove -y docker \
docker-client \
docker-client-latest \
docker-common \
docker-latest \
docker-latest-logrotate \
docker-logrotate \
docker-engine && \
# Install repository tooling
yum install -y yum-utils \
device-mapper-persistent-data \
lvm2 && \
# Add the yum repo
wget -O /etc/yum.repos.d/docker-ce.repo https://mirrors.ustc.edu.cn/docker-ce/linux/centos/docker-ce.repo && \
# Switch the repo to a domestic mirror
sed -i 's#download.docker.com#mirrors.tuna.tsinghua.edu.cn/docker-ce#g' /etc/yum.repos.d/docker-ce.repo && \
# Install the latest version (a specific one, e.g. docker-ce-18.09.5-3.el7, can also be pinned)
yum install -y docker-ce && \
# Enable at boot
systemctl enable docker
# Generate the config file with a registry mirror and log-rotation settings.
if [ ! -d "/etc/docker" ]; then
mkdir /etc/docker
fi
cat > /etc/docker/daemon.json << EOF
{
"registry-mirrors": ["https://5a8zducs.mirror.aliyuncs.com"],
"log-driver":"json-file",
"log-opts": {"max-size":"500m", "max-file":"3"}
}
EOF
# Start docker
systemctl start docker
echo "Docker installation complete!"
docker version
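A stray comma or quote in daemon.json silently keeps the Docker daemon from starting, so it is worth validating the generated file before `systemctl start docker`. A minimal sketch, written to a temp path so it does not touch the real /etc/docker/daemon.json (python3 is assumed to be available, as it is on CentOS 7 with the EPEL/base python3 package):

```shell
#!/bin/sh
# Write the same daemon.json the install script generates, then validate it.
tmp=$(mktemp)
cat > "$tmp" << 'EOF'
{
  "registry-mirrors": ["https://5a8zducs.mirror.aliyuncs.com"],
  "log-driver": "json-file",
  "log-opts": {"max-size": "500m", "max-file": "3"}
}
EOF
# python3 -m json.tool exits non-zero on malformed JSON.
if python3 -m json.tool "$tmp" > /dev/null 2>&1; then
  valid=yes
else
  valid=no
fi
echo "daemon.json valid: $valid"
rm -f "$tmp"
```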
IV. Deploying kubelet
1. Preparation
1) Create the Directories
1. Create the Kubernetes program directory (optional)
# Run from the Ansible control machine
ansible t-k8s-1.18 -m file -a "path=/data/application/kubernetes state=directory owner=root group=root"
Directory layout:
/data/application/kubernetes/
├── bin # binaries
├── cfg # configuration files
├── logs # logs
└── ssl # certificates
2. Create the kubelet data directory
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m file -a "path=/var/lib/kubelet state=directory owner=root group=root"
2) Distribute the Configuration Files
1. Distribute the bootstrap authentication config
1⃣️ Distribute the file: bootstrap.kubeconfig
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m copy -a "src=/data/softwares/k8s1.18/bootstrap.kubeconfig dest=/data/application/kubernetes/cfg/ owner=root group=root"
File contents:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUR2akNDQXFhZ0F3SUJBZ0lVRmIwRWRtMjhRWWx2T2t2Uk1FRWJ1THpBNjh3d0RRWUpLb1pJaHZjTkFRRUwKQlFBd1pURUxNQWtHQTFVRUJoTUNRMDR4RURBT0JnTlZCQWdUQjBKbGFXcHBibWN4RURBT0JnTlZCQWNUQjBKbAphV3BwYm1jeEREQUtCZ05WQkFvVEEyczRjekVQTUEwR0ExVUVDeE1HVTNsemRHVnRNUk13RVFZRFZRUURFd3ByCmRXSmxjbTVsZEdWek1CNFhEVEl4TURVeE1qQTRNRFF3TUZvWERUSTJNRFV4TVRBNE1EUXdNRm93WlRFTE1Ba0cKQTFVRUJoTUNRMDR4RURBT0JnTlZCQWdUQjBKbGFXcHBibWN4RURBT0JnTlZCQWNUQjBKbGFXcHBibWN4RERBSwpCZ05WQkFvVEEyczRjekVQTUEwR0ExVUVDeE1HVTNsemRHVnRNUk13RVFZRFZRUURFd3ByZFdKbGNtNWxkR1Z6Ck1JSUJJakFOQmdrcWhraUc5dzBCQVFFRkFBT0NBUThBTUlJQkNnS0NBUUVBcXBManJTT0Z1U3NvRjVaYlh0dnoKNFZmRFAzOUFtdmtRdWkyZEhVOEpXOFVKT2xhaGlqWDQwaTNNWml5OFQ0K3dVS2Q4M0xRL1ZVOThOOGR6aEJ4awowdlhFY0RMNkh0aDRnWkcwQXZUaU5uNTVhbUdsZHBTOUxYTXNuTFV4amtidzk3UWtlbnNqZVQ0MGdMaW9BUkhzCjkwSGphbGtLS0dBV1hyVjZmT2RLVW9oamxUNXlDdGc3RXNLOG5tOWJVR2NXa09PNWtXSGREZEhqcEc2U3E1RFoKeE5kOU50aHlkZTBiRFlrdUZyUExhSW9tVGNjRk10VGoyOVgzR2Y0OHl1d1Z3b1piaU1ZaU44SDlXSnlzMHdoagorT05BTURmMEN1TFBWSk9HZ0VqTlpvYm1wb2JJazlQbHdGa0RUZFBja0xTN3c4bjNmaGcrb0ZHSXZTWkhkK1E0ClFRSURBUUFCbzJZd1pEQU9CZ05WSFE4QkFmOEVCQU1DQVFZd0VnWURWUjBUQVFIL0JBZ3dCZ0VCL3dJQkFqQWQKQmdOVkhRNEVGZ1FVeWdvWDdlWUhzWmQ2TDZrT3BIcit2ZlhlNDdJd0h3WURWUjBqQkJnd0ZvQVV5Z29YN2VZSApzWmQ2TDZrT3BIcit2ZlhlNDdJd0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFFSitGSmZ6b1NuMDRWcGhVNE9KCmFIYUoxNWY0bVdaL2JmTllRdDZLRmVZWEVzV2dlN1ZZMEtWVFBlaVp4WFY2YlZYZFM1TUd5K1RuOUxMSGZpVVUKTy9qdnRGOWJkcHRZQU5HSUtuRFEwWW9zZ24rQVNyb3JHck42RlhqTFlOUDl1aGUxdGlxWmJUOHhVbkRmcXJRZQpoN202TGdpTDNXa0VKRDhPRG5kMSszRjFYZGRubThadWZrWXBxYnNibWpFaGMwalhFNlNoWEg0RGF3TzFzeU51ClVnZi81b1dFSHJTdUJhSlVibjEwWVcrOHI3L1ZQU05GQmI1THR0WktnNzVUMTJtYnZVWFgzejg3RlN3bElwdUwKTDZZUW9xWnc2UWVIZ3I3dG9WeFQ5VkNGYkRTelQ4RDBUUGlQU25abFl1SmhHeEpzeGxUYUgzWGJvUGdINk8rLwpIUmM9Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
server: https://192.168.3.201:6443
name: kubernetes
contexts:
- context:
cluster: kubernetes
user: kubelet-bootstrap
name: default
current-context: default
kind: Config
preferences: {}
users:
- name: kubelet-bootstrap
user:
token: c47ffb939f5ca36231d9e3121a252940
2⃣️ Adjust bootstrap.kubeconfig
On each master, replace the Nginx proxy address (192.168.3.201) with that host's own apiserver address:
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m shell -a 'sed -i "s/192.168.3.201/$( hostname -I|cut -d " " -f 1)/g" /data/application/kubernetes/cfg/bootstrap.kubeconfig'
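The substitution above can be exercised locally on a throwaway file. A minimal sketch, assuming GNU sed's -i and using 192.168.3.171 as a stand-in for what `$(hostname -I | cut -d " " -f 1)` would return on the first master:

```shell
#!/bin/sh
# One-line stand-in for bootstrap.kubeconfig's server field.
tmp=$(mktemp)
printf '    server: https://192.168.3.201:6443\n' > "$tmp"

# On a real master this would be $(hostname -I | cut -d " " -f 1).
ip='192.168.3.171'
sed -i "s/192.168.3.201/${ip}/g" "$tmp"

# The kubeconfig now points at the local apiserver rather than the proxy.
result=$(grep 'server:' "$tmp")
echo "$result"
rm -f "$tmp"
```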
2. Distribute the kubelet config files
1⃣️ Distribute the file: kubelet.conf
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m copy -a "src=/data/softwares/k8s1.18/kubelet.conf dest=/data/application/kubernetes/cfg/ owner=root group=root"
File contents:
KUBELET_OPTS="--logtostderr=false \
--v=2 \
--log-dir=/data/application/kubernetes/logs \
--hostname-override=t-k8sN-001 \
--network-plugin=cni \
--kubeconfig=/data/application/kubernetes/cfg/kubelet.kubeconfig \
--bootstrap-kubeconfig=/data/application/kubernetes/cfg/bootstrap.kubeconfig \
--config=/data/application/kubernetes/cfg/kubelet-config.yml \
--cert-dir=/data/application/kubernetes/ssl \
--pod-infra-container-image=lizhenliang/pause-amd64:3.0 \
--node-labels=node.kubernetes.io/k8s-node=true"
File: kubelet-config.yml
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m copy -a "src=/data/softwares/k8s1.18/kubelet-config.yml dest=/data/application/kubernetes/cfg/ owner=root group=root"
File contents:
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: 0.0.0.0
port: 10250
readOnlyPort: 10255
cgroupDriver: cgroupfs
clusterDNS:
- 10.0.0.2
clusterDomain: cluster.local
failSwapOn: false
authentication:
anonymous:
enabled: false
webhook:
cacheTTL: 2m0s
enabled: true
x509:
clientCAFile: /data/application/kubernetes/ssl/ca.pem
authorization:
mode: Webhook
webhook:
cacheAuthorizedTTL: 5m0s
cacheUnauthorizedTTL: 30s
evictionHard:
imagefs.available: 15%
memory.available: 100Mi
nodefs.available: 10%
nodefs.inodesFree: 5%
maxOpenFiles: 1000000
maxPods: 110
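One field in this config worth double-checking: cgroupDriver must match the driver Docker reports (`docker info --format '{{.CgroupDriver}}'`), otherwise kubelet fails to start Pods. A small preflight sketch; the heredoc-style variable stands in for the deployed kubelet-config.yml:

```shell
#!/bin/sh
# Stand-in for the deployed kubelet-config.yml; only the relevant line matters.
cfg='cgroupDriver: cgroupfs
clusterDomain: cluster.local'

# Extract the configured driver; on a real host, compare it against
# `docker info --format "{{.CgroupDriver}}"`.
kubelet_driver=$(printf '%s\n' "$cfg" | awk -F': ' '/^cgroupDriver/ {print $2}')
echo "kubelet cgroup driver: $kubelet_driver"
```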
2⃣️ Adjust kubelet.conf
On each master, set --hostname-override in kubelet.conf to that host's own hostname (the distributed template carries t-k8sN-001).
2. Start the Service
1) Distribute the systemd Unit File
1. Distribute the file: kubelet.service
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m copy -a "src=/data/softwares/k8s1.18/kubelet.service dest=/usr/lib/systemd/system/ owner=root group=root mode=0644"
File contents:
[Unit]
Description=Kubernetes Kubelet
After=docker.service
[Service]
EnvironmentFile=/data/application/kubernetes/cfg/kubelet.conf
ExecStart=/data/application/kubernetes/bin/kubelet $KUBELET_OPTS
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
2) Start the Service
1. Reload the systemd configuration
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m shell -a "systemctl daemon-reload"
2. Enable at boot
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m shell -a "systemctl enable kubelet.service"
3. Start the service
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m shell -a "systemctl start kubelet.service"
3) Approve the Certificates
1. List the pending requests
# Run on any master node
kubectl get csr
On success, the output looks like this:
NAME AGE SIGNERNAME REQUESTOR CONDITION
node-csr-8nETD46VxpINtCH1gBT1mCNEJGbyHCpSzvVZaUqJxlU 18m kubernetes.io/kube-apiserver-client-kubelet kubelet-bootstrap Pending
2. Approve the TLS requests
# Run on any master node
kubectl get csr|grep "Pending"|awk 'NR>0{print $1}'|xargs kubectl certificate approve
On success, the output looks like this:
certificatesigningrequest.certificates.k8s.io/node-csr-8nETD46VxpINtCH1gBT1mCNEJGbyHCpSzvVZaUqJxlU approved
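The approve one-liner works by extracting the first column of every Pending row. Here is just its parsing step, run against a captured sample instead of a live cluster; each printed name is what would be fed to `kubectl certificate approve`:

```shell
#!/bin/sh
# Sample `kubectl get csr` output (one pending node CSR, as shown above).
sample='NAME                                                   AGE   SIGNERNAME                                    REQUESTOR           CONDITION
node-csr-8nETD46VxpINtCH1gBT1mCNEJGbyHCpSzvVZaUqJxlU   18m   kubernetes.io/kube-apiserver-client-kubelet   kubelet-bootstrap   Pending'

# Same pipeline as the approve command, minus the kubectl calls.
pending=$(printf '%s\n' "$sample" | grep 'Pending' | awk '{print $1}')
echo "$pending"
```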
3. Check readiness
1⃣️ Check the Kubernetes cluster state
# Run on any master node
kubectl get nodes
The output is as follows:
2⃣️ Check the etcd cluster state
# Run on any master node
ETCDCTL_API=3 ./bin/etcdctl --cacert="/data/application/etcd/ssl/ca.pem" --cert="/data/application/etcd/ssl/server.pem" --key="/data/application/etcd/ssl/server-key.pem" --endpoints="https://192.168.3.171:2379,https://192.168.3.172:2379,https://192.168.3.173:2379" endpoint health
When healthy, the output looks like this:
https://192.168.3.172:2379 is healthy: successfully committed proposal: took = 22.91948ms
https://192.168.3.171:2379 is healthy: successfully committed proposal: took = 26.761636ms
https://192.168.3.173:2379 is healthy: successfully committed proposal: took = 32.129748ms
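That health output is easy to turn into a pass/fail check. A sketch over the sample output above (captured text stands in for the live etcdctl result):

```shell
#!/bin/sh
# Captured `etcdctl endpoint health` output for the three members.
out='https://192.168.3.172:2379 is healthy: successfully committed proposal: took = 22.91948ms
https://192.168.3.171:2379 is healthy: successfully committed proposal: took = 26.761636ms
https://192.168.3.173:2379 is healthy: successfully committed proposal: took = 32.129748ms'

# Count healthy members; a 3-member cluster needs at least 2 for quorum.
healthy=$(printf '%s\n' "$out" | grep -c 'is healthy')
echo "${healthy}/3 etcd members healthy"
```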
4) Taint the master Nodes
Apply a NoSchedule taint to each of the three master nodes so that ordinary workloads are not scheduled onto them:
# Run on any master node
kubectl taint nodes t-k8sm-001 node-role.kubernetes.io/k8s-master=:NoSchedule
kubectl taint nodes t-k8sm-002 node-role.kubernetes.io/k8s-master=:NoSchedule
kubectl taint nodes t-k8sm-003 node-role.kubernetes.io/k8s-master=:NoSchedule
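To confirm a taint took effect you would grep the node description. A sketch against a sample line (it stands in for `kubectl describe node t-k8sm-001 | grep Taints` on a live cluster):

```shell
#!/bin/sh
# Sample Taints line from `kubectl describe node` on a tainted master.
line='Taints:             node-role.kubernetes.io/k8s-master:NoSchedule'

# NoSchedule means the scheduler skips this node for Pods that do not
# tolerate the taint; DaemonSets and tolerating Pods still run here.
if printf '%s\n' "$line" | grep -q 'NoSchedule'; then
  echo 'master is tainted; ordinary workloads will not land here'
fi
```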
V. Deploying kube-proxy
1. Preparation
1) Distribute the Configuration Files
1. Distribute the file: kube-proxy.conf
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m copy -a "src=/data/softwares/k8s1.18/kube-proxy.conf dest=/data/application/kubernetes/cfg/ owner=root group=root"
File contents:
KUBE_PROXY_OPTS="--logtostderr=false \
--v=2 \
--log-dir=/data/application/kubernetes/logs \
--config=/data/application/kubernetes/cfg/kube-proxy-config.yml"
2. Distribute the file: kube-proxy-config.yml
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m copy -a "src=/data/softwares/k8s1.18/kube-proxy-config.yml dest=/data/application/kubernetes/cfg/ owner=root group=root"
File contents:
kind: KubeProxyConfiguration
apiVersion: kubeproxy.config.k8s.io/v1alpha1
bindAddress: 0.0.0.0
metricsBindAddress: 0.0.0.0:10249
clientConnection:
kubeconfig: /data/application/kubernetes/cfg/kube-proxy.kubeconfig
hostnameOverride: t-k8sN-001
clusterCIDR: 10.0.0.0/16
mode: ipvs
ipvs:
scheduler: "rr"
iptables:
masqueradeAll: true
3. Adjust kube-proxy-config.yml
On each master, set hostnameOverride to that host's own hostname (the distributed template carries t-k8sN-001).
2. Start the Service
1) Install Dependencies
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m shell -a "yum -y install conntrack"
2) Distribute the systemd Unit File
1. Distribute the file: kube-proxy.service
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m copy -a "src=/data/softwares/k8s1.18/kube-proxy.service dest=/usr/lib/systemd/system/ owner=root group=root mode=0644"
File contents:
[Unit]
Description=Kubernetes Proxy
After=network.target
[Service]
EnvironmentFile=/data/application/kubernetes/cfg/kube-proxy.conf
ExecStart=/data/application/kubernetes/bin/kube-proxy $KUBE_PROXY_OPTS
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
3) Start the Service
1. Reload the systemd configuration
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m shell -a "systemctl daemon-reload"
2. Enable at boot
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m shell -a "systemctl enable kube-proxy.service"
3. Start the service
# Run from the Ansible control machine
ansible t-k8sM-1.18 -m shell -a "systemctl start kube-proxy.service"
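Once kube-proxy is running on the masters in ipvs mode, the metrics-server Service should appear in the ipvs table (`ipvsadm -Ln`) with the Pod behind it. A sketch over sample output; the clusterIP 10.0.224.57 is illustrative, the backend is the metrics-server Pod address from the error earlier in the article:

```shell
#!/bin/sh
# Sample `ipvsadm -Ln` excerpt from a master after kube-proxy starts.
sample='TCP  10.0.224.57:443 rr
  -> 10.244.1.11:4443             Masq    1      0          0'

# If the backend is listed under a TCP virtual server, ipvs can now route
# traffic for that clusterIP from this host, which is what the apiserver needs.
if printf '%s\n' "$sample" | grep -q -- '-> 10.244.1.11:4443'; then
  echo 'metrics-server backend present in ipvs table'
fi
```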
VI. Final Verification
1. Verification
1) Verify the apiservice
# Run on any master node
kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
1. Before the fix
Before the changes, the output contains content like the following, where the connection failure is plain to see:
2. After adjusting kubelet and kube-proxy
After the changes above, the output should look like the following; in this state the apiserver can reach metrics-server normally:
2) Verify kubectl top
1. Node metrics
# Run on any master node
kubectl top nodes
The output looks like this:
2. Pod metrics
# Run on any master node
kubectl top pod -A
The output looks like this: