
[Hands-On Troubleshooting] Three K8s Errors Solved: Node Join Failure, API Server Connection Errors, Resource Fetch Errors


Problem 1: Node fails to join the cluster (leftover files / port in use)

While building a three-node k8s cluster (the control plane had already been initialized), the remaining worker nodes failed to join with the following error.

1. Error snippet

```
[root@node01 docker]# kubeadm join 11.0.1.173:6443 --token dj2kc8.h3h0ep3jkmtry355 \
    --discovery-token-ca-cert-hash sha256:7f01426026c45d3dcd60e67f825356d4896e681de0a08f7846322eb01f4e9b76
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
	[ERROR Port-10250]: Port 10250 is in use
	[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
```

2. Root cause analysis

From the error messages — `kubelet.conf already exists`, `Port 10250 is in use`, and `ca.crt already exists` — we can conclude that the node still carries leftovers from a previous k8s installation (port 10250 belongs to the kubelet, so an old kubelet is likely still running), which is why the join fails.

3. Solution

(1) Clean up leftovers

```bash
# 1. Reset kubeadm (if the node was initialized before)
sudo kubeadm reset --force
systemctl status kubelet    # confirm no stale service is left running
# 2. Manually remove leftover files
sudo rm -rf /etc/kubernetes/*
sudo rm -rf /var/lib/kubelet/*
sudo rm -rf /var/lib/etcd/*
sudo rm -rf ~/.kube/
# 3. Clean up network interfaces (Flannel/Calico)
sudo ip link delete cni0 2>/dev/null
sudo ip link delete flannel.1 2>/dev/null
# 4. Free the occupied port (10250)
sudo kill -9 $(sudo lsof -i :10250 -t)
```
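After the cleanup, a quick sanity check before rejoining; all three commands should come back empty:

```bash
ls /etc/kubernetes/ 2>/dev/null    # no leftover config files
sudo ss -lntp | grep 10250         # port 10250 is free
ip link show cni0 2>/dev/null      # CNI bridge is gone
```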

(2) Rejoin the cluster

```bash
sudo kubeadm join 11.0.1.173:6443 \
    --token dj2kc8.h3h0ep3jkmtry355 \
    --discovery-token-ca-cert-hash sha256:7f01426026c45d3dcd60e67f825356d4896e681de0a08f7846322eb01f4e9b76 \
    --ignore-preflight-errors=Port-10250    # only if you have confirmed the port conflict is safe to ignore
```
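If the join succeeds, verify from the master node that the new node has registered (it will show NotReady until the network plugin is up):

```bash
# Run on the master node
kubectl get nodes -o wide
```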

(3) Notes

------- Token expiry (tokens are only valid for 24h)

```bash
# Regenerate a valid token on the master node
kubeadm token create --print-join-command
```

------- Network plugin (must be Running)

```bash
# Check network plugin status
kubectl get pods -n kube-system
```

------- Time sync, firewall, docker/containerd

```bash
# All nodes must have synchronized clocks
sudo timedatectl set-ntp true
# Open the required firewall ports
sudo firewall-cmd --add-port={6443,10250,2379,2380}/tcp --permanent
sudo firewall-cmd --reload
# docker/containerd
sudo systemctl restart docker containerd
```

If the node still fails to join after the steps above, check the kubelet logs with `journalctl -u kubelet` and continue troubleshooting from there.
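A couple of `journalctl` invocations that are useful at this point:

```bash
# Follow kubelet logs live while re-running the join
journalctl -u kubelet -f
# Or dump the last 100 entries without paging
journalctl -u kubelet --no-pager -n 100
```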

Problem 2: API Server connection errors (kubeconfig/DNS issues)

Running `kubectl get nodes` on a worker node produces the following errors.

1. Error snippet

```
[root@node01 docker]# kubectl get nodes
E0527 15:38:23.832617  126941 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp: lookup localhost on 8.8.8.8:53: no such host
E0527 15:38:23.882251  126941 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp: lookup localhost on 8.8.8.8:53: no such host
E0527 15:38:23.934448  126941 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp: lookup localhost on 8.8.8.8:53: no such host
E0527 15:38:23.986633  126941 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp: lookup localhost on 8.8.8.8:53: no such host
E0527 15:38:24.053044  126941 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp: lookup localhost on 8.8.8.8:53: no such host
Unable to connect to the server: dial tcp: lookup localhost on 8.8.8.8:53: no such host
```

2. Root cause analysis

The error shows that kubectl never reached the API Server: with no kubeconfig file (and no KUBECONFIG environment variable) it falls back to the default http://localhost:8080, and even the hostname localhost fails to resolve because the DNS query goes to 8.8.8.8, which knows nothing about it.
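A quick way to see which endpoint kubectl will actually use (empty output means no cluster is configured, so the localhost:8080 fallback applies):

```bash
kubectl config view -o jsonpath='{.clusters[*].cluster.server}'; echo
```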

3. Solution

(1) Check whether kubectl has a config file, i.e. whether the default path exists (if the file is missing or its contents are wrong, kubectl falls back to localhost:8080, so the failure is expected).

```
[root@node01 ~]# echo $KUBECONFIG
/root/.kube/config
[root@node01 ~]# ls ~/.kube/config
ls: cannot access '/root/.kube/config': No such file or directory
```

In that case, the following commands fix it.

```bash
mkdir -p ~/.kube
cp -i /etc/kubernetes/admin.conf ~/.kube/config
chown $(id -u):$(id -g) ~/.kube/config
```

In my case I simply created the file by hand; once ~/.kube/config exists, `kubectl get nodes` shows the status of every node as expected.

(2) Given the error `lookup localhost on 8.8.8.8:53: no such host`, check /etc/hosts and confirm it contains an entry for localhost.

```bash
# Check
cat /etc/hosts | grep localhost
# If it is missing, add it
echo "127.0.0.1 localhost" >> /etc/hosts
# Retry
kubectl get nodes
```

(3) Proxies

If you have http_proxy and https_proxy configured, kubectl and DNS can also misbehave, because API traffic gets routed through the proxy.

```bash
# Check
env | grep -i proxy
# Temporarily unset them
unset http_proxy https_proxy HTTP_PROXY HTTPS_PROXY
# Retry
kubectl get nodes
```
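If other traffic on the node genuinely needs the proxy, an alternative is to exclude cluster addresses rather than unsetting everything. A sketch using this article's API Server address (10.96.0.0/12 is kubeadm's default service CIDR and an assumption here):

```bash
# Exclude local and cluster traffic from the proxy
export no_proxy=localhost,127.0.0.1,11.0.1.173,10.96.0.0/12
export NO_PROXY=$no_proxy
```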

Problem 3: Resource fetch failure (misconfiguration / version incompatibility)

1. Error snippet

```
[root@node02 ~]# kubectl get nodes
E0527 21:56:54.255182  614678 memcache.go:265] couldn't get current server API group list: the server could not find the requested resource
E0527 21:56:54.255898  614678 memcache.go:265] couldn't get current server API group list: the server could not find the requested resource
E0527 21:56:54.258469  614678 memcache.go:265] couldn't get current server API group list: the server could not find the requested resource
E0527 21:56:54.259326  614678 memcache.go:265] couldn't get current server API group list: the server could not find the requested resource
E0527 21:56:54.261615  614678 memcache.go:265] couldn't get current server API group list: the server could not find the requested resource
Error from server (NotFound): the server could not find the requested resource
```

2. Root cause analysis

This time kubectl can connect to the API Server, but the server cannot find the requested resource. Likely causes: the kubeconfig points at the wrong API Server address (we are on a non-master node), or the kubectl version is incompatible with the API Server.
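To rule version skew in or out quickly, compare the two versions:

```bash
# kubectl is only supported within one minor version of the API Server
kubectl version
```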

3. Solution

Copy the kubeconfig and fix the server address

```bash
# Run on the master node
scp ~/.kube/config node02:/root/.kube/config
```

Then edit ~/.kube/config on the target node:

```
# Find this line
server: https://127.0.0.1:6443
# Change it to (substitute your master node's IP)
server: https://<master-ip>:6443
# Verify
kubectl get nodes
```
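The same edit as a one-liner, if you prefer, using this article's master IP 11.0.1.173 (adjust for your cluster):

```bash
sed -i 's#server: https://127.0.0.1:6443#server: https://11.0.1.173:6443#' ~/.kube/config
```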

That covers the kubeconfig side. If it turns out to be a version compatibility problem instead, upgrade or downgrade kubectl to match the API Server (kubectl is supported within one minor version of the server).
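For the downgrade/upgrade itself, a sketch on RHEL/CentOS assuming the Kubernetes yum repo is configured; 1.28.2 is a placeholder, match it to your API Server version:

```bash
# Replace kubectl with a version matching the server (placeholder version)
sudo yum remove -y kubectl
sudo yum install -y kubectl-1.28.2
```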

Summary

These three problems cover the common pain points of K8s node deployment. The core approach: clean up leftovers → check the configuration → verify connectivity. Combine the logs (`journalctl -u kubelet`) with cluster state (`kubectl get pods -A`) for a complete picture.
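To wrap up, a minimal triage sketch that strings the checks from this article together:

```bash
#!/bin/bash
# k8s-triage.sh -- quick pass over the checks covered above
echo "--- kubeconfig target ---"
kubectl config view -o jsonpath='{.clusters[*].cluster.server}'; echo
echo "--- node status ---"
kubectl get nodes -o wide
echo "--- pods not Running/Completed ---"
kubectl get pods -A | grep -Ev 'Running|Completed'
echo "--- last 20 kubelet log lines ---"
journalctl -u kubelet --no-pager -n 20
```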

> Parts of this article were drafted with AI assistance and organized from my own hands-on practice; corrections and discussion are welcome.