G036-OP-CEPH-REEF-01 Ceph Reef (R) 18.1.1 Distributed Storage Deployment

1 What You Need to Know

  • Virtualization platform version: VMware Workstation 17.0.0 (note: extended)
  • Operating system version: CentOS Stream 8; you can refer to related article G003 below to install the system, the steps are the same
  • Installation media can be downloaded from the CentOS official site, the Aliyun mirror, the Huawei mirror, or another mirror site
  • The environment uses 3 virtual machines, each with a single NIC in NAT mode and configured with a static IP and DNS (see the nmcli sketch after this list); for the detailed plan see Chapter 2, Environment Planning
  • See the official Ceph Documentation; as of this writing, the latest version is reef (18.1.1), which is a development/testing release
  • For a better reading experience, click the table-of-contents button at the top left of the article to show its overall structure
  • Related articles
  • G003-OS-LIN-RHEL-01 Red Hat 8.4 Installation
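For reference, a static IP and DNS can be configured with nmcli roughly as follows. This is only a sketch for ceph1; the connection name ens160 is an assumption based on a typical VMware Workstation guest and should be replaced with your actual interface/connection name.

# Set a manual IPv4 address, gateway and DNS per the plan in Chapter 2,
# then re-activate the connection. Repeat on ceph2/ceph3 with their own IPs.
nmcli con mod ens160 ipv4.method manual \
    ipv4.addresses 192.168.100.201/24 \
    ipv4.gateway 192.168.100.2 \
    ipv4.dns 192.168.100.2
nmcli con up ens160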

2 Environment Planning

Hostname   IP               Gateway/DNS     CPU/Memory   System Disk   Storage Disk   Role                     Notes
ceph1      192.168.100.201  192.168.100.2   4C/8G        100G          30G            Cluster bootstrap node   /
ceph2      192.168.100.202  192.168.100.2   4C/8G        100G          30G            Cluster host             /
ceph3      192.168.100.203  192.168.100.2   4C/8G        100G          30G            Cluster host             /

3 System Environment Configuration

3.1 IP and Hostname Mapping (All Nodes)

[root@ceph1 ~]# echo '192.168.100.201 ceph1' >> /etc/hosts
[root@ceph1 ~]# echo '192.168.100.202 ceph2' >> /etc/hosts
[root@ceph1 ~]# echo '192.168.100.203 ceph3' >> /etc/hosts
[root@ceph1 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.100.201 ceph1
192.168.100.202 ceph2
192.168.100.203 ceph3

[root@ceph1 ~]# scp /etc/hosts ceph2:/etc/
[root@ceph1 ~]# scp /etc/hosts ceph3:/etc/ 

3.2 SSH Trust Setup (ceph1 Node Only)

ceph1 will later act as the bootstrap node, so set up passwordless SSH login from ceph1 to ceph1, ceph2, and ceph3.

[root@ceph1 ~]# cd .ssh/
[root@ceph1 .ssh]# ls
known_hosts
[root@ceph1 .ssh]# ssh-keygen -N ""
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:aWIoljYu3uwSXFSB07HNtxILq1gt+ZmT+v25OVyAeWs root@ceph1
The key's randomart image is:
+---[RSA 3072]----+
|    ++o          |
|   + .+          |
|  . .o +o.       |
|   oo.oo++.      |
|. B+.oooS.o      |
| *oo+.+o.E .     |
|..o. *  o .      |
|.oo . o  oo      |
| .o=.. ..=o      |
+----[SHA256]-----+

[root@ceph1 .ssh]# ssh-copy-id ceph1
[root@ceph1 .ssh]# ssh-copy-id ceph2
[root@ceph1 .ssh]# ssh-copy-id ceph3

3.3 Disable the Firewall and SELinux (All Nodes)

[root@ceph1 .ssh]# systemctl stop firewalld
[root@ceph1 .ssh]# systemctl disable firewalld
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@ceph1 .ssh]# setenforce 0
[root@ceph1 .ssh]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

[root@ceph2 ~]# systemctl stop firewalld
[root@ceph2 ~]# systemctl disable firewalld
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@ceph2 ~]# setenforce 0
[root@ceph2 ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

[root@ceph3 ~]# systemctl stop firewalld
[root@ceph3 ~]# systemctl disable firewalld
Removed /etc/systemd/system/multi-user.target.wants/firewalld.service.
Removed /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
[root@ceph3 ~]# setenforce 0
[root@ceph3 ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
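Since passwordless SSH from ceph1 was set up in Section 3.2, the same steps can also be pushed to all three nodes from ceph1 in a single loop. This is only a sketch and is not needed if you have already run the commands above on each node.

# Disable firewalld now and on boot, switch SELinux to permissive for the
# current session, and disable it permanently in the config file.
for host in ceph1 ceph2 ceph3; do
    ssh "$host" "systemctl disable --now firewalld; \
                 setenforce 0; \
                 sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config"
done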

3.4 Configure YUM Repositories (All Nodes)

***Configure the base repositories***

mkdir /etc/yum.repos.d/bak
mv /etc/yum.repos.d/*.repo /etc/yum.repos.d/bak/

cat <<EOF > /etc/yum.repos.d/cloudcs.repo
[ceph]
name=ceph
baseurl=https://mirrors.aliyun.com/ceph/rpm-18.1.1/el8/x86_64/
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
gpgcheck=1
enabled=1

[ceph-noarch]
name=ceph-noarch
baseurl=https://mirrors.aliyun.com/ceph/rpm-18.1.1/el8/noarch/
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
enabled=1

[ceph-SRPMS]
name=SRPMS
baseurl=https://mirrors.aliyun.com/ceph/rpm-18.1.1/el8/SRPMS/
gpgcheck=1
gpgkey=https://mirrors.aliyun.com/ceph/keys/release.asc
enabled=1

[highavailability]
name=CentOS Stream 8 - HighAvailability
baseurl=https://mirrors.aliyun.com/centos/8-stream/HighAvailability/x86_64/os/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
gpgcheck=1
repo_gpgcheck=0
metadata_expire=6h
countme=1
enabled=1

[nfv]
name=CentOS Stream 8 - NFV
baseurl=https://mirrors.aliyun.com/centos/8-stream/NFV/x86_64/os/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
gpgcheck=1
repo_gpgcheck=0
metadata_expire=6h
countme=1
enabled=1

[rt]
name=CentOS Stream 8 - RT
baseurl=https://mirrors.aliyun.com/centos/8-stream/RT/x86_64/os/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
gpgcheck=1
repo_gpgcheck=0
metadata_expire=6h
countme=1
enabled=1

[resilientstorage]
name=CentOS Stream 8 - ResilientStorage
baseurl=https://mirrors.aliyun.com/centos/8-stream/ResilientStorage/x86_64/os/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
gpgcheck=1
repo_gpgcheck=0
metadata_expire=6h
countme=1
enabled=1

[extras-common]
name=CentOS Stream 8 - Extras packages
baseurl=https://mirrors.aliyun.com/centos/8-stream/extras/x86_64/extras-common/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Extras-SHA512
gpgcheck=1
repo_gpgcheck=0
metadata_expire=6h
countme=1
enabled=1

[extras]
name=CentOS Stream $releasever - Extras
mirrorlist=http://mirrorlist.centos.org/?release=$stream&arch=$basearch&repo=extras&infra=$infra
#baseurl=http://mirror.centos.org/$contentdir/$stream/extras/$basearch/os/
baseurl=https://mirrors.aliyun.com/centos/8-stream/extras/x86_64/os/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial

[centos-ceph-pacific]
name=CentOS - Ceph Pacific
baseurl=https://mirrors.aliyun.com/centos/8-stream/storage/x86_64/ceph-pacific/
gpgcheck=0
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Storage

[centos-rabbitmq-38]
name=CentOS-8 - RabbitMQ 38
baseurl=https://mirrors.aliyun.com/centos/8-stream/messaging/x86_64/rabbitmq-38/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Messaging

[centos-nfv-openvswitch]
name=CentOS Stream 8 - NFV OpenvSwitch
baseurl=https://mirrors.aliyun.com/centos/8-stream/nfv/x86_64/openvswitch-2/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-NFV
module_hotfixes=1

[baseos]
name=CentOS Stream 8 - BaseOS
baseurl=https://mirrors.aliyun.com/centos/8-stream/BaseOS/x86_64/os/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
gpgcheck=1
repo_gpgcheck=0
metadata_expire=6h
countme=1
enabled=1

[appstream]
name=CentOS Stream 8 - AppStream
baseurl=https://mirrors.aliyun.com/centos/8-stream/AppStream/x86_64/os/
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
gpgcheck=1
repo_gpgcheck=0
metadata_expire=6h
countme=1
enabled=1

[centos-openstack-victoria]
name=CentOS 8 - OpenStack victoria
baseurl=https://mirrors.aliyun.com/centos/8-stream/cloud/x86_64/openstack-victoria/
#baseurl=https://repo.huaweicloud.com/centos/8-stream/cloud/x86_64/openstack-yoga/
gpgcheck=1
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-SIG-Cloud
module_hotfixes=1

[powertools]
name=CentOS Stream 8 - PowerTools
#mirrorlist=http://mirrorlist.centos.org/?release=$stream&arch=$basearch&repo=PowerTools&infra=$infra
baseurl=https://mirrors.aliyun.com/centos/8-stream/PowerTools/x86_64/os/
gpgcheck=1
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-centosofficial
EOF

***Configure the EPEL repository***

yum install -y https://mirrors.aliyun.com/epel/epel-release-latest-8.noarch.rpm
sed -i 's|^#baseurl=https://download.example/pub|baseurl=https://mirrors.aliyun.com|' /etc/yum.repos.d/epel*
sed -i 's|^metalink|#metalink|' /etc/yum.repos.d/epel*
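A quick, optional sanity check that the repositories are reachable before installing anything:

# Rebuild the metadata cache and list the repositories that are now enabled.
dnf clean all
dnf makecache
dnf repolist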

3.5 Install Ceph (All Nodes)

Installation time depends on the host's network connectivity and performance; it may take a while, so please be patient.

yum install -y ceph*
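Once the installation finishes, an optional check confirms that the expected Reef packages were installed:

# The ceph CLI should report an 18.x (reef) version; cephadm is pulled in by
# the ceph* wildcard and is needed for the bootstrap in Chapter 4.
ceph --version
rpm -q cephadm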

3.6 Configure Time Synchronization (All Nodes)

[root@ceph1 ~]# vi /etc/chrony.conf 
[root@ceph1 ~]# cat /etc/chrony.conf 
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
# pool 2.centos.pool.ntp.org iburst
server ntp.aliyun.com iburst

[root@ceph1 ~]# scp /etc/chrony.conf ceph2:/etc/
chrony.conf                                                                                                                            100% 1116     1.4MB/s   00:00    
[root@ceph1 ~]# scp /etc/chrony.conf ceph3:/etc/
chrony.conf  

***Start the service***
systemctl restart chronyd
systemctl enable chronyd
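An optional check that chrony is actually syncing against the Aliyun NTP server:

# 'sources' lists the configured servers and their reachability;
# 'tracking' shows the current offset from the selected source.
chronyc sources -v
chronyc tracking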

4 Configure the Ceph Cluster

With a minimal OS installation, many commands cannot be tab-completed; install the relevant package with yum install -y bash-completion, then log in again or run bash to refresh the terminal.

4.1 Initialize the Cluster (ceph1 Node Only)

[root@ceph1 ~]# yum install -y bash-completion vim net-tools
[root@ceph1 ~]# bash
[root@ceph1 ~]# cephadm bootstrap --mon-ip 192.168.100.201
This is a development version of cephadm.
For information regarding the latest stable release:
    https://docs.ceph.com/docs/quincy/cephadm/install
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman (/usr/bin/podman) version 4.3.1 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 146108c4-13c4-11ee-ac7a-000c2928fec5
Verifying IP 192.168.100.201 port 3300 ...
Verifying IP 192.168.100.201 port 6789 ...
Mon IP `192.168.100.201` is in CIDR network `192.168.100.0/24`
Mon IP `192.168.100.201` is in CIDR network `192.168.100.0/24`
Internal network (--cluster-network) has not been provided, OSD replication will default to the public_network
Pulling container image quay.ceph.io/ceph-ci/ceph:main...
Ceph version: ceph version 18.0.0-4601-g69cfd0e6 (69cfd0e6d2dec3152de87c9eeefc72cf542257de) reef (dev)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network to 192.168.100.0/24
Wrote config to /etc/ceph/ceph.conf
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Creating mgr...
Verifying port 9283 ...
Verifying port 8765 ...
Verifying port 8443 ...
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/15)...
mgr not available, waiting (2/15)...
mgr not available, waiting (3/15)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to /etc/ceph/ceph.pub
Adding key to root@localhost authorized_keys...
Adding host ceph1...
Deploying mon service with default placement...
Deploying mgr service with default placement...
Deploying crash service with default placement...
Deploying ceph-exporter service with default placement...
Deploying prometheus service with default placement...
Deploying grafana service with default placement...
Deploying node-exporter service with default placement...
Deploying alertmanager service with default placement...
Enabling the dashboard module...
Waiting for the mgr to restart...
Waiting for mgr epoch 9...
mgr epoch 9 is available
Generating a dashboard self-signed certificate...
Creating initial admin user...
Fetching dashboard port number...
Ceph Dashboard is now available at:

	     URL: https://ceph1:8443/
	    User: admin
	Password: oguv8xxeiq

Enabling client.admin keyring and conf on hosts with "admin" label
Saving cluster configuration to /var/lib/ceph/146108c4-13c4-11ee-ac7a-000c2928fec5/config directory
Enabling autotune for osd_memory_target
You can access the Ceph CLI as following in case of multi-cluster or non-default config:

	sudo /usr/sbin/cephadm shell --fsid 146108c4-13c4-11ee-ac7a-000c2928fec5 -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring

Or, if you are only running a single cluster on this host:

	sudo /usr/sbin/cephadm shell 

Please consider enabling telemetry to help improve Ceph:

	ceph telemetry on

For more information see:

	https://docs.ceph.com/en/latest/mgr/telemetry/

Bootstrap complete.
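An optional look at what the bootstrap deployed on ceph1 (output will vary by environment):

# Daemons managed by the cephadm orchestrator and the containers backing them.
ceph orch ps
podman ps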

4.2 Add Nodes to the Cluster

[root@ceph1 ~]# cd /etc/ceph/
[root@ceph1 ceph]# ls
ceph.client.admin.keyring  ceph.conf  ceph.pub  rbdmap
[root@ceph1 ceph]# ssh-copy-id -f -i ceph.pub ceph2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "ceph.pub"

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'ceph2'"
and check to make sure that only the key(s) you wanted were added.

[root@ceph1 ceph]# ceph orch host add ceph2 --labels=mon
Added host 'ceph2' with addr '192.168.100.202'

[root@ceph1 ceph]# ssh-copy-id -f -i ceph.pub ceph3
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "ceph.pub"

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'ceph3'"
and check to make sure that only the key(s) you wanted were added.

[root@ceph1 ceph]# ceph orch host add ceph3 --labels=mon
Added host 'ceph3' with addr '192.168.100.203'
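An optional check that both hosts were registered with the orchestrator:

# Hosts known to cephadm, with their addresses and labels (mon on ceph2/ceph3).
ceph orch host ls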

4.3 Add OSDs to the Cluster
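Before creating OSDs, you can optionally list the storage devices the orchestrator sees; the 30G data disks should appear as available. The /dev/nvme0n2 path used below assumes NVMe virtual disks and may differ in your environment (for example /dev/sdb with SCSI disks).

# Show devices on every cluster host and whether they are usable for OSDs.
ceph orch device ls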

[root@ceph1 ceph]# ceph osd tree
ID  CLASS  WEIGHT  TYPE NAME     STATUS  REWEIGHT  PRI-AFF
-1              0  root default                           
[root@ceph1 ceph]# ceph orch daemon add osd ceph1:/dev/nvme0n2
Created osd(s) 0 on host 'ceph1'
[root@ceph1 ceph]# ceph orch daemon add osd ceph2:/dev/nvme0n2
Created osd(s) 1 on host 'ceph2'
[root@ceph1 ceph]# ceph orch daemon add osd ceph3:/dev/nvme0n2
Created osd(s) 2 on host 'ceph3'

[root@ceph1 ceph]# ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         0.08789  root default                             
-3         0.02930      host ceph1                           
 0    ssd  0.02930          osd.0       up   1.00000  1.00000
-5         0.02930      host ceph2                           
 1    ssd  0.02930          osd.1       up   1.00000  1.00000
-7         0.02930      host ceph3                           
 2    ssd  0.02930          osd.2       up   1.00000  1.00000

***Check the cluster status; you may need to wait a moment for it to change to OK***
[root@ceph1 ceph]# ceph -s
  cluster:
    id:     146108c4-13c4-11ee-ac7a-000c2928fec5
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 4s)
    mgr: ceph1.yayvnq(active, since 24m), standbys: ceph2.bvqedx
    osd: 3 osds: 3 up (since 15m), 3 in (since 16m)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   81 MiB used, 90 GiB / 90 GiB avail
    pgs:     1 active+clean

4.4 Dashboard Login

Adding hosts, adding OSDs, and other operations can also be performed through the web UI. Log in with the URL, user name, and password generated at the end of the bootstrap in Section 4.1.
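If you prefer to reset the generated admin password from the CLI rather than through the first-login prompt, something like the following should work; the password and file name here are only examples.

# Write the new password to a temporary file, apply it to the admin user,
# then remove the file.
echo 'MyNewDashboardPass1!' > /root/dashboard_pass.txt
ceph dashboard ac-user-set-password admin -i /root/dashboard_pass.txt
rm -f /root/dashboard_pass.txt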


5 Starting and Stopping the Ceph Cluster

Before restarting or stopping the cluster, make sure the current cluster status is OK.

5.1 Shut Down the Cluster

  • Set the noout, norecover, norebalance, nobackfill, nodown, and pause flags (a condensed sketch of the whole shutdown sequence follows this list)
[root@ceph1 ~]# ceph health detail
HEALTH_OK
[root@ceph1 ~]# ceph osd set noout
noout is set
[root@ceph1 ~]# ceph osd set norecover
norecover is set
[root@ceph1 ~]# ceph osd set norebalance
norebalance is set
[root@ceph1 ~]# ceph osd set nobackfill
nobackfill is set
[root@ceph1 ~]# ceph osd set nodown
nodown is set
[root@ceph1 ~]# ceph osd set pause
pauserd,pausewr is set
  • Stop the OSD daemons node by node
[root@ceph1 ~]# systemctl stop ceph-osd.target
[root@ceph2 ~]# systemctl stop ceph-osd.target
[root@ceph3 ~]# systemctl stop ceph-osd.target
  • Stop the monitor daemons node by node
[root@ceph1 ~]# systemctl stop ceph-mon.target
[root@ceph2 ~]# systemctl stop ceph-mon.target
[root@ceph3 ~]# systemctl stop ceph-mon.target
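For convenience, the shutdown sequence above can be condensed into a short script run from ceph1. This is only a sketch, assuming the passwordless SSH configured in Section 3.2:

# Set all maintenance flags first, then stop the OSD and MON targets on every node.
for flag in noout norecover norebalance nobackfill nodown pause; do
    ceph osd set "$flag"
done
for host in ceph1 ceph2 ceph3; do
    ssh "$host" "systemctl stop ceph-osd.target"
done
for host in ceph1 ceph2 ceph3; do
    ssh "$host" "systemctl stop ceph-mon.target"
done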

5.2 Start the Cluster

  • Start the monitor daemons node by node (a condensed sketch of the whole startup sequence follows this section)
[root@ceph1 ~]# systemctl start ceph-mon.target
[root@ceph2 ~]# systemctl start ceph-mon.target
[root@ceph3 ~]# systemctl start ceph-mon.target
  • Start the OSD daemons node by node
[root@ceph1 ~]# systemctl start ceph-osd.target
[root@ceph2 ~]# systemctl start ceph-osd.target
[root@ceph3 ~]# systemctl start ceph-osd.target
  • Unset the noout, norecover, norebalance, nobackfill, nodown, and pause flags
[root@ceph1 ~]# ceph osd unset noout
noout is unset
[root@ceph1 ~]# ceph osd unset norecover
norecover is unset
[root@ceph1 ~]# ceph osd unset norebalance
norebalance is unset
[root@ceph1 ~]# ceph osd unset nobackfill
nobackfill is unset
[root@ceph1 ~]# ceph osd unset nodown
nodown is unset
[root@ceph1 ~]# ceph osd unset pause
pauserd,pausewr is unset

[root@ceph1 ~]# ceph health detail
HEALTH_OK
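The startup sequence can likewise be condensed into one script run from ceph1, under the same assumptions as the shutdown sketch:

# Start the MON and OSD targets on every node, then clear the maintenance flags.
for host in ceph1 ceph2 ceph3; do
    ssh "$host" "systemctl start ceph-mon.target"
done
for host in ceph1 ceph2 ceph3; do
    ssh "$host" "systemctl start ceph-osd.target"
done
for flag in noout norecover norebalance nobackfill nodown pause; do
    ceph osd unset "$flag"
done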

6 Troubleshooting

6.1 Cluster Bootstrap Error

Network problems may cause podman to fail to pull the container image during bootstrap.

[root@ceph1 ~]# cephadm bootstrap --mon-ip 192.168.100.201
This is a development version of cephadm.
For information regarding the latest stable release:
    https://docs.ceph.com/docs/quincy/cephadm/install
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman (/usr/bin/podman) version 4.3.1 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 00c6c330-136a-11ee-adb8-000c2928fec5
Verifying IP 192.168.100.201 port 3300 ...
Verifying IP 192.168.100.201 port 6789 ...
Mon IP `192.168.100.201` is in CIDR network `192.168.100.0/24`
Mon IP `192.168.100.201` is in CIDR network `192.168.100.0/24`
Internal network (--cluster-network) has not been provided, OSD replication will default to the public_network
Pulling container image quay.ceph.io/ceph-ci/ceph:main...
Non-zero exit code 125 from /usr/bin/podman pull quay.ceph.io/ceph-ci/ceph:main
/usr/bin/podman: stderr Trying to pull quay.ceph.io/ceph-ci/ceph:main...
/usr/bin/podman: stderr Getting image source signatures
/usr/bin/podman: stderr Copying blob sha256:53ad993e782b1511381b3fe09b4326127ba51715120a886097d1debb2fdd93be
/usr/bin/podman: stderr Copying blob sha256:2190a7c58a810ef42ff112928214340ca1d477693415e4eeb7f8e46eb1ea0ced
/usr/bin/podman: stderr Error: writing blob: storing blob to file "/var/tmp/storage3791588301/1": happened during read: unexpected EOF
ERROR: Failed command: /usr/bin/podman pull quay.ceph.io/ceph-ci/ceph:main

***Solution: pull the images required for cluster bootstrap to the local host with the following commands, then run the bootstrap again***

[root@ceph1 ~]# podman pull registry.cn-hangzhou.aliyuncs.com/cloudcs/ceph:main
[root@ceph1 ~]# podman pull registry.cn-hangzhou.aliyuncs.com/cloudcs/ceph-grafana:9.4.7
[root@ceph1 ~]# podman pull registry.cn-hangzhou.aliyuncs.com/cloudcs/prometheus:v2.43.0
[root@ceph1 ~]# podman pull registry.cn-hangzhou.aliyuncs.com/cloudcs/alertmanager:v0.25.0
[root@ceph1 ~]# podman pull registry.cn-hangzhou.aliyuncs.com/cloudcs/node-exporter:v1.5.0

[root@ceph1 ~]# podman tag registry.cn-hangzhou.aliyuncs.com/cloudcs/ceph:main quay.ceph.io/ceph-ci/ceph:main
[root@ceph1 ~]# podman tag registry.cn-hangzhou.aliyuncs.com/cloudcs/ceph-grafana:9.4.7 quay.io/ceph/ceph-grafana:9.4.7
[root@ceph1 ~]# podman tag registry.cn-hangzhou.aliyuncs.com/cloudcs/prometheus:v2.43.0 quay.io/prometheus/prometheus:v2.43.0
[root@ceph1 ~]# podman tag registry.cn-hangzhou.aliyuncs.com/cloudcs/alertmanager:v0.25.0 quay.io/prometheus/alertmanager:v0.25.0
[root@ceph1 ~]# podman tag registry.cn-hangzhou.aliyuncs.com/cloudcs/node-exporter:v1.5.0 quay.io/prometheus/node-exporter:v1.5.0

[root@ceph1 ~]# podman image ls
REPOSITORY                                               TAG         IMAGE ID      CREATED       SIZE
quay.ceph.io/ceph-ci/ceph                                main        04f443a559ce  10 hours ago  1.29 GB
registry.cn-hangzhou.aliyuncs.com/cloudcs/ceph           main        04f443a559ce  10 hours ago  1.29 GB
quay.io/ceph/ceph-grafana                                9.4.7       2c41d148cca3  2 months ago  647 MB
registry.cn-hangzhou.aliyuncs.com/cloudcs/ceph-grafana   9.4.7       2c41d148cca3  2 months ago  647 MB
quay.io/prometheus/prometheus                            v2.43.0     a07b618ecd1d  3 months ago  235 MB
registry.cn-hangzhou.aliyuncs.com/cloudcs/prometheus     v2.43.0     a07b618ecd1d  3 months ago  235 MB
quay.io/prometheus/alertmanager                          v0.25.0     c8568f914cd2  6 months ago  66.5 MB
registry.cn-hangzhou.aliyuncs.com/cloudcs/alertmanager   v0.25.0     c8568f914cd2  6 months ago  66.5 MB
quay.io/prometheus/node-exporter                         v1.5.0      0da6a335fe13  6 months ago  23.9 MB
registry.cn-hangzhou.aliyuncs.com/cloudcs/node-exporter  v1.5.0      0da6a335fe13  6 months ago  23.9 MB

[root@ceph1 ~]# cephadm bootstrap --mon-ip 192.168.100.201
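The pull-and-tag steps above can also be condensed into a small script. This is a sketch using the same mirror paths as above; the tag list must match the images your cephadm version actually expects.

# Pull each image from the Aliyun mirror and re-tag it with the name cephadm
# looks for, so the bootstrap finds it in the local image store.
MIRROR=registry.cn-hangzhou.aliyuncs.com/cloudcs
declare -A IMAGES=(
    ["ceph:main"]="quay.ceph.io/ceph-ci/ceph:main"
    ["ceph-grafana:9.4.7"]="quay.io/ceph/ceph-grafana:9.4.7"
    ["prometheus:v2.43.0"]="quay.io/prometheus/prometheus:v2.43.0"
    ["alertmanager:v0.25.0"]="quay.io/prometheus/alertmanager:v0.25.0"
    ["node-exporter:v1.5.0"]="quay.io/prometheus/node-exporter:v1.5.0"
)
for src in "${!IMAGES[@]}"; do
    podman pull "${MIRROR}/${src}"
    podman tag  "${MIRROR}/${src}" "${IMAGES[$src]}"
done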

6.2 Cluster Status

After the OSDs are added, the cluster status should normally be OK. If the status is HEALTH_WARN (for example, slow ops on a mon, as shown below), restarting the mon service on the affected node can resolve it.

[root@ceph1 ceph]# ceph -s
  cluster:
    id:     146108c4-13c4-11ee-ac7a-000c2928fec5
    health: HEALTH_WARN
            1 slow ops, oldest one blocked for 904 sec, mon.ceph2 has slow ops
 
  services:
    mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 15m)
    mgr: ceph1.yayvnq(active, since 23m), standbys: ceph2.bvqedx
    osd: 3 osds: 3 up (since 15m), 3 in (since 15m)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   81 MiB used, 90 GiB / 90 GiB avail
    pgs:     1 active+clean

[root@ceph2 ~]# systemctl restart ceph-146108c4-13c4-11ee-ac7a-000c2928fec5@mon.ceph2.service
[root@ceph2 ~]# systemctl restart ceph-mon.target

[root@ceph1 ~]# ceph -s
  cluster:
    id:     146108c4-13c4-11ee-ac7a-000c2928fec5
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 11m)
    mgr: ceph1.yayvnq(active, since 10m), standbys: ceph2.bvqedx
    osd: 3 osds: 3 up (since 11m), 3 in (since 45m)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 2 objects, 449 KiB
    usage:   82 MiB used, 90 GiB / 90 GiB avail
    pgs:     1 active+clean
END