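Topic: simulating and repairing ASM disk header corruption on the voting disk and on a data disk in an Oracle RAC environment.
OS: CentOS 7.9, 64-bit
Database: Oracle 11.2.0.4, 64-bit
Environment: two-node RAC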
1. Disk group information
1.1 System information
[root@hisdb1 ~]# cat /etc/*release
CentOS Linux release 7.9.2009 (Core)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
CentOS Linux release 7.9.2009 (Core)
CentOS Linux release 7.9.2009 (Core)
1.2 Disk information
SQL> select group_number,name,path,state,total_mb,free_mb from v$asm_disk where name is not null order by path;
GROUP_NUMBER NAME            PATH                 STATE      TOTAL_MB    FREE_MB
------------ --------------- -------------------- -------- ---------- ----------
           2 DATA02          ORCL:DATA02          NORMAL        10239       6662
           1 DATA03          ORCL:DATA03          NORMAL        20479      13765
           3 DATA04          ORCL:DATA04          NORMAL        10239       9843
SQL> select group_number,name,type,total_mb,free_mb from v$asm_diskgroup;
GROUP_NUMBER NAME            TYPE     TOTAL_MB    FREE_MB
------------ --------------- ------ ---------- ----------
           1 DATA            EXTERN      20479      13765
           2 FRA             EXTERN      10239       6662
           3 OCRBK           EXTERN      10239       9843
[root@hisdb1 disks]# pwd
/dev/oracleasm/disks
[root@hisdb1 disks]# ll /dev/oracleasm/disks/*
brw-rw---- 1 grid asmadmin 8, 17 Dec 27 20:27 /dev/oracleasm/disks/DATA01
brw-rw---- 1 grid asmadmin 8, 33 Dec 27 20:27 /dev/oracleasm/disks/DATA02
brw-rw---- 1 grid asmadmin 8, 49 Dec 27 20:27 /dev/oracleasm/disks/DATA03
brw-rw---- 1 grid asmadmin 8, 65 Dec 27 20:27 /dev/oracleasm/disks/DATA04
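Note: DATA04 is the disk in the OCRBK disk group, which holds the voting disk; DATA03 is the only disk in the DATA disk group.
2. Voting disk
Simulate corruption of the voting disk's ASM disk header, then repair it.
2.1 Back up the disk header
--Copy the first 8 KB (one 8192-byte block) of /dev/oracleasm/disks/DATA04 to /home/grid/data04.dd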
[grid@hisdb1 disks]$ dd if=/dev/oracleasm/disks/DATA04 of=/home/grid/data04.dd bs=8192 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000340858 s, 24.0 MB/s
[grid@hisdb1 ~]$ ll data04.dd 
-rw-r--r-- 1 grid oinstall 8192 Dec 27 21:32 data04.dd
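The dd command above takes a one-off copy of the header area. A minimal sketch that wraps the same dd call and then re-reads the source to confirm the copy is byte-identical, assuming bash and GNU coreutils as on CentOS 7. `backup_asm_header` is a hypothetical helper, not an Oracle tool, and its 8192-byte default only mirrors the `bs=8192 count=1` used here.

```shell
# Hypothetical helper (not an Oracle utility): copy the first $bytes bytes of an
# ASM disk to a backup file and verify the copy against the source.
# Usage: backup_asm_header <device> <backup-file> [bytes]   (default 8192)
# Returns 0 on success, 1 if dd fails or the copy differs, 2 on a short copy.
backup_asm_header() {
  local dev=$1 dest=$2 bytes=${3:-8192}
  # Same operation as: dd if=<device> of=<backup-file> bs=8192 count=1
  dd if="$dev" of="$dest" bs="$bytes" count=1 2>/dev/null || return 1
  # The backup must hold exactly $bytes bytes; fewer means the source was too small.
  [ "$(( $(wc -c < "$dest") ))" -eq "$bytes" ] || return 2
  # Re-read the same range from the source and compare it byte for byte.
  head -c "$bytes" "$dev" | cmp -s - "$dest"
}
```

For example, run `backup_asm_header /dev/oracleasm/disks/DATA04 /home/grid/data04.dd && echo verified` as the grid user; a non-zero status means the copy is short or does not match the disk.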
--Use kfed to decode the disk header of /dev/oracleasm/disks/DATA04 into data04.txt.
[grid@hisdb1 ~]$ kfed read /dev/oracleasm/disks/DATA04 text=data04.txt
[grid@hisdb1 ~]$ head data04.txt
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3855329304 ; 0x00c: 0xe5cba818
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
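In the healthy header above, `kfbh.type` decodes to KFBTYP_DISKHEAD. A small sketch for pulling a single field out of kfed's text output so that check can be scripted. It assumes the `name: value ; offset: decoded` layout shown above and prints the last column (the decoded value); `kfed_field` and `asm_header_ok` are hypothetical names.

```shell
# Hypothetical helpers for kfed text output lines such as:
#   kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
# kfed_field <field> [file...]: print the last column of the first line whose
# first word is "<field>:". Reads stdin when no file is given.
kfed_field() {
  local field=$1
  shift
  awk -v f="$field:" '$1 == f { print $NF; exit }' "$@"
}

# asm_header_ok [file...]: succeed only if kfbh.type decodes to KFBTYP_DISKHEAD.
asm_header_ok() {
  [ "$(kfed_field kfbh.type "$@")" = "KFBTYP_DISKHEAD" ]
}
```

For example, `kfed read /dev/oracleasm/disks/DATA04 | asm_header_ok || echo "DATA04: header invalid"`, or `kfed_field kfbh.check data04.txt` against the file written above.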
2.2 Corrupt the disk
--Overwrite the first 8 KB of DATA04, the disk in the voting disk's disk group, with zeros
[grid@hisdb1 ~]$ dd if=/dev/zero of=/dev/oracleasm/disks/DATA04 bs=8192 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000128522 s, 63.7 MB/s
[grid@hisdb1 ~]$ kfed read /dev/oracleasm/disks/DATA04 | head
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
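After the overwrite, kfed decodes block 0 as KFBTYP_INVALID and every field shown is 0. The same condition can be checked without kfed by testing whether the overwritten range is all zero bytes. A sketch assuming bash and coreutils; `header_is_zeroed` is a hypothetical name, and zero-filling is only one form of damage, so a negative result does not prove the header is valid.

```shell
# Hypothetical check: succeed if the first $bytes bytes of a disk (or file) are all 0x00,
# the state left by: dd if=/dev/zero of=<device> bs=8192 count=1
# Returns 1 if any byte is non-zero, 2 if fewer than $bytes bytes could be read.
header_is_zeroed() {
  local target=$1 bytes=${2:-8192}
  local total nonzero
  total=$(( $(head -c "$bytes" "$target" | wc -c) ))
  [ "$total" -eq "$bytes" ] || return 2
  # Delete every NUL byte; anything left over is a non-zero byte.
  nonzero=$(( $(head -c "$bytes" "$target" | LC_ALL=C tr -d '\000' | wc -c) ))
  [ "$nonzero" -eq 0 ] || return 1
}
```

For example, `header_is_zeroed /dev/oracleasm/disks/DATA04 && echo "DATA04: first 8 KB are zero"`.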
2.3 Reproduce the failure
--Restart the cluster
[root@hisdb1 ~]# /u01/app/11.2.0/grid/bin/crsctl stop cluster -all
CRS-2673: Attempting to stop 'ora.crsd' on 'hisdb1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'hisdb1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.OCRBK.dg' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.heal.db' on 'hisdb1'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.hisdb1.vip' on 'hisdb1'
CRS-2677: Stop of 'ora.FRA.dg' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.heal.db' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'hisdb1'
CRS-2677: Stop of 'ora.hisdb1.vip' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.crsd' on 'hisdb2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'hisdb2'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.cvu' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.OCRBK.dg' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.heal.db' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.oc4j' on 'hisdb2'
CRS-2677: Stop of 'ora.cvu' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.hisdb2.vip' on 'hisdb2'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'hisdb2'
CRS-2677: Stop of 'ora.FRA.dg' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.heal.db' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'hisdb2'
CRS-2677: Stop of 'ora.hisdb2.vip' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.scan1.vip' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.OCRBK.dg' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb1'
CRS-2677: Stop of 'ora.asm' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'hisdb1'
CRS-2677: Stop of 'ora.oc4j' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.ons' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'hisdb1'
CRS-2677: Stop of 'ora.net1.network' on 'hisdb1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'hisdb1' has completed
CRS-2677: Stop of 'ora.crsd' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.evmd' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb1'
CRS-2677: Stop of 'ora.evmd' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.OCRBK.dg' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb2'
CRS-2677: Stop of 'ora.asm' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'hisdb2'
CRS-2677: Stop of 'ora.ons' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'hisdb2'
CRS-2677: Stop of 'ora.net1.network' on 'hisdb2' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'hisdb2' has completed
CRS-2677: Stop of 'ora.crsd' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.evmd' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb2'
CRS-2677: Stop of 'ora.evmd' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.asm' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'hisdb1'
CRS-2677: Stop of 'ora.ctssd' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hisdb1'
CRS-2677: Stop of 'ora.cssd' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.asm' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'hisdb2'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hisdb2'
CRS-2677: Stop of 'ora.cssd' on 'hisdb2' succeeded
[root@hisdb1 ~]# /u01/app/11.2.0/grid/bin/crsctl start cluster -all
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'hisdb2'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'hisdb1'
CRS-2676: Start of 'ora.cssdmonitor' on 'hisdb2' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'hisdb2'
CRS-2676: Start of 'ora.cssdmonitor' on 'hisdb1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'hisdb1'
CRS-2672: Attempting to start 'ora.diskmon' on 'hisdb2'
CRS-2676: Start of 'ora.diskmon' on 'hisdb2' succeeded
CRS-2672: Attempting to start 'ora.diskmon' on 'hisdb1'
CRS-2676: Start of 'ora.diskmon' on 'hisdb1' succeeded
……
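Note: startup hangs at this point. The voting disk header is corrupted, so CSSD cannot discover any voting file and the cluster cannot start.
2.4 Related errors
--ocssd.log repeatedly reports the following: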
2022-12-27 22:17:15.937: [    CSSD][3821278976]clssnmvDiskVerify: Successful discovery of 0 disks
2022-12-27 22:17:15.937: [    CSSD][3821278976]clssnmCompleteInitVFDiscovery: Completing initial voting file discovery
2022-12-27 22:17:15.937: [    CSSD][3821278976]clssnmvFindInitialConfigs: No voting files found
2022-12-27 22:17:15.937: [    CSSD][3821278976](:CSSNM00070:)clssnmCompleteInitVFDiscovery: Voting file not found. Retrying discovery in 15 seconds
2022-12-27 22:17:15.996: [    CSSD][3823675136]clssscSelect: cookie accept request 0x7fa0d80845c0
2022-12-27 22:17:15.996: [    CSSD][3823675136]clssscevtypSHRCON: getting client with cmproc 0x7fa0d80845c0
2022-12-27 22:17:15.996: [    CSSD][3823675136]clssgmRegisterClient: proc(4/0x7fa0d80845c0), client(358/0x7fa0d8071230)
2022-12-27 22:17:15.996: [    CSSD][3823675136]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7fa0d80845c0) client(0x7fa0d8071230)
2022-12-27 22:17:15.996: [    CSSD][3823675136]clssgmDiscEndpcl: gipcDestroy 0x5976
2022-12-27 22:17:16.329: [    CSSD][3823675136]clssscSelect: cookie accept request 0x7fa0d8099e80
2022-12-27 22:17:16.329: [    CSSD][3823675136]clssscevtypSHRCON: getting client with cmproc 0x7fa0d8099e80
2022-12-27 22:17:16.329: [    CSSD][3823675136]clssgmRegisterClient: proc(5/0x7fa0d8099e80), client(357/0x7fa0d8071230)
2022-12-27 22:17:16.329: [    CSSD][3823675136]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7fa0d8099e80) client(0x7fa0d8071230)
2022-12-27 22:17:16.329: [    CSSD][3823675136]clssgmDiscEndpcl: gipcDestroy 0x598c
2022-12-27 22:17:16.998: [    CSSD][3823675136]clssscSelect: cookie accept request 0x7fa0d80845c0
2022-12-27 22:17:16.998: [    CSSD][3823675136]clssscevtypSHRCON: getting client with cmproc 0x7fa0d80845c0
2022-12-27 22:17:16.998: [    CSSD][3823675136]clssgmRegisterClient: proc(4/0x7fa0d80845c0), client(359/0x7fa0d8071230)
2022-12-27 22:17:16.998: [    CSSD][3823675136]clssgmExecuteClientRequest(): type(6) size(684) only connect and exit messages are allowed before lease acquisition proc(0x7fa0d80845c0) client(0x7fa0d8071230)
--alerthisdb1.log reports the following:
[grid@hisdb1 hisdb1]$ tail -5000f alerthisdb1.log
The following error repeats every 15 seconds:
[cssd(7816)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/hisdb1/cssd/ocssd.log
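Because CSSD keeps retrying every 15 seconds, the alert log fills with identical CRS-1714 entries. A sketch that counts them and reports the timestamp of the latest one, assuming the layout this alert log uses in section 3.4 (a `YYYY-MM-DD hh:mm:ss.fff:` timestamp line followed by the message line). `vf_discovery_failures` is a hypothetical name.

```shell
# Hypothetical helper: count CRS-1714 ("Unable to discover any voting files")
# entries in a clusterware alert log and print "<count> <timestamp of last one>".
# Assumes each message is preceded by a timestamp line such as
#   2022-12-27 22:46:18.033:
vf_discovery_failures() {
  awk '
    /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
      ts = $0
      sub(/:[ \t]*$/, "", ts)     # drop the trailing ":" and whitespace
    }
    /CRS-1714/ { n++; last = ts }
    END { printf "%d %s\n", n, last }
  ' "$1"
}
```

For example, `vf_discovery_failures /u01/app/11.2.0/grid/log/hisdb1/alerthisdb1.log`; a count that keeps growing between runs means CSSD is still failing to find a voting file.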
2.5 Repair the voting disk
[grid@hisdb1 ~]$ kfed repair /dev/oracleasm/disks/DATA04
Note: once the repair succeeds, the cluster returns to normal.
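kfed repair rebuilds block 0 from the backup copy of the disk header that ASM has kept in allocation unit 1 since 11.1.0.7 (for the default 1 MB AU it is commonly inspected with `kfed read <disk> aun=1 blkn=254`). Because section 2.1 also saved the original bytes with dd, writing them back is an alternative the author did not use here. A sketch assuming bash and coreutils; `restore_asm_header` is hypothetical. It defaults to restoring only the first 4096 bytes, the 4 KB header block kfed decodes, because the rest of the 8 KB backup is other ASM metadata that may have changed since the copy was taken. Only do this while the disk group is dismounted (here the whole stack was down), and treat an old backup as potentially stale.

```shell
# Hypothetical helper: write the first $bytes bytes of a dd backup back to the
# start of a disk (or file) and verify the result.
# Usage: restore_asm_header <backup-file> <device> [bytes]   (default 4096)
# Returns 0 on success, 1 if dd fails or verification fails, 2 if the backup is too small.
restore_asm_header() {
  local backup=$1 dev=$2 bytes=${3:-4096}
  # Refuse to restore more bytes than the backup holds.
  [ "$(( $(wc -c < "$backup") ))" -ge "$bytes" ] || return 2
  # conv=notrunc keeps the rest of a regular file; it is harmless on block devices.
  dd if="$backup" of="$dev" bs="$bytes" count=1 conv=notrunc 2>/dev/null || return 1
  # Compare CRC checksums of the restored range and the same range of the backup.
  [ "$(head -c "$bytes" "$dev" | cksum)" = "$(head -c "$bytes" "$backup" | cksum)" ]
}
```

For example, `restore_asm_header /home/grid/data04.dd /dev/oracleasm/disks/DATA04 && kfed read /dev/oracleasm/disks/DATA04 | head` should show KFBTYP_DISKHEAD again if the backup is still current.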
3. Data disk
Simulate corruption of a data disk's ASM disk header, then repair it.
3.1 Back up the disk header
[grid@hisdb1 ~]$ dd if=/dev/oracleasm/disks/DATA03 of=/home/grid/data03.dd bs=8192 count=1  
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000373797 s, 21.9 MB/s
[grid@hisdb1 ~]$ kfed read /dev/oracleasm/disks/DATA03 | head
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:              2147483648 ; 0x008: disk=0
kfbh.check:                  3875939376 ; 0x00c: 0xe7062430
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
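Sections 2.1 and 3.1 back up one disk at a time. A sketch that applies the same dd copy to every disk under a directory such as /dev/oracleasm/disks and stores the copies in a timestamped directory, cheap enough to run routinely before maintenance. It assumes bash, coreutils and read access to the disks (grid or root); `backup_all_headers` and the `asm_headers_<timestamp>` directory name are hypothetical.

```shell
# Hypothetical helper: dd the first $bytes bytes (default 8192) of every disk in
# <src-dir> into a new directory <dest-root>/asm_headers_<timestamp>/<disk>.dd
# and print the path of that directory.
backup_all_headers() {
  local src_dir=$1 dest_root=$2 bytes=${3:-8192}
  local stamp dest disk name
  stamp=$(date +%Y%m%d%H%M%S)
  dest="$dest_root/asm_headers_$stamp"
  mkdir -p "$dest" || return 1
  for disk in "$src_dir"/*; do
    [ -e "$disk" ] || continue            # the glob matched nothing
    name=${disk##*/}                      # e.g. DATA03
    dd if="$disk" of="$dest/$name.dd" bs="$bytes" count=1 2>/dev/null || return 1
  done
  echo "$dest"
}
```

For example, `backup_all_headers /dev/oracleasm/disks /home/grid` prints the directory it created, with one `<disk>.dd` file per disk.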
3.2 Corrupt the disk
[grid@hisdb1 ~]$ dd if=/dev/zero of=/dev/oracleasm/disks/DATA03 bs=8192 count=1
1+0 records in
1+0 records out
8192 bytes (8.2 kB) copied, 0.000199175 s, 41.1 MB/s
[grid@hisdb1 ~]$ kfed read /dev/oracleasm/disks/DATA03 | head
kfbh.endian:                          0 ; 0x000: 0x00
kfbh.hard:                            0 ; 0x001: 0x00
kfbh.type:                            0 ; 0x002: KFBTYP_INVALID
kfbh.datfmt:                          0 ; 0x003: 0x00
kfbh.block.blk:                       0 ; 0x004: blk=0
kfbh.block.obj:                       0 ; 0x008: file=0
kfbh.check:                           0 ; 0x00c: 0x00000000
kfbh.fcn.base:                        0 ; 0x010: 0x00000000
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000
3.3 Reproduce the failure
[root@hisdb1 ~]# /u01/app/11.2.0/grid/bin/crsctl stop cluster -all
CRS-2673: Attempting to stop 'ora.crsd' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.crsd' on 'hisdb2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'hisdb1'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.LISTENER_SCAN1.lsnr' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.cvu' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.oc4j' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.OCRBK.dg' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'hisdb1'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'hisdb2'
CRS-2673: Attempting to stop 'ora.heal.db' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.OCRBK.dg' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.FRA.dg' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.heal.db' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'hisdb2'
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.hisdb2.vip' on 'hisdb2'
CRS-2677: Stop of 'ora.LISTENER_SCAN1.lsnr' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.scan1.vip' on 'hisdb1'
CRS-2677: Stop of 'ora.cvu' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.LISTENER.lsnr' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.hisdb1.vip' on 'hisdb1'
CRS-2677: Stop of 'ora.heal.db' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'hisdb2'
CRS-2677: Stop of 'ora.heal.db' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.DATA.dg' on 'hisdb1'
CRS-2677: Stop of 'ora.hisdb2.vip' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.scan1.vip' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.FRA.dg' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.DATA.dg' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.hisdb1.vip' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.oc4j' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.OCRBK.dg' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb2'
CRS-2677: Stop of 'ora.OCRBK.dg' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb1'
CRS-2677: Stop of 'ora.asm' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.ons' on 'hisdb2'
CRS-2677: Stop of 'ora.asm' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.ons' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'hisdb2'
CRS-2677: Stop of 'ora.net1.network' on 'hisdb2' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'hisdb2' has completed
CRS-2673: Attempting to stop 'ora.ons' on 'hisdb1'
CRS-2677: Stop of 'ora.ons' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.net1.network' on 'hisdb1'
CRS-2677: Stop of 'ora.net1.network' on 'hisdb1' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'hisdb1' has completed
CRS-2677: Stop of 'ora.crsd' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.evmd' on 'hisdb2'
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb2'
CRS-2677: Stop of 'ora.crsd' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.evmd' on 'hisdb1'
CRS-2673: Attempting to stop 'ora.asm' on 'hisdb1'
CRS-2677: Stop of 'ora.evmd' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.evmd' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'hisdb2' succeeded
CRS-2677: Stop of 'ora.asm' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'hisdb2'
CRS-2677: Stop of 'ora.asm' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'hisdb1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'hisdb1' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hisdb1'
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'hisdb2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'hisdb2'
CRS-2677: Stop of 'ora.cssd' on 'hisdb1' succeeded
CRS-2677: Stop of 'ora.cssd' on 'hisdb2' succeeded
[root@hisdb1 ~]# /u01/app/11.2.0/grid/bin/crsctl start cluster -all  
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'hisdb1'
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'hisdb2'
CRS-2676: Start of 'ora.cssdmonitor' on 'hisdb1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'hisdb1'
CRS-2676: Start of 'ora.cssdmonitor' on 'hisdb2' succeeded
CRS-2672: Attempting to start 'ora.diskmon' on 'hisdb1'
CRS-2672: Attempting to start 'ora.cssd' on 'hisdb2'
CRS-2672: Attempting to start 'ora.diskmon' on 'hisdb2'
CRS-2676: Start of 'ora.diskmon' on 'hisdb1' succeeded
CRS-2676: Start of 'ora.diskmon' on 'hisdb2' succeeded
CRS-2676: Start of 'ora.cssd' on 'hisdb1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'hisdb1'
CRS-2676: Start of 'ora.cssd' on 'hisdb2' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'hisdb1'
CRS-2672: Attempting to start 'ora.ctssd' on 'hisdb2'
CRS-2676: Start of 'ora.ctssd' on 'hisdb2' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'hisdb2'
CRS-2676: Start of 'ora.ctssd' on 'hisdb1' succeeded
CRS-2672: Attempting to start 'ora.cluster_interconnect.haip' on 'hisdb2'
CRS-2672: Attempting to start 'ora.evmd' on 'hisdb1'
CRS-2676: Start of 'ora.evmd' on 'hisdb2' succeeded
CRS-2676: Start of 'ora.evmd' on 'hisdb1' succeeded
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'hisdb1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'hisdb1'
CRS-2676: Start of 'ora.cluster_interconnect.haip' on 'hisdb2' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'hisdb2'
CRS-2676: Start of 'ora.asm' on 'hisdb1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'hisdb1'
CRS-2676: Start of 'ora.asm' on 'hisdb2' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'hisdb2'
CRS-2676: Start of 'ora.crsd' on 'hisdb1' succeeded
CRS-2676: Start of 'ora.crsd' on 'hisdb2' succeeded
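Note: this time the cluster stack starts successfully, but the database instances cannot open, because all of the database's data files are in the DATA disk group.
3.4 Related errors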
[grid@hisdb1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Tue Dec 27 22:46:21 2022
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> col name for a20
SQL> col path for a40
SQL> set line 160
SQL> select name,total_mb,usable_file_mb,state from v$asm_diskgroup;
NAME                   TOTAL_MB USABLE_FILE_MB STATE
-------------------- ---------- -------------- -----------
FRA                       10239           6624 MOUNTED
OCRBK                     10239           9843 MOUNTED
SQL> alter diskgroup data mount;
alter diskgroup data mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATA" cannot be mounted
ORA-15063: ASM discovered an insufficient number of disks for diskgroup "DATA"
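Note: the DATA disk group cannot be mounted (ORA-15063): without a valid header, ASM no longer recognizes DATA03 as a member of DATA.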
[grid@hisdb1 hisdb1]$ tail -5000f alerthisdb1.log
2022-12-27 22:46:18.033: 
[crsd(10020)]CRS-2807:Resource 'ora.DATA.dg' failed to start automatically.
2022-12-27 22:46:18.033: 
[crsd(10020)]CRS-2807:Resource 'ora.DATA.dg' failed to start automatically.
2022-12-27 22:46:18.033: 
[crsd(10020)]CRS-2807:Resource 'ora.heal.db' failed to start automatically.
2022-12-27 22:46:18.033: 
[crsd(10020)]CRS-2807:Resource 'ora.heal.db' failed to start automatically.
Note: the clusterware alert log shows that ora.DATA.dg, and with it ora.heal.db, failed to start automatically.
SQL> select group_number,name,path,state,total_mb,free_mb from v$asm_disk;
GROUP_NUMBER NAME                 PATH            STATE      TOTAL_MB    FREE_MB
------------ -------------------- --------------- -------- ---------- ----------
           0                      ORCL:DATA01     NORMAL            0          0
           0                      ORCL:DATA03     NORMAL            0          0
           2 DATA02               ORCL:DATA02     NORMAL        10239       6624
           3 DATA04               ORCL:DATA04     NORMAL        10239       9843
[grid@hisdb2 hisdb2]$ crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
               ONLINE  OFFLINE      hisdb1                                       
               ONLINE  OFFLINE      hisdb2                                       
ora.FRA.dg
               ONLINE  ONLINE       hisdb1                                       
               ONLINE  ONLINE       hisdb2                                       
ora.LISTENER.lsnr
               ONLINE  ONLINE       hisdb1                                       
               ONLINE  ONLINE       hisdb2                                       
ora.OCRBK.dg
               ONLINE  ONLINE       hisdb1                                       
               ONLINE  ONLINE       hisdb2                                       
ora.asm
               ONLINE  ONLINE       hisdb1                   Started             
               ONLINE  ONLINE       hisdb2                   Started             
ora.gsd
               OFFLINE OFFLINE      hisdb1                                       
               OFFLINE OFFLINE      hisdb2                                       
ora.net1.network
               ONLINE  ONLINE       hisdb1                                       
               ONLINE  ONLINE       hisdb2                                       
ora.ons
               ONLINE  ONLINE       hisdb1                                       
               ONLINE  ONLINE       hisdb2                                       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       hisdb2                                       
ora.cvu
      1        ONLINE  ONLINE       hisdb1                                       
ora.heal.db
      1        ONLINE  OFFLINE                               Instance Shutdown   
      2        ONLINE  OFFLINE                               Instance Shutdown   
ora.hisdb1.vip
      1        ONLINE  ONLINE       hisdb1                                       
ora.hisdb2.vip
      1        ONLINE  ONLINE       hisdb2                                       
ora.oc4j
      1        ONLINE  ONLINE       hisdb1                                       
ora.orcl.db
      1        OFFLINE OFFLINE                               Instance Shutdown   
      2        OFFLINE OFFLINE                               Instance Shutdown   
ora.scan1.vip
      1        ONLINE  ONLINE       hisdb2        
Note: the resource status confirms the failure: ora.DATA.dg is OFFLINE on both nodes, so the heal database cannot start.
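In the listing above, the failure shows up as resources whose TARGET is ONLINE but whose STATE is OFFLINE. A sketch that extracts those rows from `crsctl stat res -t` output, assuming the 11.2 layout shown here: resource names start in column 1 with `ora.`, local-resource rows begin with TARGET, and cluster-resource rows begin with an instance number. `list_offline_resources` is a hypothetical name.

```shell
# Hypothetical helper: read `crsctl stat res -t` output on stdin and print each
# resource whose TARGET is ONLINE but whose STATE is OFFLINE, one per line as
# "<resource> <server>" (local resources) or "<resource> instance <n>" (cluster resources).
list_offline_resources() {
  awk '
    /^ora\./ { res = $1; next }              # resource names start in column 1
    NF >= 2 && res != "" {
      i = 1
      if ($1 ~ /^[0-9]+$/) i = 2             # cluster resource rows start with a number
      if ($i == "ONLINE" && $(i + 1) == "OFFLINE") {
        where = (i == 2) ? ("instance " $1) : $(i + 2)
        print res, where
      }
    }
  '
}
```

Run as `crsctl stat res -t | list_offline_resources`; for the listing above it prints `ora.DATA.dg hisdb1`, `ora.DATA.dg hisdb2`, `ora.heal.db instance 1` and `ora.heal.db instance 2`. Resources with an OFFLINE target, such as ora.gsd and ora.orcl.db, are deliberately skipped.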
3.5 Repair the data disk
[grid@hisdb1 ~]$ kfed repair /dev/oracleasm/disks/DATA03
[grid@hisdb1 ~]$ sqlplus / as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Tue Dec 27 22:54:47 2022
Copyright (c) 1982, 2013, Oracle.  All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup data mount;
Diskgroup altered.
SQL> select group_number,name,path,state,total_mb,free_mb from v$asm_disk;
GROUP_NUMBER NAME            PATH                      STATE      TOTAL_MB    FREE_MB
------------ --------------- ------------------------- -------- ---------- ----------
           0                 ORCL:DATA01               NORMAL            0          0
           2 DATA02          ORCL:DATA02               NORMAL        10239       6618
           1 DATA03          ORCL:DATA03               NORMAL        20479      13765
           3 DATA04          ORCL:DATA04               NORMAL        10239       9843
Note: once the DATA03 header is repaired, the DATA disk group mounts again and the cluster returns to normal.
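Before mounting, or after any repair, it can help to confirm that every disk header decodes correctly. A sketch that runs `kfed read` on each disk in a directory and flags any whose kfbh.type is not KFBTYP_DISKHEAD, which is how both DATA04 and DATA03 decoded after being zeroed. It assumes kfed is on PATH (it lives in the Grid home's bin directory, here /u01/app/11.2.0/grid/bin); `check_asm_headers` and the `KFED` override variable are hypothetical names. It checks only the header type, not disk group membership.

```shell
# Hypothetical health check: decode each disk header with "kfed read" and report
# any disk whose kfbh.type is not KFBTYP_DISKHEAD. Exit status is non-zero if any
# disk is flagged. Set KFED to the kfed binary if it is not on PATH.
check_asm_headers() {
  local dir=${1:-/dev/oracleasm/disks} kfed_cmd=${KFED:-kfed}
  local disk type bad=0
  for disk in "$dir"/*; do
    [ -e "$disk" ] || continue             # the glob matched nothing
    type=$("$kfed_cmd" read "$disk" 2>/dev/null |
           awk '$1 == "kfbh.type:" { print $NF; exit }')
    if [ "$type" = "KFBTYP_DISKHEAD" ]; then
      echo "OK   $disk"
    else
      echo "BAD  $disk (${type:-no kfbh.type in kfed output})"
      bad=$((bad + 1))
    fi
  done
  [ "$bad" -eq 0 ]
}
```

For example, `KFED=/u01/app/11.2.0/grid/bin/kfed check_asm_headers /dev/oracleasm/disks` as the grid user, before running `alter diskgroup ... mount`.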