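Contents
1. Simulate a crash of the primary host (node1) (environment: node1 is the primary, node2 is the standby)
2. Simulate a crash of the standby host (node1) (environment: node2 is the primary, node1 is the standby)
3. Simulate the database on the primary node2 being killed (environment: node2 is the primary, node1 is the standby)
4. Simulate the database on the standby node2 being killed (environment: node1 is the primary, node2 is the standby)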
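1. Simulate a crash of the primary host (node1) (environment: node1 is the primary, node2 is the standby)
Shut down the machine that hosts the primary.
In the monitor, the status of RW1 changes OPEN-->ERROR, and RW2 goes OPEN-->STARTUP-->MON CONFIRM-->FAILOVER-->OPEN.
Monitor log: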
[monitor] 2022-03-25 05:46:22: Received message from(DMRW1)
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 05:46:22 OPEN OK DMRW1 OPEN STANDBY NULL 7 32545 32545
[monitor] 2022-03-25 05:46:23: Received message from(DMRW2)
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 05:46:23 OPEN OK DMRW2 OPEN PRIMARY VALID 7 32545 32545
[monitor] 2022-03-25 05:50:03: Received message timeout from(DMRW1)
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 05:49:52 ERROR OK DMRW1 OPEN STANDBY VALID 7 32545 32545
[monitor] 2022-03-25 05:52:32: Dmwatcher process DMRW2 status switching [OPEN-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 05:52:32 STARTUP OK DMRW2 SUSPEND PRIMARY VALID 7 32545 32548
[monitor] 2022-03-25 05:52:33: Dmwatcher process DMRW2 status switching [STARTUP-->MON CONFIRM]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 05:52:33 MON CONFIRM OK DMRW2 SUSPEND PRIMARY VALID 7 32545 32548
[monitor] 2022-03-25 05:52:34: Dmwatcher process DMRW2 status switching [MON CONFIRM-->FAILOVER]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 05:52:34 FAILOVER OK DMRW2 SUSPEND PRIMARY VALID 7 32545 32548
[monitor] 2022-03-25 05:52:37: Dmwatcher process DMRW2 status switching [FAILOVER-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 05:52:37 OPEN OK DMRW2 OPEN PRIMARY VALID 7 32548 32548
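The state sequence described above can be recovered mechanically from the monitor log. The following is a minimal Python sketch, assuming the log lines keep exactly the "Dmwatcher process ... status switching [A-->B]" wording shown here; the function name `watcher_transitions` is hypothetical and not part of any DM tool.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [monitor] 2022-03-25 05:52:32: Dmwatcher process DMRW2 status switching [OPEN-->STARTUP]
# The wording is taken from the log above; other DM versions may phrase it differently.
SWITCH_RE = re.compile(
    r"\[monitor\]\s+\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}: "
    r"Dmwatcher process (\S+) status switching \[(.+?)-->(.+?)\]"
)

def watcher_transitions(log_text):
    """Return {instance: [state, ...]} in the order the monitor reported them."""
    paths = defaultdict(list)
    for line in log_text.splitlines():
        m = SWITCH_RE.search(line)
        if m is None:
            continue  # status rows and other messages are skipped
        inst, old, new = m.groups()
        if not paths[inst]:
            paths[inst].append(old)  # first sighting: record the starting state
        paths[inst].append(new)
    return dict(paths)

sample = """\
[monitor] 2022-03-25 05:52:32: Dmwatcher process DMRW2 status switching [OPEN-->STARTUP]
[monitor] 2022-03-25 05:52:33: Dmwatcher process DMRW2 status switching [STARTUP-->MON CONFIRM]
[monitor] 2022-03-25 05:52:34: Dmwatcher process DMRW2 status switching [MON CONFIRM-->FAILOVER]
[monitor] 2022-03-25 05:52:37: Dmwatcher process DMRW2 status switching [FAILOVER-->OPEN]
"""

for inst, states in watcher_transitions(sample).items():
    print(inst, " -> ".join(states))
```

Run against the four switching lines from the log, this prints the same OPEN, STARTUP, MON CONFIRM, FAILOVER, OPEN path for DMRW2 that the monitor displayed.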
node2 continues to serve requests normally:
[dmdba@node2 ~]$ disql SYSDBA/SYSDBA
SQL> select * from sysdba.tab ;
no rows
used time: 18.963(ms). Execute id is 400.
SQL> insert into sysdba.tab values(10) ;
affect rows 1
used time: 0.303(ms). Execute id is 401.
SQL> commit ;
executed successfully
used time: 3.115(ms). Execute id is 402.
Start the machine that hosts node1.
node1 status: NONE-->STARTUP-->OPEN
node2 status: OPEN-->RECOVERY-->OPEN
Monitor log:
[monitor] 2022-03-25 06:08:23: Dmwatcher process DMRW1 status switching [NONE-->STARTUP]
[monitor] 2022-03-25 06:08:24: Dmwatcher process DMRW1 status switching [STARTUP-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 06:08:23 OPEN OK DMRW1 OPEN STANDBY INVALID 7 32545 32545
[monitor] 2022-03-25 06:08:24: Dmwatcher process DMRW2 status switching [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 06:08:24 RECOVERY OK DMRW2 OPEN PRIMARY VALID 7 32555 32555
[monitor] 2022-03-25 06:08:26: Dmwatcher process DMRW2 status switching [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 06:08:25 OPEN OK DMRW2 OPEN PRIMARY VALID 7 32555 32555
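Each status row in the monitor log follows the header WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN, but WSTATUS can contain a space (for example MON CONFIRM), so a plain whitespace split misaligns the columns. The sketch below is one way to handle that, assuming WTIME is always two tokens and the eight columns after WSTATUS never contain spaces; `parse_status_row` is a hypothetical helper name.

```python
HEADER = "WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN".split()

def parse_status_row(row):
    """Parse one monitor status row into a dict keyed by the header names."""
    tokens = row.split()
    if len(tokens) < 11:
        raise ValueError("not a status row: %r" % row)
    wtime = " ".join(tokens[:2])      # date and time
    tail = tokens[-8:]                # INST_OK .. CLSN, one token each
    wstatus = " ".join(tokens[2:-8])  # whatever is left, e.g. "MON CONFIRM"
    record = dict(zip(HEADER, [wtime, wstatus] + tail))
    for key in ("N_OPEN", "FLSN", "CLSN"):
        record[key] = int(record[key])
    return record

# Rows copied from the monitor log in this section.
standby = parse_status_row(
    "2022-03-25 06:08:23 OPEN OK DMRW1 OPEN STANDBY INVALID 7 32545 32545")
primary = parse_status_row(
    "2022-03-25 06:08:24 RECOVERY OK DMRW2 OPEN PRIMARY VALID 7 32555 32555")

# LSN gap between primary and standby at the moment RECOVERY starts.
print(primary["FLSN"] - standby["FLSN"])
```

For these two rows the difference is 10 (32555 minus 32545), which is the amount of redo the rejoining standby still had to receive when the primary entered RECOVERY.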
show
2022-03-25 06:14:45
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GRP_RW 453331 TRUE AUTO FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
192.168.1.22 52141 2022-03-25 06:14:44 GLOBAL VALID OPEN DMRW2 OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
192.168.1.22 5236 OK DMRW2 OPEN PRIMARY 0 0 REALTIME VALID 4441 32555 4441 32555 NONE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
192.168.1.11 52141 2022-03-25 06:14:43 GLOBAL VALID OPEN DMRW1 OK 1 1 OPEN STANDBY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
192.168.1.11 5236 OK DMRW1 OPEN STANDBY 0 0 REALTIME VALID 4381 32555 4381 32555 NONE
DATABASE(DMRW1) APPLY INFO FROM (DMRW2), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[4441, 4441, 4441], (RLSN, SLSN, KLSN)[32555, 32555, 32555], N_TSK[0], TSK_MEM_USE[0]
REDO_LSN_ARR: (32555)
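To confirm from the `show` output that the standby has caught up, the counters on the APPLY INFO detail line can be compared with the primary's FLSN. The sketch below only extracts the numbers and checks that they agree; it deliberately assumes nothing about the exact meaning of RLSN, SLSN and KLSN beyond what the output shows, and the helper name `parse_apply_info` is hypothetical.

```python
import re

# Matches groups such as "(RLSN, SLSN, KLSN)[32555, 32555, 32555]".
TRIPLE_RE = re.compile(r"\((\w+), (\w+), (\w+)\)\[(\d+), (\d+), (\d+)\]")

def parse_apply_info(line):
    """Map each counter name on an APPLY INFO detail line to its integer value."""
    counters = {}
    for m in TRIPLE_RE.finditer(line):
        names, values = m.groups()[:3], m.groups()[3:]
        counters.update(zip(names, map(int, values)))
    return counters

line = ("DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[4441, 4441, 4441], "
        "(RLSN, SLSN, KLSN)[32555, 32555, 32555], N_TSK[0], TSK_MEM_USE[0]")
counters = parse_apply_info(line)

primary_flsn = 32555  # FLSN of DMRW2 in the EP INFO row above
# Heuristic (an assumption, not a DM rule): treat the standby as caught up
# when all three LSN counters equal the primary's FLSN.
caught_up = counters["RLSN"] == counters["SLSN"] == counters["KLSN"] == primary_flsn
print(caught_up)
```

With the values from this `show` output the check is true, matching the FLSN and CLSN of 32555 reported for both instances.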
Check the data on node1:
[dmdba@node1 ~]$ disql SYSDBA/SYSDBA
Server[LOCALHOST:5236]:mode is standby, state is open
login used time : 2.486(ms)
disql V8
SQL> select * from sysdba.tab ;
LINEID ID
---------- -----------
1 10
used time: 3.928(ms). Execute id is 1.
SQL> insert into sysdba.tab values(20) ;
insert into sysdba.tab values(20) ;
[-2018]:Error in line: 1
Try to insert/update/delete table table is not temporary or contains lob on standby mode.
used time: 3.638(ms). Execute id is 0.
Check the data on node2:
SQL> select * from sysdba.tab ;
LINEID ID
---------- -----------
1 10
used time: 0.312(ms). Execute id is 403.
SQL> insert into sysdba.tab values(20) ;
affect rows 1
used time: 0.299(ms). Execute id is 404.
SQL> select * from sysdba.tab ;
LINEID ID
---------- -----------
1 10
2 20
used time: 0.637(ms). Execute id is 405.
2. Simulate a crash of the standby host (node1) (environment: node2 is the primary, node1 is the standby)
Shut down the machine that hosts the standby (node1).
Monitor log:
[monitor] 2022-03-25 07:02:14: Received message timeout from(DMRW1)
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:02:02 ERROR OK DMRW1 OPEN STANDBY VALID 8 34001 34001
[monitor] 2022-03-25 07:05:58: Dmwatcher process DMRW2 status switching [OPEN-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:05:58 STARTUP OK DMRW2 SUSPEND PRIMARY VALID 8 34001 34001
[monitor] 2022-03-25 07:05:58: Dmwatcher process DMRW2 status switching [STARTUP-->MON CONFIRM]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:05:58 MON CONFIRM OK DMRW2 SUSPEND PRIMARY VALID 8 34001 34001
[monitor] 2022-03-25 07:05:59: Dmwatcher process DMRW2 status switching [MON CONFIRM-->FAILOVER]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:05:58 FAILOVER OK DMRW2 SUSPEND PRIMARY VALID 8 34001 34001
[monitor] 2022-03-25 07:06:02: Dmwatcher process DMRW2 status switching [FAILOVER-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:06:02 OPEN OK DMRW2 OPEN PRIMARY VALID 8 34001 34001
show
2022-03-25 07:08:40
#================================================================================#
GROUP OGUID MON_CONFIRM MODE MPP_FLAG
GRP_RW 453331 TRUE AUTO FALSE
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
192.168.1.22 52141 2022-03-25 07:08:40 GLOBAL VALID OPEN DMRW2 OK 1 1 OPEN PRIMARY DSC_OPEN REALTIME VALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
192.168.1.22 5236 OK DMRW2 OPEN PRIMARY 0 0 REALTIME VALID 4465 34001 4465 34001 NONE
ERROR DATABASE:
<<DATABASE GLOBAL INFO:>>
DW_IP MAL_DW_PORT WTIME WTYPE WCTLSTAT WSTATUS INAME INST_OK N_EP N_OK ISTATUS IMODE DSC_STATUS RTYPE RSTAT
192.168.1.11 52141 2022-03-25 07:02:02 GLOBAL VALID ERROR DMRW1 OK 1 1 OPEN STANDBY DSC_OPEN REALTIME INVALID
EP INFO:
INST_IP INST_PORT INST_OK INAME ISTATUS IMODE DSC_SEQNO DSC_CTL_NODE RTYPE RSTAT FSEQ FLSN CSEQ CLSN DW_STAT_FLAG
192.168.1.11 5236 OK DMRW1 OPEN STANDBY 0 0 REALTIME INVALID 4444 34001 4444 34001 NONE
DATABASE(DMRW1) APPLY INFO FROM (DMRW2), REDOS_PARALLEL_NUM (1):
DSC_SEQNO[0], (RSEQ, SSEQ, KSEQ)[4464, 4464, 4464], (RLSN, SLSN, KLSN)[34001, 34001, 34001], N_TSK[0], TSK_MEM_USE[0]
REDO_LSN_ARR: (34001)
node2 continues to provide service without problems:
SQL> insert into tab1 values(2) ;
affect rows 1
used time: 0.257(ms). Execute id is 7.
SQL> commit ;
executed successfully
used time: 1.475(ms). Execute id is 8.
Start the node1 machine.
Check the data on node1:
[dmdba@node1 ~]$ disql SYSDBA/SYSDBA
Server[LOCALHOST:5236]:mode is standby, state is open
login used time : 2.145(ms)
disql V8
SQL> select * from tab1 ;
LINEID ID
---------- -----------
1 1
2 2
used time: 16.865(ms). Execute id is 0.
3. Simulate the database on the primary node2 being killed (environment: node2 is the primary, node1 is the standby)
Monitor log:
[monitor] 2022-03-25 07:30:40: Instance DMRW2[PRIMARY, OPEN, ISTAT_SAME:TRUE] error
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:30:40 OPEN ERROR DMRW2 OPEN PRIMARY VALID 8 34006 34006
[monitor] 2022-03-25 07:30:40: Check primary instance error in group(GRP_RW), start to auto takeover
[monitor] 2022-03-25 07:30:40: Notify group(GRP_RW)'s active dmwatcher to set MID
[monitor] 2022-03-25 07:30:40: Dmwatcher process DMRW2 status switching [OPEN-->STARTUP]
[monitor] 2022-03-25 07:30:40: Notify group(GRP_RW)'s active dmwatcher to set MID success
[monitor] 2022-03-25 07:30:40: Start to takeover use instance DMRW1
[monitor] 2022-03-25 07:30:40: Notify dmwatcher(DMRW1) switch to TAKEOVER status
[monitor] 2022-03-25 07:30:40: Dmwatcher process DMRW1 status switching [OPEN-->TAKEOVER]
[monitor] 2022-03-25 07:30:40: Switch dmwatcher DMRW1 to TAKEOVER status success
[monitor] 2022-03-25 07:30:40: Instance DMRW1 start to execute sql SP_SET_GLOBAL_DW_STATUS(0, 7)
[monitor] 2022-03-25 07:30:40: Instance DMRW1 execute sql SP_SET_GLOBAL_DW_STATUS(0, 7) success
[monitor] 2022-03-25 07:30:40: Instance DMRW1 start to execute sql SP_APPLY_KEEP_PKG()
[monitor] 2022-03-25 07:30:40: Instance DMRW1 execute sql SP_APPLY_KEEP_PKG() success
[monitor] 2022-03-25 07:30:40: Instance DMRW1 start to execute sql ALTER DATABASE MOUNT
[monitor] 2022-03-25 07:30:41: Instance DMRW1 execute sql ALTER DATABASE MOUNT success
[monitor] 2022-03-25 07:30:41: Instance DMRW1 start to execute sql ALTER DATABASE PRIMARY
[monitor] 2022-03-25 07:30:41: Instance DMRW1 execute sql ALTER DATABASE PRIMARY success
[monitor] 2022-03-25 07:30:41: Notify instance DMRW1 to change all arch status to be invalid
[monitor] 2022-03-25 07:30:41: Succeed to change all instances arch status to be invalid
[monitor] 2022-03-25 07:30:41: Instance DMRW1 start to execute sql ALTER DATABASE OPEN FORCE
[monitor] 2022-03-25 07:30:42: Instance DMRW1 execute sql ALTER DATABASE OPEN FORCE success
[monitor] 2022-03-25 07:30:42: Instance DMRW1 start to execute sql SP_SET_GLOBAL_DW_STATUS(7, 0)
[monitor] 2022-03-25 07:30:42: Instance DMRW1 execute sql SP_SET_GLOBAL_DW_STATUS(7, 0) success
[monitor] 2022-03-25 07:30:42: Notify dmwatcher(DMRW1) switch to OPEN status
[monitor] 2022-03-25 07:30:42: Dmwatcher process DMRW1 status switching [TAKEOVER-->OPEN]
[monitor] 2022-03-25 07:30:43: Switch dmwatcher DMRW1 to OPEN status success
[monitor] 2022-03-25 07:30:43: Notify group(GRP_RW)'s dmwatcher to do clear
[monitor] 2022-03-25 07:30:43: Clean request of dmwatcher processer DMRW1 success
[monitor] 2022-03-25 07:30:43: Clean request of dmwatcher processer DMRW2 success
[monitor] 2022-03-25 07:30:43: Success to takeover use instance DMRW1
[monitor] 2022-03-25 07:30:43: Group(GRP_RW) use instance DMRW1 auto takeover success
[monitor] 2022-03-25 07:30:56: Instance DMRW2[STANDBY, MOUNT, ISTAT_SAME:TRUE] recover to OK
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:30:56 STARTUP OK DMRW2 MOUNT STANDBY INVALID 8 34006 34006
[monitor] 2022-03-25 07:30:56: Dmwatcher process DMRW2 status switching [STARTUP-->UNIFY EP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:30:56 UNIFY EP OK DMRW2 MOUNT STANDBY INVALID 8 34006 34006
[monitor] 2022-03-25 07:30:56: Dmwatcher process DMRW2 status switching [UNIFY EP-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:30:56 OPEN OK DMRW2 OPEN STANDBY INVALID 8 34006 34006
[monitor] 2022-03-25 07:30:59: Dmwatcher process DMRW1 status switching [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:30:59 RECOVERY OK DMRW1 OPEN PRIMARY VALID 9 35364 35364
[monitor] 2022-03-25 07:31:01: Dmwatcher process DMRW1 status switching [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:31:01 OPEN OK DMRW1 OPEN PRIMARY VALID 9 35364 35364
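The log above also shows how long the automatic takeover took. A small Python sketch for measuring it, assuming the monitor keeps the "start to auto takeover" and "auto takeover success" wording seen here (the function name `takeover_seconds` is hypothetical):

```python
import re
from datetime import datetime

LINE_RE = re.compile(r"\[monitor\]\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (.*)")

def takeover_seconds(log_text):
    """Seconds between the first takeover start and the last takeover success.

    Returns None if either message is missing. The monitor timestamps have
    one-second resolution, so the result is only approximate.
    """
    start = end = None
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        message = m.group(2)
        if "start to auto takeover" in message:
            if start is None:
                start = stamp
        elif "auto takeover success" in message:
            end = stamp
    if start is None or end is None:
        return None
    return (end - start).total_seconds()

sample = """\
[monitor] 2022-03-25 07:30:40: Check primary instance error in group(GRP_RW), start to auto takeover
[monitor] 2022-03-25 07:30:41: Instance DMRW1 execute sql ALTER DATABASE PRIMARY success
[monitor] 2022-03-25 07:30:43: Group(GRP_RW) use instance DMRW1 auto takeover success
"""
print(takeover_seconds(sample))
```

For this run the two messages are stamped 07:30:40 and 07:30:43, so the takeover by DMRW1 took roughly three seconds.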
Check the processes: the dmserver on node2 has been restarted automatically.
[root@node2 ~]# kill -9 3040
[root@node2 ~]# ps -ef | grep dm
root 918 1 0 01:17 ? 00:00:00 rpc.idmapd
dmdba 1167 1 0 01:17 ? 00:00:02 /home/dmdba/dmdbms/bin/dmap
root 2847 2348 0 05:51 pts/0 00:00:00 su - dmdba
dmdba 2848 2847 0 05:51 pts/0 00:00:00 -bash
root 2883 2869 0 05:52 pts/1 00:00:00 su - dmdba
dmdba 2884 2883 0 05:52 pts/1 00:00:00 -bash
dmdba 3109 2848 0 06:46 pts/0 00:00:00 disql SYSDBA/SYSDBA
root 3111 3023 0 06:47 pts/2 00:00:00 su - dmdba
dmdba 3112 3111 0 06:47 pts/2 00:00:00 -bash
dmdba 3129 3112 0 06:47 pts/2 00:00:02 dmwatcher /dmdata/DAMENG/dmwatcher.ini
dmdba 3176 1 0 07:30 ? 00:00:00 /home/dmdba/dmdbms/bin/dmserver /dmdata/DAMENG/dm.ini mount
root 3250 3160 0 07:33 pts/3 00:00:00 grep dm
node2 has become the standby:
[dmdba@node2 ~]$ disql SYSDBA/SYSDBA
Server[LOCALHOST:5236]:mode is standby, state is open
login used time : 1.558(ms)
disql V8
node1 has become the primary:
[dmdba@node1 ~]$ disql SYSDBA/SYSDBA
Server[LOCALHOST:5236]:mode is primary, state is open
login used time : 1.024(ms)
disql V8
4. Simulate the database on the standby node2 being killed (environment: node1 is the primary, node2 is the standby)
Monitor log:
[monitor] 2022-03-25 07:41:19: Instance DMRW2[STANDBY, OPEN, ISTAT_SAME:TRUE] error
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:41:18 OPEN ERROR DMRW2 OPEN STANDBY VALID 9 35367 35367
[monitor] 2022-03-25 07:41:20: Dmwatcher process DMRW2 status switching [OPEN-->STARTUP]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:41:20 STARTUP ERROR DMRW2 OPEN STANDBY VALID 9 35367 35367
[monitor] 2022-03-25 07:41:21: Dmwatcher process DMRW1 status switching [OPEN-->FAILOVER]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:41:21 FAILOVER OK DMRW1 SUSPEND PRIMARY VALID 9 35367 35367
[monitor] 2022-03-25 07:41:24: Dmwatcher process DMRW1 status switching [FAILOVER-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:41:24 OPEN OK DMRW1 OPEN PRIMARY VALID 9 35367 35367
[monitor] 2022-03-25 07:41:35: Instance DMRW2[STANDBY, MOUNT, ISTAT_SAME:TRUE] recover to OK
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:41:35 STARTUP OK DMRW2 MOUNT STANDBY INVALID 9 35367 35367
[monitor] 2022-03-25 07:41:36: Dmwatcher process DMRW2 status switching [STARTUP-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:41:36 OPEN OK DMRW2 OPEN STANDBY INVALID 9 35367 35367
[monitor] 2022-03-25 07:41:38: Dmwatcher process DMRW1 status switching [OPEN-->RECOVERY]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:41:38 RECOVERY OK DMRW1 OPEN PRIMARY VALID 9 35367 35367
[monitor] 2022-03-25 07:41:41: Dmwatcher process DMRW1 status switching [RECOVERY-->OPEN]
WTIME WSTATUS INST_OK INAME ISTATUS IMODE RSTAT N_OPEN FLSN CLSN
2022-03-25 07:41:41 OPEN OK DMRW1 OPEN PRIMARY VALID 9 35367 35367
node2 watcher log:
Waitpid error!
file dm.key not found, use default license!
version info: develop
DM Database Server x64 V8 1-2-38-21.07.09-143359-10018-ENT startup...
Normal of FAST
Normal of DEFAULT
Normal of RECYCLE
Normal of KEEP
Normal of ROLL
Database mode = 2, oguid = 453331
License will expire on 2022-07-09
begin redo pwr log collect, last ckpt lsn: 33728 ...
redo pwr log collect finished
main rfil[/dmdata/DAMENG/DAMENG01.log]'s grp collect 0 valid pwr record, discard 1117 invalid pwr record
EP[0]'s cur_lsn[35367], file_lsn[35367]
begin redo log recover, last ckpt lsn: 33728 ...
redo log recover finished
ndct db load finished
ndct fill fast pool finished
nsvr_startup end.
aud sys init success.
aud rt sys init success.
systables desc init success.
ndct_db_load_info success.
SYSTEM IS READY.
The dmserver on node2 has been restarted.