故障处理:ORA-01173的模拟与故障处理

阅读 13

09-04 15:00


案例:ORA-01173的模拟与故障处理

案例来至于一位网友,他们生产数据库异常,在drop表空间,重建控制文件后,报下面的错误:

Sat Jul 19 00:45:47 2014

SMON: enabling cache recovery

Sat Jul 19 00:45:47 2014

Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_12464.trc:

ORA-01173: data dictionary indicates missing data file from system tablespace

Sat Jul 19 00:45:47 2014

Error 1173 happened during db open, shutting down database

USER: terminating instance due to error 1173

Instance terminated by USER, pid = 12464

ORA-1092 signalled during: alter database open resetlogs…

下面是简单的测试一下,提供2种方法来解决此故障。

欢迎大家加入ORACLE超级群:17115662 免费解决各种ORACLE问题,以后BLOG将迁移到http://www.htz.pw

1,数据库版本

www.htz.pw > select * from v$version where rownum<3;

BANNER

—————————————————————-

Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 – 64bi

PL/SQL Release 10.2.0.4.0 – Production

www.htz.pw > !lsb_release -a

LSB Version:    :core-3.0-amd64:core-3.0-ia32:core-3.0-noarch:graphics-3.0-amd64:graphics-3.0-ia32:graphics-3.0-noarch

Distributor ID: RedHatEnterpriseAS

Description:    Red Hat Enterprise Linux AS release 4 (Nahant Update 8)

Release:        4

Codename:       NahantUpdate8

2,查询undo段的名字

因为在实验过程中,我们需要使用到undo段的名字,所以这里提前查询出来,如果在生产环境,我们可以直接使用bbed去查询undo$表,或者是使用odu,dul等工具去直接抽取undo$表,另外了可以使用strings system数据文件来过滤UNDO段。

www.htz.pw > @undo_segment.sql                                                
                                                    SEGMENT_HEADER                          

TABLESPACE           SEGMENT_NAME                   FILE#.BLOCK    STATUS     SEGMENT_SIZE(M)

——————– —————————— ————– ———- —————

SYSTEM.OLD           PRI.SYSTEM                     1.9            ONLINE                   0

UNDOTBS1.CURRENT     PUB._SYSSMU1$                  2.9            ONLINE                   1

UNDOTBS1.CURRENT     PUB._SYSSMU10$                 2.153          ONLINE                   1

UNDOTBS1.CURRENT     PUB._SYSSMU9$                  2.137          ONLINE                  13

UNDOTBS1.CURRENT     PUB._SYSSMU8$                  2.121          ONLINE                  18

UNDOTBS1.CURRENT     PUB._SYSSMU7$                  2.105          ONLINE                   0

UNDOTBS1.CURRENT     PUB._SYSSMU6$                  2.89           ONLINE                   6

UNDOTBS1.CURRENT     PUB._SYSSMU5$                  2.73           ONLINE                   2

UNDOTBS1.CURRENT     PUB._SYSSMU4$                  2.57           ONLINE                   0

UNDOTBS1.CURRENT     PUB._SYSSMU3$                  2.41           ONLINE                   1

UNDOTBS1.CURRENT     PUB._SYSSMU2$                  2.25           ONLINE                   0

3,生成创建控制文件脚本

[oracle@www.htz.pw sql]$./create_controlfile_sql.sh

please input direcotry default /tmp:

please input file name default control.ctl:

Database altered.

这里生成的默认文件位置在/tmp/control.ctl

### 4,重建控制文件

www.htz.pw > shutdown abort;

ORACLE instance shut down.

STARTUP NOMOUNT

CREATE CONTROLFILE REUSE DATABASE "ORCL1024" NORESETLOGS  NOARCHIVELOG

MAXLOGFILES 16

MAXLOGMEMBERS 3

MAXDATAFILES 100

MAXINSTANCES 8

MAXLOGHISTORY 292

LOGFILE

GROUP 1 ‘/oracle/app/oracle/oradata/orcl1024/redo01.log’  SIZE 50M,

GROUP 2 ‘/oracle/app/oracle/oradata/orcl1024/redo02.log’  SIZE 50M,

GROUP 3 ‘/oracle/app/oracle/oradata/orcl1024/redo03.log’  SIZE 50M

DATAFILE

‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’,

‘/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf’,(需要删除这行)

‘/oracle/app/oracle/oradata/orcl1024/sysaux01.dbf’,

‘/oracle/app/oracle/oradata/orcl1024/users01.dbf’

CHARACTER SET ZHS16GBK

;

RECOVER DATABASE

ALTER DATABASE OPEN;

ALTER TABLESPACE TEMP ADD TEMPFILE ‘/oracle/app/oracle/oradata/orcl1024/temp01.dbf’

 SIZE 1482M REUSE AUTOEXTEND ON NEXT 655360  MAXSIZE 32767M;

5,故障现象出现

www.htz.pw > recover database using backup controlfile until cancel;

ORA-00279: change 2170641 generated at 07/19/2014 00:35:54 needed for thread 1

ORA-00289: suggestion :

/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_72

_%u_.arc

ORA-00280: change 2170641 for thread 1 is in sequence #72

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

/oracle/app/oracle/oradata/orcl1024/redo02.log

ORA-00310: archived log contains sequence 71; sequence 72 required

ORA-00334: archived log: ‘/oracle/app/oracle/oradata/orcl1024/redo02.log’

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below

ORA-01194: file 1 needs more recovery to be consistent

ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’

www.htz.pw >  recover database using backup controlfile until cancel;

ORA-00279: change 2170641 generated at 07/19/2014 00:35:54 needed for thread 1

ORA-00289: suggestion :

/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_72

_%u_.arc

ORA-00280: change 2170641 for thread 1 is in sequence #72


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

/oracle/app/oracle/oradata/orcl1024/redo03.log

Log applied.

Media recovery complete.


www.htz.pw > alter database open resetlogs;

alter database open resetlogs

*

ERROR at line 1:

ORA-01092: ORACLE instance terminated. Disconnection forced

alert中出现下面的报错

Sat Jul 19 00:45:47 2014

SMON: enabling cache recovery

Sat Jul 19 00:45:47 2014

Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_12464.trc:

ORA-01173: data dictionary indicates missing data file from system tablespace

Sat Jul 19 00:45:47 2014

Error 1173 happened during db open, shutting down database

USER: terminating instance due to error 1173

Instance terminated by USER, pid = 12464

ORA-1092 signalled during: alter database open resetlogs…

6,故障处理方法1

在运气比较好的情况下使用此方案是可行的,朋友的数据库使用此方法,数据库能正常的OPEN。

6.1 修改参数文件

这里手动创建pfile文件,直接修改pfile文件比较简单,并且不影响原spfile文件,增加下面红色部分参数

www.htz.pw > !vi /tmp/123.ora

orcl1024.__db_cache_size=54525952

orcl1024.__java_pool_size=4194304

orcl1024.__large_pool_size=8388608

orcl1024.__shared_pool_size=88080384

orcl1024.__streams_pool_size=0

*._backup_ksfq_bufsz=1048576

*._log_parallelism=2

*._log_parallelism_max=4

*._pga_max_size=5368709120

*._smm_max_size=3145728

*.audit_file_dest=’/oracle/app/oracle/admin/orcl1024/adump’

*.background_dump_dest=’/oracle/app/oracle/admin/orcl1024/bdump’

*.compatible=’10.2.0.3.0′

*.control_files=’/oracle/app/oracle/oradata/orcl1024/control01.ctl’,’/oracle/app/oracle/oradata/orcl1024/control02.ctl’,’/oracle/app/oracle/oradata/orcl1024/control03.ctl’

*.core_dump_dest=’/oracle/app/oracle/admin/orcl1024/cdump’

*.cpu_count=3

*.db_block_size=8192

*.db_domain=”

*.db_file_multiblock_read_count=16

*.db_name=’orcl1024′

*.db_recovery_file_dest=’/oracle/app/oracle/flash_recovery_area’

*.db_recovery_file_dest_size=4294967296

*.dbwr_io_slaves=4

*.disk_asynch_io=FALSE

*.dispatchers='(PROTOCOL=TCP) (SERVICE=orcl1024XDB)’

*.event=”

*.job_queue_processes=10

*.open_cursors=300

*.pga_aggregate_target=1073741824

*.processes=150

*.recyclebin=’OFF’

*.remote_login_passwordfile=’EXCLUSIVE’

*.sga_target=167772160

#*.undo_management=’AUTO’

*.undo_management=’manual’

*.undo_tablespace=’UNDOTBS1′

*.user_dump_dest=’/oracle/app/oracle/admin/orcl1024/udump’

_corrupted_rollback_segments=(_SYSSMU1$,_SYSSMU2$,_SYSSMU3$,_SYSSMU4$,_SYSSMU5$,_SYSSMU6$,_SYSSMU7$,_SYSSMU8$,_SYSSMU9$,_SYSSMU10$)

这里通常还需要增加下面的2个参数

_allow_resetlogs_corruption=true

_allow_error_simulation=true

另外还可以会增加一个event,如果smon一些功能的event。

6.2 启动数据库

www.htz.pw > startup mount pfile=’/tmp/123.ora’;

ORACLE instance started.

Total System Global Area  167772160 bytes

Fixed Size                  2082432 bytes

Variable Size             100665728 bytes

Database Buffers           54525952 bytes

Redo Buffers               10498048 bytes

Database mounted.

www.htz.pw > recover database using backup controlfile until cancel;

ORA-00279: change 2171386 generated at 07/19/2014 00:45:47 needed for thread 1

ORA-00289: suggestion :

/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_

%u_.arc

ORA-00280: change 2171386 for thread 1 is in sequence #1


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

/oracle/app/oracle/oradata/orcl1024/redo03.log

ORA-00339: archived log does not contain any redo

ORA-00334: archived log: ‘/oracle/app/oracle/oradata/orcl1024/redo03.log’

 
ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below

ORA-01194: file 1 needs more recovery to be consistent

ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’


www.htz.pw > recover database using backup controlfile until cancel;

ORA-00279: change 2171386 generated at 07/19/2014 00:45:47 needed for thread 1

ORA-00289: suggestion :

/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_

%u_.arc

ORA-00280: change 2171386 for thread 1 is in sequence #1

 
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

/oracle/app/oracle/oradata/orcl1024/redo01.log

Log applied.

Media recovery complete.

www.htz.pw > alter database open resetlogs;

Database altered.

这里看到数据库已经正常打开,这里还需要注意观察,alert日志文件是否有异常报错。

6.3 重建undo表空间

www.htz.pw > !rm /oracle/app/oracle/oradata/orcl1024/undotbs01.dbf

www.htz.pw >  create undo tablespace undotbs1 datafile ‘/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf’ size 10m;

Tablespace created.

6.4 使用源参数文件启动数据库

www.htz.pw > shutdown immediate;

Database closed.

Database dismounted.

ORACLE instance shut down.

www.htz.pw > startup

ORACLE instance started.


Total System Global Area  167772160 bytes

Fixed Size                  2082432 bytes

Variable Size             100665728 bytes

Database Buffers           54525952 bytes

Redo Buffers               10498048 bytes

Database mounted.

Database opened.

www.htz.pw >

数据库启动正常,注意观察alert日志中是否有报错。

7,故障处理方法2

使用此方法,要求原来的UNDO数据文件存在,此方法就是将原来的undo数据文件再次增加到控制文件中去,此方法比较复制,因为在开启数据库的都会遇到其它很多的一些问题。

7.1 故障现象重现

www.htz.pw > select open_mode from v$database;

OPEN_MODE

———-

READ WRITE

数据库的状态是正常的

www.htz.pw > select name from v$dbfile;

NAME

——————————————————————————–

/oracle/app/oracle/oradata/orcl1024/users01.dbf

/oracle/app/oracle/oradata/orcl1024/sysaux01.dbf

/oracle/app/oracle/oradata/orcl1024/system01.dbf

/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf

存在的数据文件

www.htz.pw > shutdown abort;

ORACLE instance shut down.

重建控制文件,控制文件中不包括undo表空间

www.htz.pw > @/tmp/control.ctl

ORACLE instance started.

Total System Global Area  167772160 bytes

Fixed Size                  2082432 bytes

Variable Size             100665728 bytes

Database Buffers           54525952 bytes

Redo Buffers               10498048 bytes

Control file created.

ORA-00283: recovery session canceled due to errors

ORA-01610: recovery using the BACKUP CONTROLFILE option must be done


ALTER DATABASE OPEN

*

ERROR at line 1:

ORA-01589: must use RESETLOGS or NORESETLOGS option for database open



ALTER TABLESPACE TEMP ADD TEMPFILE ‘/oracle/app/oracle/oradata/orcl1024/temp01.dbf’

*

ERROR at line 1:

ORA-01109: database not open

www.htz.pw >

www.htz.pw > recover database using backup controlfile until cancel;

ORA-00279: change 2171941 generated at 07/19/2014 00:51:38 needed for thread 1

ORA-00289: suggestion :

/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_

%u_.arc

ORA-00280: change 2171941 for thread 1 is in sequence #1

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

cancel

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below

ORA-01194: file 1 needs more recovery to be consistent

ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’


ORA-01112: media recovery not started


*.remote_login_passwordfile=’EXCLUSIVE’

*.sga_target=167772160

#*.undo_management=’AUTO’

*.undo_management=’manual’

*.undo_tablespace=’UNDOTBS1′

*.user_dump_dest=’/oracle/app/oracle/admin/orcl1024/udump’

_allow_resetlogs_corruption=true

_allow_error_simulation=true

增加上面的参数文件

www.htz.pw > startup force mount pfile=’/tmp/123.ora’;

ORACLE instance started.

Total System Global Area  167772160 bytes

Fixed Size                  2082432 bytes

Variable Size             100665728 bytes

Database Buffers           54525952 bytes

Redo Buffers               10498048 bytes

Database mounted.

www.htz.pw > recover database using backup controlfile until cancel;

ORA-00279: change 2171941 generated at 07/19/2014 00:51:38 needed for thread 1

ORA-00289: suggestion :

/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_

%u_.arc

ORA-00280: change 2171941 for thread 1 is in sequence #1

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

/oracle/app/oracle/oradata/orcl1024/redo01.log

Log applied.

Media recovery complete.

www.htz.pw >


www.htz.pw > alter database open resetlogs;

alter database open resetlogs

*

ERROR at line 1:

ORA-01092: ORACLE instance terminated. Disconnection forced

重现故障现在

Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_14960.trc:

ORA-00704: bootstrap process failure

ORA-00604: error occurred at recursive SQL level 2

ORA-01173: data dictionary indicates missing data file from system tablespace

Sat Jul 19 01:01:40 2014

Error 704 happened during db open, shutting down database

USER: terminating instance due to error 704

Instance terminated by USER, pid = 14960

ORA-1092 signalled during: alter database open resetlogs…

7.2 重建控制文件

重建控制文件,控制文件中包括undo表空间的数据文件

[oracle@www.htz.pw sql]$sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 – Production on Sat Jul 19 01:09:05 2014

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to an idle instance.

www.htz.pw > startup nomount pfile=’/tmp/123.ora’;

ORACLE instance started.

Total System Global Area  167772160 bytes

Fixed Size                  2082432 bytes

Variable Size             100665728 bytes

Database Buffers           54525952 bytes

Redo Buffers               10498048 bytes

www.htz.pw > @/tmp/control.ctl

ORA-01081: cannot start already-running ORACLE – shut it down first

CREATE CONTROLFILE REUSE DATABASE "ORCL1024" RESETLOGS  NOARCHIVELOG

*

ERROR at line 1:

ORA-01503: CREATE CONTROLFILE failed

ORA-01189: file is from a different RESETLOGS than previous files

ORA-01110: data file 2: ‘/oracle/app/oracle/oradata/orcl1024/undotbs01.dbf’

这里提示ORA-01189的错误。

1189的错误很简单,因为数据文件头的resetlogs信息不一致导致的。

7.3 bbed修改数据文件头中RESETLOG与SCN信息

www.htz.pw > !cat /tmp/bbed.par

listfile=/tmp/bbed.datafile

www.htz.pw > !cat /tmp/bbed.datafile

1 /oracle/app/oracle/oradata/orcl1024/system01.dbf

2 /oracle/app/oracle/oradata/orcl1024/undotbs01.dbf

3 /oracle/app/oracle/oradata/orcl1024/sysaux01.dbf

4 /oracle/app/oracle/oradata/orcl1024/users01.dbf

[oracle@www.htz.pw ~]$bbed parfile=/tmp/bbed.par

Password:


BBED: Release 2.0.0.0.0 – Limited Production on Sat Jul 19 01:12:13 2014

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

************* !!! For Oracle Internal Use only !!! ***************

BBED> info

 File#  Name                                                        Size(blks)

 —–  —-                                                        ———-

     1  /oracle/app/oracle/oradata/orcl1024/system01.dbf                     0

     2  /oracle/app/oracle/oradata/orcl1024/undotbs01.dbf                    0

     3  /oracle/app/oracle/oradata/orcl1024/sysaux01.dbf                     0

     4  /oracle/app/oracle/oradata/orcl1024/users01.dbf                      0

这里只需要修改上次resetlogs与SCN的值就可以了

BBED> assign file 2 block 1 offset 112 = file 1 block 1 offset 112;

Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y

ub4 kcvfhrlc                                @112      0x32dc2c73

BBED> assign file 2 block 1 offset 116 = file 1 block 1 offset 116;

ub4 kscnbas                                 @116      0x002125e0


BBED> assign file 2 block 1 offset 484 = file 1 block 1 offset 484;

ub1 pad                                     @484      0xe1


BBED> assign file 2 block 1 offset 492 = file 1 block 1 offset 492;

ub1 pad                                     @492      0x74

BBED> sum apply dba 2,1

Check value for File 2, Block 1:

current = 0x093b, required = 0x093b


www.htz.pw > startup force nomount pfile=’/tmp/123.ora’;

ORACLE instance started.

Total System Global Area  167772160 bytes

Fixed Size                  2082432 bytes

Variable Size             100665728 bytes

Database Buffers           54525952 bytes

Redo Buffers               10498048 bytes

7.4 重建控制文件

www.htz.pw > @/tmp/control.ctl

ORA-01081: cannot start already-running ORACLE – shut it down first

Control file created.

ORA-00283: recovery session canceled due to errors

ORA-01610: recovery using the BACKUP CONTROLFILE option must be done

控制文件重建成功

7.5 遇到600错误

www.htz.pw > recover database using backup controlfile until cancel;

ORA-00279: change 2172129 generated at 07/19/2014 00:53:08 needed for thread 1

ORA-00289: suggestion :

/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_

%u_.arc

ORA-00280: change 2172129 for thread 1 is in sequence #1
 

Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

cancel

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below

ORA-01194: file 1 needs more recovery to be consistent

ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’


ORA-01112: media recovery not started


www.htz.pw > alter database open resetlogs;

alter database open resetlogs

*

ERROR at line 1:

ORA-01092: ORACLE instance terminated. Disconnection forced

ORA-00704: bootstrap process failure

ORA-00704: bootstrap process failure

ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [],[], []

这里触发了ORA-00600 kcbgtcr_13错误,只需要手动提交事务就可以了。

7.6 手动提交事务信息

[09:50:33]www.htz.pw > startup mount pfile=’/tmp/123.ora’;

[09:50:34]ORACLE instance started.

[09:50:34]

[09:50:34]Total System Global Area  167772160 bytes

[09:50:34]Fixed Size                  2082432 bytes

[09:50:34]Variable Size             100665728 bytes

[09:50:34]Database Buffers           54525952 bytes

[09:50:34]Redo Buffers               10498048 bytes

[09:50:38]Database mounted.

[09:50:51]www.htz.pw > recover database using backup controlfile until cancel;

[09:50:51]ORA-00279: change 2172135 generated at 07/19/2014 01:41:17 needed for thread 1

[09:50:51]ORA-00289: suggestion :

[09:50:51]/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_

[09:50:51]%u_.arc

[09:50:51]ORA-00280: change 2172135 for thread 1 is in sequence #1

[09:50:51]

[09:50:51]

[09:50:51]Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

[09:50:53]cancel

[09:50:54]ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below

[09:50:54]ORA-01194: file 1 needs more recovery to be consistent

[09:50:54]ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’

[09:50:54]

[09:50:54]

[09:50:54]ORA-01112: media recovery not started

[09:50:54]

[09:50:54]

[09:51:06]www.htz.pw > alter database open resetlogs;

[09:51:09]alter database open resetlogs

[09:51:09]*

[09:51:09]ERROR at line 1:

[09:51:09]ORA-01092: ORACLE instance terminated. Disconnection forced

后面alert报下面的错误

Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_32544.trc:

ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], []

Sat Jul 19 01:43:01 2014

Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_32544.trc:

ORA-00704: bootstrap process failure

ORA-00704: bootstrap process failure

ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], []

Sat Jul 19 01:43:01 2014

Error 704 happened during db open, shutting down database

在trace文件中查看有那些块没有提交。

[oracle@www.htz.pw ~]$grep -E ‘^Block header dump|^0x0’  /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_32544.trc

0x01 0x1f64     0x02 0x1ef8    

0x01 0x1f64     0x02 0x1ef8    

0x01 0x1f64     0x02 0x1ef8    

0x01 0x1f64     0x02 0x1ef8    

0x01 0x1f64     0x02 0x1ef8    

0x01 0x1f64     0x02 0x1ef8    

Block header dump:  0x0040007a

0x01   0x0003.001.00000191  0x0080002b.014c.03  —-    1  fsc 0x0000.00000000

Block header dump:  0x0040017c

0x01   0x0000.022.00000002  0x00400196.0004.37  –U-   12  fsc 0x0000.00000147

Block header dump:  0x004000da

0x01   0x0004.00c.0000011d  0x0080559d.00d3.02  C—    0  scn 0x0000.0008ab18

Block header dump:  0x004000db

0x01   0x0008.017.00000002  0x00800080.0000.01  CBU-    0  scn 0x0000.00002404

0x02   0x0004.01a.0000017a  0x0080003c.016b.32  –U-    1  fsc 0x000e.001e8de0

Block header dump:  0x0040007a

0x01   0x0003.001.00000191  0x0080002b.014c.03  —-    1  fsc 0x0000.00000000

Block header dump:  0x0040006a

0x01   0x0000.008.00000034  0x0040019e.003b.07  C—    0  scn 0x0000.0021245c

通过10046跟踪报错的SQL语句

ksedmp: internal or fatal error

ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], []

Current SQL statement for this session:

select ctime, mtime, stime from obj$ where obj# = :1

—– Call Stack Trace —–

这里看到了报错的SQL语句,以SQL语句来搜索,直到搜索到如下的

Cursor#5(0x2a97ca18b0) state=FETCH curiob=0x2a97cba468

 curflg=f fl2=0 par=0x2a97ca1710 ses=0x69f82a30

 sqltxt(0x69a944b0)=select ctime, mtime, stime from obj$ where obj# = :1

  hash=fa0bd3f60d6ee4f2495f9af8199b75b9

  parent=0x6677c4b8 maxchild=01 plk=0x66f56af0 ppn=n

cursor instantiation=0x2a97cba468 used=1405705379

 child#0(0x69a94288) pcs=0x6677c0c8

  clk=0x66f56dd0 ci=0x6677b7b0 pn=0x69ad37f0 ctx=0x6616fe90

 kgsccflg=0 llk[0x2a97cba470,0x2a97cba470] idx=0

 xscflg=e0141476 fl2=45000401 fl3=4022210c fl4=100

 Bind bytecodes

  Opcode = 1   Unoptimized

  Offsi = 48, Offsi = 0

kkscoacd

 Bind#0

  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00

  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0

  kxsbbbfp=2a97cba020  bln=22  avl=02  flg=05

  value=20

这里中可以看到绑定变量的值是20.在相同的版本其它的数据库中执行下面的操作

SQL> select rowid  from obj$ where obj# =20;

ROWID

——————

AAAAASAABAAAAB6AAA


SQL> @rowid_to_info.sql

Enter value for rowid: AAAAASAABAAAAB6AAA

ROWID_TYPE:  1

OBJECT_NUMBER:  18

RELATIVE_FNO:  1

BLOCK_NUMBER:  122

ROW_NUMBER:  0

PL/SQL procedure successfully completed.


正在好trace文件中的

Block header dump:  0x0040007a

0x01   0x0003.001.00000191  0x0080002b.014c.03  —-    1  fsc 0x0000.00000000

其实我们还可以从10046trace文件中找到此信息如下:

=====================

PARSING IN CURSOR #5 len=52 dep=1 uid=0 oct=3 lid=0 tim=1372757926978214 hv=429618617 ad=’69a944b0′

select ctime, mtime, stime from obj$ where obj# = :1

END OF STMT

PARSE #5:c=0,e=234,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1372757926978212

BINDS #5:

kkscoacd

 Bind#0

  oacdty=02 mxl=22(22) mxlc=00 mal=00 scl=00 pre=00

  oacflg=08 fl2=0001 frm=00 csi=00 siz=24 off=0

  kxsbbbfp=2a97cba020  bln=22  avl=02  flg=05

  value=20

EXEC #5:c=0,e=330,p=0,cr=0,cu=0,mis=1,r=0,dep=1,og=4,tim=1372757926978586

WAIT #5: nam=’db file sequential read’ ela= 23 file#=1 block#=218 blocks=1 obj#=-1 tim=1372757926978753

WAIT #5: nam=’db file sequential read’ ela= 9 file#=1 block#=219 blocks=1 obj#=-1 tim=1372757926978804

WAIT #5: nam=’db file sequential read’ ela= 7 file#=1 block#=122 blocks=1 obj#=-1 tim=1372757926978841

这里需要注意的是绑定变量的值。

在trace文件中可以发现下面的内容

tab 0, row 26, @0x18f1

tl: 70 fb: –H-FL– lb: 0x1  cc: 17

col  0: [ 2]  c1 02

col  1: [ 4]  c3 06 17 08

col  2: [ 1]  80

col  3: [12]  5f 4e 45 58 54 5f 4f 42 4a 45 43 54

col  4: [ 2]  c1 02

col  5: *NULL*

col  6: [ 1]  80

col  7: [ 7]  78 6c 03 0c 01 28 31

col  8: [ 7]  78 72 07 13 01 3b 01

col  9: [ 7]  78 6c 03 0c 01 28 31

col 10: [ 1]  80

col 11: *NULL*

col 12: *NULL*

col 13: [ 1]  80

col 14: *NULL*

col 15: [ 1]  80

col 16: [ 4]  c3 07 38 24

bbed手动提交事务,需要更改itl与行中的lck值

BBED> p ktbbh

struct ktbbh, 48 bytes                      @20     

   ub1 ktbbhtyp                             @20       0x01 (KDDBTDATA)

   union ktbbhsid, 4 bytes                  @24     

      ub4 ktbbhsg1                          @24       0x00000012

      ub4 ktbbhod1                          @24       0x00000012

   struct ktbbhcsc, 8 bytes                 @28     

      ub4 kscnbas                           @28       0x0021251b

      ub2 kscnwrp                           @32       0x0000

   b2 ktbbhict                              @36       1

   ub1 ktbbhflg                             @38       0x02 (NONE)

   ub1 ktbbhfsl                             @39       0x00

   ub4 ktbbhfnx                             @40       0x00000000

   struct ktbbhitl[0], 24 bytes             @44     

      struct ktbitxid, 8 bytes              @44     

         ub2 kxidusn                        @44       0x0003

         ub2 kxidslt                        @46       0x0001

         ub4 kxidsqn                        @48       0x00000191

      struct ktbituba, 8 bytes              @52     

         ub4 kubadba                        @52       0x0080002b

         ub2 kubaseq                        @56       0x014c

         ub1 kubarec                        @58       0x03

      ub2 ktbitflg                          @60       0x0001 (NONE)

      union _ktbitun, 2 bytes               @62     

         b2 _ktbitfsc                       @62       0

         ub2 _ktbitwrp                      @62       0x0000

      ub4 ktbitbas                          @64       0x00000000

BBED> modify /x 80 offset 61

Warning: contents of previous BIFILE will be lost. Proceed? (Y/N) y

 File: /oracle/app/oracle/oradata/orcl1024/system01.dbf (1)

 Block: 122              Offsets:   61 to   70           Dba:0x0040007a

————————————————————————

 80000000 00000000 016c

BBED> x /rn *kdbr[26]

rowdata[5278]                               @6453   

————-

flag@6453: 0x2c (KDRHFL, KDRHFF, KDRHFH)

lock@6454: 0x01

cols@6455:   17

 

col    0[2] @6456: 1

col    1[4] @6459: 52207

col    2[1] @6464: 0

col   3[12] @6466: -0

col    4[2] @6479: 1

col    5[0] @6482: *NULL*

col    6[1] @6483: 0

col    7[7] @6485: -0

col    8[7] @6493: -0

col    9[7] @6501: -0

col   10[1] @6509: 0

col   11[0] @6511: *NULL*

col   12[0] @6512: *NULL*

col   13[1] @6513: 0

col   14[0] @6515: *NULL*

col   15[1] @6516: 0

col   16[4] @6518: 65535

BBED> modify /x 00 offset 6454

 File: /oracle/app/oracle/oradata/orcl1024/system01.dbf (1)

 Block: 122              Offsets: 6454 to 6463           Dba:0x0040007a

————————————————————————

 001102c1 0204c306 1708

BBED> sum apply

Check value for File 1, Block 122:

current = 0x3d20, required = 0x3d20
 

BBED> verify

DBVERIFY – Verification starting

FILE = /oracle/app/oracle/oradata/orcl1024/system01.dbf

BLOCK = 122


DBVERIFY – Verification complete


Total Blocks Examined         : 1

Total Blocks Processed (Data) : 1

Total Blocks Failing   (Data) : 0

Total Blocks Processed (Index): 0

Total Blocks Failing   (Index): 0

Total Blocks Empty            : 0

Total Blocks Marked Corrupt   : 0

Total Blocks Influx           : 0

7.7 报00600坏块的错误

www.htz.pw > alter database open resetlogs;

alter database open resetlogs

*

ERROR at line 1:

ORA-01092: ORACLE instance terminated. Disconnection forced

 

Sat Jul 19 01:56:37 2014

Errors in file /oracle/app/oracle/admin/orcl1024/udump/orcl1024_ora_1894.trc:

ORA-00604: error occurred at recursive SQL level 1

ORA-00607: Internal error occurred while making a change to a data block

ORA-00600: internal error code, arguments: [kddummy_blkchk], [1], [106], [6101], [], [], [], []

Error 604 happened during db open, shutting down database

USER: terminating instance due to error 604

Instance terminated by USER, pid = 1894

ORA-1092 signalled during: alter database open resetlogs…

这里可以看到数据文件1,块106,出现了6101的错误。此错误由于是ITL中的值与LOCK不一致导致的。

bbed修改行的lock值

BBED> set dba 1,106

        DBA             0x0040006a (4194410 1,106)

BBED> verify

DBVERIFY – Verification starting

FILE = /oracle/app/oracle/oradata/orcl1024/system01.dbf

BLOCK = 106


Block Checking: DBA = 4194410, Block Type = KTB-managed data block

data header at 0x2a97696244

kdbchk: row locked by non-existent transaction

        table=0   slot=10

        lockid=1   ktbbhitc=1

Block 106 failed with check code 6101


DBVERIFY – Verification complete


Total Blocks Examined         : 1

Total Blocks Processed (Data) : 1

Total Blocks Failing   (Data) : 1

Total Blocks Processed (Index): 0

Total Blocks Failing   (Index): 0

Total Blocks Empty            : 0

Total Blocks Marked Corrupt   : 0

Total Blocks Influx           : 0

此报错的修改见6101(row locked by non-existent transaction)

7.8 启动数据库

通过上面几步操作,再次启动数据库

www.htz.pw > startup mount pfile=’/tmp/123.ora’;

ORACLE instance started.

Total System Global Area  167772160 bytes

Fixed Size                  2082432 bytes

Variable Size             100665728 bytes

Database Buffers           54525952 bytes

Redo Buffers               10498048 bytes

Database mounted.

www.htz.pw > recover database using backup controlfile unitl cancel;

ORA-00905: missing keyword


www.htz.pw > recover database using backup controlfile until cancel;

ORA-00279: change 2172139 generated at 07/19/2014 01:56:34 needed for thread 1

ORA-00289: suggestion :

/oracle/app/oracle/flash_recovery_area/ORCL1024/archivelog/2014_07_19/o1_mf_1_1_

%u_.arc

ORA-00280: change 2172139 for thread 1 is in sequence #1


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

cancel

ORA-01547: warning: RECOVER succeeded but OPEN RESETLOGS would get error below

ORA-01194: file 1 needs more recovery to be consistent

ORA-01110: data file 1: ‘/oracle/app/oracle/oradata/orcl1024/system01.dbf’


ORA-01112: media recovery not started

www.htz.pw > alter database open resetlogs;

Database altered.

使用原参数能正常启动数据库。

www.htz.pw > startup force;

ORACLE instance started.

Total System Global Area  167772160 bytes

Fixed Size                  2082432 bytes

Variable Size             100665728 bytes

Database Buffers           54525952 bytes

Redo Buffers               10498048 bytes

Database mounted.

Database opened.

------------------作者介绍-----------------------

姓名:黄廷忠

现就职:Oracle中国高级服务团队

曾就职:OceanBase、云和恩墨、东方龙马等

精彩评论(0)

0 0 举报