A 10.2.0.4 database running on HP-UX hit the Oracle internal error ORA-00600 [17175]. The relevant log information is shown below:
Wed Dec 1 01:57:55 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_pmon_3250.trc:
ORA-00600: internal error code, arguments: [17175], [255], [], [], [], [], [], []
ORA-00601: cleanup lock conflict
Wed Dec 1 01:57:57 2010
Trace dumping is performing id=[cdmp_20101201015757]
Wed Dec 1 01:58:05 2010
LGWR: terminating instance due to error 472
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms1_3291.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms2_3293.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms3_3295.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lms0_3289.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lmon_3283.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_lmd0_3287.trc:
ORA-00472: PMON process terminated with error
Wed Dec 1 01:58:05 2010
Shutting down instance (abort)
License high water mark = 421

/u01/app/oracle/admin/xgp2/bdump/xgp21_pmon_3250.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/app/oracle/product/10.2.0/db_1
System name: HP-UX
Node name: XGP2_db1
Release: B.11.31
Version: U
Machine: ia64
Instance name: xgp21
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 3250, image: oracle@XGP2_db1 (PMON)
*** SERVICE NAME:(SYS$BACKGROUND) 2010-12-01 01:57:55.933
*** SESSION ID:(333.1) 2010-12-01 01:57:55.933
*** 2010-12-01 01:57:55.933
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [17175], [255], [], [], [], [], [], []
ORA-00601: cleanup lock conflict

ksedst <- ksedmp <- ksfdmp <- kgeriv <- kgesiv
<- kgesic1 <- kghcln <- kslilcr <- $cold_ksl_cleanup <- ksepop
<- kgepop <- kgesev <- ksesec0 <- $cold_kslges <- ksl_get_child_latch
<- kslgpl <- es <- ksfglt <- kghext_numa <- ksmasgn
<- kghnospc <- $cold_kghalo <- ksmdacnk <- ksmdget <- ksosp_alloc
<- ksoreq_submit <- ksbsrv <- kmmssv <- kmmlsa <- kmmlod
<- ksucln <- ksbrdp <- opirip <- $cold_opidrv <- sou2o
<- $cold_opimai_real <- main <- main_opd_entry

PROCESS STATE
-------------
Process global information:
process: c00000018d000078, call: c00000018d252238, xact: 0000000000000000, curses: c00000018d2508a8, usrses: c00000018d2508a8
----------------------------------------
SO: c00000018d000078, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=2, calls cur/top: c00000018d252238/c00000018d252238, flag: (e) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 48
last post received-location: ksoreq_reply
last process to post me: c00000018d037978 1 64
last post sent: 0 0 24
last post sent-location: ksasnd
last process posted by me: c00000018d001058 1 6
(latch info) wait_event=0 bits=90
holding (efd=5) c00000020001d500 Parent+children shared pool level=7
Location from where latch is held: kghfrunp: alloc: clatch nowait:
Context saved from call: 0
state=busy, wlstate=free
holding (efd=5) c00000020000b5f8 OS process allocation level=4
Location from where latch is held: ksoreq_submit:
Context saved from call: 13835058076152957304
state=busy, wlstate=free
Process Group: DEFAULT, pseudo proc: c0000004dd263230
O/S info: user: oracle, term: UNKNOWN, ospid: 3250
OSD pid info: Unix process pid: 3250, image: oracle@XGP2_db1 (PMON)
SO: c0000004df4d5f28, type: 19, owner: c00000018d000078, flag: INIT/-/-/0x00
GES MSG BUFFERS: st=emp chunk=0x0000000000000000 hdr=0x0000000000000000 lnk=0x0000000000000000 flags=0x0 inc=4
outq=0 sndq=0 opid=2 prmb=0x0
mbg[i]=(2 19) mbg[b]=(0 0) mbg[r]=(0 0)
fmq[i]=(4 1) fmq[b]=(0 0) fmq[r]=(0 0)
mop[s]=20 mop[q]=1 pendq=0 zmbq=0
nonksxp_recvs=0
------------process 0xc0000004df4d5f28--------------------
proc version : 0
Local node : 0
pid : 3250
lkp_node : 0
svr_mode : 0
proc state : KJP_NORMAL
Last drm hb acked : 0
Total accesses : 181
Imm. accesses : 180
Locks on ASTQ : 0
Locks Pending AST : 0
Granted locks : 0
AST_Q:
PENDING_Q:
GRANTED_Q:
----------------------------------------
SO: c00000018d2f3610, type: 11, owner: c00000018d000078, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: c00000018d000078,
event: 1, last message event: 1,
last message waited event: 1, messages read: 0
channel: (c0000004dd29fdb0) scumnt mount lock
scope: 1, event: 19, last mesage event: 0,
publishers/subscribers: 0/19,
messages published: 0
SO: c00000018d2508a8, type: 4, owner: c00000018d000078, flag: INIT/-/-/0x00
(session) sid: 333 trans: 0000000000000000, creator: c00000018d000078, flag: (51) USR/- BSY/-/-/-/-/-
DID: 0001-0002-00000003, short-term DID: 0000-0000-00000000
txn branch: 0000000000000000
oct: 0, prv: 0, sql: 0000000000000000, psql: 0000000000000000, user: 0/SYS
service name: SYS$BACKGROUND
last wait for 'latch: shared pool' blocking sess=0x0000000000000000 seq=342 wait_time=175677 seconds since wait started=0
address=c0000002000fff60, number=d6, tries=7
Dumping Session Wait History
for 'latch: shared pool' count=1 wait_time=175677
address=c0000002000fff60, number=d6, tries=7
for 'latch: shared pool' count=1 wait_time=97554
address=c0000002000fff60, number=d6, tries=6
for 'latch: shared pool' count=1 wait_time=78023
address=c0000002000fff60, number=d6, tries=5
for 'latch: shared pool' count=1 wait_time=38978
address=c0000002000fff60, number=d6, tries=4
for 'latch: shared pool' count=1 wait_time=38942
address=c0000002000fff60, number=d6, tries=3
for 'latch: shared pool' count=1 wait_time=19435
address=c0000002000fff60, number=d6, tries=2
for 'latch: shared pool' count=1 wait_time=12655
address=c0000002000fff60, number=d6, tries=1
for 'latch: shared pool' count=1 wait_time=8
address=c0000002000fff60, number=d6, tries=0
for 'os thread startup' count=1 wait_time=144253
=0, =0, =0
for 'os thread startup' count=1 wait_time=141360
=0, =0, =0
SO: c00000018d2f3500, type: 11, owner: c00000018d000078, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: c00000018d000078,
event: 2, last message event: 40,
last message waited event: 40, messages read: 1
channel: (c0000004dd29bbd8) system events broadcast channel
scope: 2, event: 224634, last mesage event: 40,
publishers/subscribers: 0/161,
messages published: 1
SO: c00000018d252238, type: 3, owner: c00000018d000078, flag: INIT/-/-/0x00
(call) sess: cur c00000018d2508a8, rec 0, usr c00000018d2508a8; depth: 0
----------------------------------------
SO: c00000018d2594b0, type: 5, owner: c00000018d252238, flag: INIT/-/-/0x00
(enqueue) PR-00000000-00000000 DID: 0001-0002-00000003
lv: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 res_flag: 0x2
res: 0xc0000004df401718, mode: X, lock_flag: 0x0
own: 0xc00000018d2508a8, sess: 0xc00000018d2508a8, proc: 0xc00000018d000078, prv: 0xc0000004df401728
----------------------------------------
SO: c00000018d30b710, type: 16, owner: c00000018d000078, flag: INIT/-/-/0x00
(osp req holder)
CHILD REQUESTS:
(osp req) type=2(BACKGROUND) flags=0x20001(STATIC/-) state=1(INITED) err=0
pg=0 arg1=0 arg2=(null) reply=(null) pname=S018
pid=0 parent=c00000018d30b710 fulfill=0000000000000000
----------------------------------------
SO: c0000004dbff09c0, type: 192, owner: c0000004dbff09c0, flag: -/-/-/0x00

Searching MetaLink for documents on the ORA-600 [17175] internal error turns up a good deal of information about it:
Keywords: ora-00600 [17175]

1. Bug 6250251: ORA-00600 17175 DURING KGI CLEANUP - DUMP - ORADEBUG
--ora-600 followed by ora-601 and instance crash with ORA-17175.
--Also, setting of heap check event triggers this problem. In this case
--it is event="10235 trace name context forever, level 27"

2. Bug 4216668 - Dump from INSERT / MERGE on internal columns (Doc ID 4216668.8)
--INSERT or MERGE commands might core dump if operating on object types and internal columns are involved.

3. Bug 7590297: ORA-600 [17175] [255] ORA-601: CLEANUP LOCK CONFLICT CRASHED THE DATABASE

4. SR 3-2296150050
--The error has occurred when Oracle was cleaning shared pool latch/heap information about the process which died in middle.
--There is no data corruption associated with this error.
--This is evident from the function kghcln in the trace stack at which it failed.
--This problem is usually the symptom of some earlier problem with the latch.
--Either after a process has died, or a process has signaled an error while holding a shared pool latch, and the index to the shared pool latch is invalid.
--There was a Bug 7590297 raised for this issue which could not be progressed due to unavailability of information.
--From few earlier known issues - This can be due to PMON may sometimes signal ORA-601 while trying to start up additional shared servers or dispatchers.
--There the workaround suggested was to Start the instance with max # of shared servers.
--Can you reproduce the problem? If the instance has been restarted the issue may not persist as it is related to memory.
--If the issue persists then we have to perform the following to monitoring the instance to investigate further:
--1. Set the following event in parameter file:
--event="10257 trace name context forever, level 10"
--event="601 trace name SYSTEMSTATE level 10"
--The first event will cause PMON to dump info about shared server startup.
--The second event will cause PMON to do a system state dump when the 601 occurs.
--2. You should also have the track of this in intervals and save the historical results from:
--SQL> select e.total_waits, e.total_timeouts, e.time_waited from v$session_event e, v$session s, v$bgprocess b where b.name='PMON' and s.paddr=b.paddr and e.sid=s.sid and e.event='process startup';

5. SR 3-2123025401
--=== ODM Solution / Action Plan ===
--Disabled NUMA for resolution

6. SR 7314313.994

Analysis:
Bug 6250251 and bug 4216668 are not applicable to this case.
Bug 7590297 is applicable to this case, as the call stack, error message are the same with this case. But this patch is suspended as requested info is not available.
SR 3-2296150050: same error message, same DB version, similar call stack; closed without solution.
SR 3-2123025401: same error message, same DB version, similar call stack. The issue happened twice in that SR and solved by disabling NUMA.
SR 7314313.994: same error message, same DB version, similar call stack; closed without solution.

ERROR:
ORA-600 [17175] [a]
VERSIONS:
versions 9.2 to 10.1
DESCRIPTION:
This error occurs when we are cleaning up a shared pool latch (either after a process has died, or a process has signaled an error while holding a shared pool latch), and the index to the shared pool latch is invalid.
ARGUMENTS:
Arg [a] index of the latch recovery structure - usually 255
FUNCTIONALITY:
Generic Heap Manager
IMPACT:
INSTANCE HANG
PROCESS FAILURE
INSTANCE FAILURE

Below is the action plan provided by Oracle GCS. GCS believes that the vast majority of ORA-00600 [17xxx] errors are caused by memory-related problems, which are often resolved by restarting the instance. GCS also suggested setting shared_servers=max_shared_servers and then observing further:
From the uploaded files it looks like you were reported with ORA-00600 [17175] errors and crashed the instance.
What is the current status after the restart of the database.
Are you still reported with the same errors and crashing the instance ?
Mostly the ORA-00600 [17xxx] errors are memory related and might have got resolved after the database restart.
Further looking at the uploaded trace file the failing functions and the error closely matches Bug 6958493 and is closed as duplicate of Base Bug 6962340 which is closed as could not able to reproduce the error.
Also a similar issue is reported in Bug 3104250 which is fixed in 10g, but that doesn't mean you cannot get this error for a new reason and that the same workaround would fix it.
We need to implement the workaround and set: shared_servers=max_shared_servers if the error reproduces again. If this is still repeated issue then we can file a new bug with development for the same.

ACTION PLAN
===========
1. Monitor the alertlog for the ORA-00600 [17175] errors for the next few days and if the database still crashes then please set shared_servers=max_shared_servers and see if the problem resolves or not.
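
The monitoring step in the action plan can be roughly automated. The following is only a minimal sketch in Python, not anything Oracle supplied: it assumes the 10g alert log layout seen above (a timestamp line such as "Wed Dec 1 01:57:55 2010" followed by the messages for that moment), and the function name find_ora600 and the sample lines are made up for illustration.

```python
import re

# Assumed 10g alert log timestamp line, e.g. "Wed Dec 1 01:57:55 2010".
TS_RE = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$"
)


def find_ora600(lines, first_arg="17175"):
    """Return (timestamp, message) pairs for ORA-00600 lines whose
    argument list contains [first_arg].

    The timestamp is the most recent timestamp line seen before the
    error line, or None if no timestamp line preceded it.
    """
    needle = "[" + first_arg + "]"
    hits = []
    current_ts = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            # Remember the timestamp that applies to the following messages.
            current_ts = line
            continue
        if line.startswith("ORA-00600") and needle in line:
            hits.append((current_ts, line))
    return hits


if __name__ == "__main__":
    # Hypothetical usage: feed in lines taken from an alert log.
    sample = [
        "Wed Dec 1 01:57:55 2010",
        "Errors in file /u01/app/oracle/admin/xgp2/bdump/xgp21_pmon_3250.trc:",
        "ORA-00600: internal error code, arguments: [17175], [255], [], [], [], [], [], []",
        "ORA-00601: cleanup lock conflict",
    ]
    for ts, msg in find_ora600(sample):
        print(ts, "|", msg)
```

In practice the lines would come from the instance's alert log file (for example by iterating over an open file handle), and the script could be run from cron every few minutes; how to alert on a hit is left open here.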