[Ocfs2-users] Database won't mount

Wed Feb 9 05:17:53 PST 2011

Hi,

We're running an Oracle cluster with an Oracle Data Guard cluster. For testing purposes and to change hostnames, we wanted to reinstall the three Data Guard nodes. The old installation was/is:
`uname -a`
Linux dbnode1 2.6.18-36.el5 #1 SMP Fri Jul 20 14:26:46 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

Installed with ocfs2:
rpm -qa|grep ocfs2
ocfs2-tools-debuginfo-1.2.6-1.el5
ocfs2-tools-1.2.6-1.el5
ocfs2-2.6.18-8.1.8.el5-1.2.6-6.el5
ocfs2console-1.2.6-1.el5
ocfs2-tools-devel-1.2.6-1.el5

The reinstalled nodes were installed with:
`uname -a`
Linux dbnode2 2.6.18-238.1.1.el5 #1 SMP Tue Jan 4 13:32:19 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

rpm -qa|grep ocfs2
ocfs2-2.6.18-238.1.1.el5-1.4.7-1.el5
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5
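
To make the package comparison explicit, here is a minimal sketch that splits the names from the `rpm -qa` output above. It assumes the kernel-module package is named `ocfs2-<kernel release>-<ocfs2 version>-<package release>`, as the names in this mail suggest; the helper names (`split_module_pkg`, `tools_version`, `summarize`) are made up for illustration.

```python
import re
from typing import Dict, List, Optional, Tuple


def split_module_pkg(pkg: str) -> Optional[Tuple[str, str]]:
    """Return (kernel_release, ocfs2_version) for a kernel-module package, else None.

    Assumed naming: ocfs2-<kernel release>-<ocfs2 version>-<package release>.
    """
    if not pkg.startswith("ocfs2-"):
        return None
    parts = pkg[len("ocfs2-"):].rsplit("-", 2)
    # The kernel release starts with a digit; "tools", "tools-devel" etc. do not.
    if len(parts) != 3 or not parts[0][:1].isdigit():
        return None
    return parts[0], parts[1]


def tools_version(pkg: str) -> Optional[str]:
    """Return the version of the plain ocfs2-tools package, else None."""
    m = re.fullmatch(r"ocfs2-tools-(\d[\d.]*)-.+", pkg)
    return m.group(1) if m else None


def summarize(running_kernel: str, packages: List[str]) -> Dict[str, object]:
    """Compare the running kernel with the installed ocfs2 packages."""
    result: Dict[str, object] = {
        "module_kernel_matches": False,
        "module_version": None,
        "tools_version": None,
    }
    for pkg in packages:
        mod = split_module_pkg(pkg)
        if mod is not None:
            result["module_kernel_matches"] = mod[0] == running_kernel
            result["module_version"] = mod[1]
        tools = tools_version(pkg)
        if tools is not None:
            result["tools_version"] = tools
    return result


# Values copied from the reinstalled node (dbnode2) above.
new_node = summarize(
    "2.6.18-238.1.1.el5",
    [
        "ocfs2-2.6.18-238.1.1.el5-1.4.7-1.el5",
        "ocfs2console-1.4.4-1.el5",
        "ocfs2-tools-1.4.4-1.el5",
    ],
)
print(new_node)
```

For the reinstalled node, this reports that the module package targets the running kernel, with module version 1.4.7 and tools version 1.4.4.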

Oracle version is 10.2.0.3 on both.

The installation seemed to be OK. I first reinstalled the two nodes that weren't running the Data Guard instance. Once these two were installed, I tried to start the Data Guard instance on node2, to get node1 ready for reinstallation. But when I ran "startup mount", the database wouldn't mount. It hung until I ran "shutdown abort" in another session. After a long while of hanging, I finally got an ORA-600 [2116] in my alert log, as well as two trace files in the bdump directory. One of the trace files said:
*****snip*****
*** 2011-02-07 09:33:55.096
*** SERVICE NAME:() 2011-02-07 09:33:55.096
*** SESSION ID:(2195.1) 2011-02-07 09:33:55.096
Received ORADEBUG command 'dump errorstack 3' from process Unix process pid: 17868, image:
*** 2011-02-07 09:33:55.096
ksedmp: internal or fatal error
*****snip*****

The other trace file:
*** 2011-02-07 09:23:52.445
*** SERVICE NAME:() 2011-02-07 09:23:52.444
*** SESSION ID:(2185.1) 2011-02-07 09:23:52.444
Waited for detached process: CKPT for 300 seconds:
*** 2011-02-07 09:23:52.445
Dumping diagnostic information for CKPT:
OS pid = 17835
loadavg : 3.01 3.04 2.91
memory info: free memory = 0.00M
swap info:   free = 0.00M alloc = 0.00M total = 0.00M
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
0 S oracle   17835     1  0  75   0 - 1871417 io_get 09:18 ?      00:00:00 ora_ckpt_DBDG02
[Thread debugging using libthread_db enabled]
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fffc2de9000
0x00002b7f59d5f5b4 in ?? () from /usr/lib64/libaio.so.1

The alert log says:
ALTER DATABASE   MOUNT
Mon Feb  7 09:18:52 2011
This instance was first to mount
Mon Feb  7 09:33:55 2011
Errors in file /disk00/admin/DBDG/bdump/DBDG02_ckpt_17835.trc:
Mon Feb  7 09:33:55 2011
Trace dumping is performing id=[cdmp_20110207093355]
Mon Feb  7 09:33:55 2011
Errors in file /disk00/admin/DBDG/udump/DBDG02_ora_17868.trc:
ORA-00600: internal error code, arguments: [2116], [900], [], [], [], [], [], []
Mon Feb  7 09:33:56 2011
Trace dumping is performing id=[cdmp_20110207093356]
Mon Feb  7 09:43:23 2011
Shutting down instance (abort)
License high water mark = 2
Termination issued to instance processes. Waiting for the processes to exit
Mon Feb  7 09:43:33 2011
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 25297
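
To put the timestamps from the trace file and the alert log on one timeline, here is a minimal sketch that computes the seconds elapsed since the "first to mount" message. It assumes all timestamps come from the same clock and time zone and that the month and weekday names parse in an English or C locale; the function names are made up for illustration.

```python
from datetime import datetime


def parse_alert_ts(text: str) -> datetime:
    """Parse an alert-log timestamp such as 'Mon Feb  7 09:18:52 2011'."""
    # Collapse the padded day field ("Feb  7") before parsing.
    return datetime.strptime(" ".join(text.split()), "%a %b %d %H:%M:%S %Y")


def parse_trace_ts(text: str) -> datetime:
    """Parse a trace-file timestamp such as '2011-02-07 09:23:52.445'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f")


def seconds_between(start: datetime, end: datetime) -> int:
    """Whole seconds from start to end."""
    return int((end - start).total_seconds())


# Timestamps copied from the excerpts above.
first_to_mount = parse_alert_ts("Mon Feb  7 09:18:52 2011")
ckpt_wait_msg = parse_trace_ts("2011-02-07 09:23:52.445")
ora_600 = parse_alert_ts("Mon Feb  7 09:33:55 2011")
shutdown_abort = parse_alert_ts("Mon Feb  7 09:43:23 2011")

for label, ts in [
    ("CKPT wait message", ckpt_wait_msg),
    ("ORA-600 [2116]", ora_600),
    ("shutdown abort", shutdown_abort),
]:
    print(f"{label}: {seconds_between(first_to_mount, ts)} s after 'first to mount'")
```

With the values quoted above, the CKPT wait message appears about 300 seconds after the mount message, the ORA-600 about 903 seconds after it, and the shutdown abort about 1471 seconds after it.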

After some investigation, I thought there must be something wrong in the OS or OCFS2. So we downgraded the kernel to:
Linux dbnode2 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

We then installed the ocfs2 package for that kernel version, and everything worked OK. So my question is: is there something wrong with the ocfs2 kernel module package ocfs2-2.6.18-238.1.1.el5-1.4.7-1.el5, or is it the OS that fails?

Best regards,
Morten Kristiansen
