[Ocfs2-users] Database won't mount

Sun Feb 13 23:35:51 PST 2011

As I said, we're running Oracle 10.2.0.3 for both the RAC software and the DB Enterprise software. I don't think running RAC on ocfs2 is a problem in itself: we have been running 7 different RACs, all on ocfs2, for several years with no problems, and after following this mailing list for a year I don't think it's that rare. And yes, the database has problems with CKPT and the controlfile placed on /disk03. I think the oracle processes are hanging there, and that's why I'm not able to unmount /disk03. The question is why this happens. In my opinion it has something to do with either the OS or ocfs2. With a lower kernel version there is no such problem, so I think there must be a bug somewhere.

Regards,
Morten Kristiansen

From: Michael Austin [mailto:onedbguru at gmail.com]
Sent: 11 February 2011 19:51
To: Kristiansen Morten
Subject: Re: [Ocfs2-users] Database won't mount

It appears the ORA-600 gets generated when CKPT could not lock/access the control files.

First, what version of Oracle RAC? That OCFS2 "can" be used as the shared storage for RAC does not mean it is a good idea - in fact it is STRONGLY suggested that you use ASM and ACFS.  I have seen MANY clusters (10g and 11g) running on ASM with NO problems, including one at a previous employer at 380TB+ (on Solaris, not Linux).  Using anything other than ASM seems to be very problematic.  If you don't want to use ASM directly, you can always use ACFS - it lives within ASM, with ASM acting as the volume manager.  With ACFS, you have "normal" mount points where you can store your data and access it just like a normal unix file system - however, things like deletes are significantly faster.  ACFS volumes can also be resized dynamically, online.
In one environment in the past we used ASM for normal database storage and used ACFS volumes for the shared stuff like FRA archivelogs and rman backups.

On Wed, Feb 9, 2011 at 8:30 AM, Kristiansen Morten <Morten.Kristiansen at hn-ikt.no> wrote:
Hi again,
I forgot to say that after I had tried to start the database, I was unable to umount /disk03, which is one of the two disks where the database's controlfile is placed. When I shut down CRS, one oracle process was still left and it was impossible to kill; I had to reboot the server. In the `ps -fel` listing, the process had the value "wait_f" in the WCHAN column and its parent was process 1.
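
A minimal sketch of how such a stuck process could be located before resorting to a reboot, assuming a procps `ps` that accepts `-eo` with a `wchan:32` column. The helper name `filter_wchan` is hypothetical, and `fuser` and `lsof` appear only in comments because they need the psmisc and lsof packages to be installed.

```shell
# Print PID, PPID, STAT and WCHAN for every process whose kernel wait
# channel starts with the given prefix.
# Input on stdin: rows of "PID PPID STAT WCHAN CMD", first row is the header.
# filter_wchan is a hypothetical helper name, not an existing tool.
filter_wchan() {
    awk -v pat="$1" 'NR > 1 && index($4, pat) == 1 { print $1, $2, $3, $4 }'
}

# Possible use on a live node:
#   ps -eo pid,ppid,stat,wchan:32,cmd | filter_wchan wait_f
#
# To see which processes still hold files on the mount point:
#   fuser -vm /disk03
#   lsof /disk03
```

The prefix match is deliberate, since `ps -fel` truncates the WCHAN column and "wait_f" is probably only the start of the real symbol name.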

Regards,
Morten Kristiansen

From: ocfs2-users-bounces at oss.oracle.com [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Kristiansen Morten
Sent: 9 February 2011 14:18
To: ocfs2-users at oss.oracle.com
Subject: [Ocfs2-users] Database won't mount

Hi,

We're running an Oracle cluster with a clustered Oracle dataguard. For testing purposes and to change hostnames, we wanted to reinstall the three dataguard nodes. The old installation was/is:
`uname -a`
Linux dbnode1 2.6.18-36.el5 #1 SMP Fri Jul 20 14:26:46 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

Installed with ocfs2:
rpm -qa|grep ocfs2
ocfs2-tools-debuginfo-1.2.6-1.el5
ocfs2-tools-1.2.6-1.el5
ocfs2-2.6.18-8.1.8.el5-1.2.6-6.el5
ocfs2console-1.2.6-1.el5
ocfs2-tools-devel-1.2.6-1.el5

The reinstalled nodes were installed with:
`uname -a`
Linux dbnode2 2.6.18-238.1.1.el5 #1 SMP Tue Jan 4 13:32:19 EST 2011 x86_64 x86_64 x86_64 GNU/Linux

rpm -qa|grep ocfs2
ocfs2-2.6.18-238.1.1.el5-1.4.7-1.el5
ocfs2console-1.4.4-1.el5
ocfs2-tools-1.4.4-1.el5

Oracle version is 10.2.0.3 on both.

The installation seems to be OK. I first reinstalled the two nodes that weren't running the dataguard instance. When these two were installed, I tried to start the dataguard instance on node2, to get node1 ready for reinstallation. But when I ran "startup mount", the database wouldn't mount. It was "hanging" until I ran "shutdown abort" in another session. After a long while of hanging, I finally got an ORA-600 [2116] in my alert log, as well as two trace files in the bdump directory. One of the trace files said:
*****snip*****
*** 2011-02-07 09:33:55.096
*** SERVICE NAME:() 2011-02-07 09:33:55.096
*** SESSION ID:(2195.1) 2011-02-07 09:33:55.096
Received ORADEBUG command 'dump errorstack 3' from process Unix process pid: 17868, image:
*** 2011-02-07 09:33:55.096
ksedmp: internal or fatal error
*****snip*****

The other tracefile:
*** 2011-02-07 09:23:52.445
*** SERVICE NAME:() 2011-02-07 09:23:52.444
*** SESSION ID:(2185.1) 2011-02-07 09:23:52.444
Waited for detached process: CKPT for 300 seconds:
*** 2011-02-07 09:23:52.445
Dumping diagnostic information for CKPT:
OS pid = 17835
loadavg : 3.01 3.04 2.91
memory info: free memory = 0.00M
swap info:   free = 0.00M alloc = 0.00M total = 0.00M
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
0 S oracle   17835     1  0  75   0 - 1871417 io_get 09:18 ?      00:00:00 ora_ckpt_DBDG02
[Thread debugging using libthread_db enabled]
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fffc2de9000
0x00002b7f59d5f5b4 in ?? () from /usr/lib64/libaio.so.1

The alert log says:
ALTER DATABASE   MOUNT
Mon Feb  7 09:18:52 2011
This instance was first to mount
Mon Feb  7 09:33:55 2011
Errors in file /disk00/admin/DBDG/bdump/DBDG02_ckpt_17835.trc:
Mon Feb  7 09:33:55 2011
Trace dumping is performing id=[cdmp_20110207093355]
Mon Feb  7 09:33:55 2011
Errors in file /disk00/admin/DBDG/udump/DBDG02_ora_17868.trc:
ORA-00600: internal error code, arguments: [2116], [900], [], [], [], [], [], []
Mon Feb  7 09:33:56 2011
Trace dumping is performing id=[cdmp_20110207093356]
Mon Feb  7 09:43:23 2011
Shutting down instance (abort)
License high water mark = 2
Termination issued to instance processes. Waiting for the processes to exit
Mon Feb  7 09:43:33 2011
Instance termination failed to kill one or more processes
Instance terminated by USER, pid = 25297
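
In an alert log like the one above, each message block is preceded by a timestamp line. A small sketch for pulling out only the ORA- errors together with the timestamp they were logged under, assuming that layout; the function name `ora_errors` is hypothetical and the log path in the usage comment is a placeholder.

```shell
# Print every ORA- line from an alert log, prefixed with the most recent
# timestamp line seen before it.
# ora_errors is a hypothetical helper name; input is read from stdin.
ora_errors() {
    awk '
        # Timestamp lines look like "Mon Feb  7 09:33:55 2011"
        /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) [A-Z][a-z][a-z] / { ts = $0; next }
        # Any line carrying an ORA-nnnnn code is reported with its timestamp
        /ORA-[0-9]+/ { print ts " | " $0 }
    '
}

# Possible use (path is a placeholder):
#   ora_errors < /path/to/alert.log
```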

After some investigation, I thought there must be something wrong in the OS or OCFS2. So we downgraded the kernel to:
Linux dbnode2 2.6.18-128.el5 #1 SMP Wed Dec 17 11:41:38 EST 2008 x86_64 x86_64 x86_64 GNU/Linux

We then installed the ocfs2 kernel module for that kernel version, and everything worked OK. So my question is: is there something wrong with the ocfs2 kernel module ocfs2-2.6.18-238.1.1.el5-1.4.7-1.el5, or is it the OS that fails?
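
Because the ocfs2 kernel module package carries the kernel release in its name, a quick check that the installed module package matches the running kernel could look like the sketch below. It assumes the `ocfs2-<kernel release>-<ocfs2 version>` naming seen in the `rpm -qa` listings above; the function name `match_ocfs2_kmod` is hypothetical.

```shell
# Print the ocfs2 kernel-module package(s) built for a given kernel release.
# $1    = kernel release, as printed by `uname -r`
# stdin = package names, one per line (e.g. from `rpm -qa`)
# match_ocfs2_kmod is a hypothetical helper name. It exits non-zero when
# no package matches, because grep does.
match_ocfs2_kmod() {
    grep -F -- "ocfs2-$1-"
}

# Possible use on a live node:
#   rpm -qa | match_ocfs2_kmod "$(uname -r)" \
#       || echo "no ocfs2 module package for the running kernel"
#
# The module that would actually be loaded can be inspected with:
#   modinfo ocfs2
```

The fixed-string match with the trailing hyphen keeps `ocfs2-tools` and `ocfs2console` out of the result.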

Regards,
Morten Kristiansen

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users
