[Ocfs-users] Cluster Manager Issue on OCFS Firewire ?

Darren Scott darren.scott at digitalbridges.com
Wed Jul 14 17:31:08 CDT 2004


I have set up an ocfs system using two linux nodes connected using 
firewire.

Details:

# uname -a
Linux testrac2 2.4.21-9.0.1.ELorafw1 #1 Tue Mar 2 14:42:46 PST 2004 i686 
i686 i386 GNU/Linux

# cat /etc/issue
Red Hat Enterprise Linux ES release 3 (Taroon Update 1)

# rpm -qa |grep ocfs
ocfs-tools-1.1.2-1
ocfs-2.4.21-EL-1.0.12-1
ocfs-support-1.1.2-1

Oracle version: 9.2.0.5
Cluster manager version:  9.2.0.4.0.48

Everything appears to be fine: the cluster manager can be started on both 
nodes and remains running (no crashes have been seen), and the database 
starts correctly and performs as a RAC database. However, I have 
noticed the following messages appearing at random in the system message 
files.

On Node1

# tail -f /var/log/messages
Jul 14 08:58:31 testrac1 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 08:58:31 testrac1 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 09:18:15 testrac1 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 09:18:15 testrac1 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 10:58:04 testrac1 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 10:58:04 testrac1 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 11:57:20 testrac1 kernel: ocfs: Removing testrac2 (node 1) from 
clustered device (8,0)
Jul 14 11:57:40 testrac1 kernel: ocfs: Adding testrac2 (node 1) to 
clustered device (8,0)
Jul 14 12:36:58 testrac1 kernel: ocfs: Removing testrac2 (node 1) from 
clustered device (8,0)
Jul 14 12:37:17 testrac1 kernel: ocfs: Adding testrac2 (node 1) to 
clustered device (8,0)
Jul 14 13:36:25 testrac1 kernel: ocfs: Removing testrac2 (node 1) from 
clustered device (8,0)
Jul 14 13:36:45 testrac1 kernel: ocfs: Adding testrac2 (node 1) to 
clustered device (8,0)
Jul 14 14:35:24 testrac1 kernel: ocfs: Removing testrac2 (node 1) from 
clustered device (8,0)
Jul 14 14:35:43 testrac1 kernel: ocfs: Adding testrac2 (node 1) to 
clustered device (8,0)


On Node2

# tail -f /var/log/messages
Jul 14 08:58:13 testrac2 kernel: ocfs: Removing testrac1 (node 0) from 
clustered device (8,0)
Jul 14 08:58:32 testrac2 kernel: ocfs: Adding testrac1 (node 0) to 
clustered device (8,0)
Jul 14 09:17:56 testrac2 kernel: ocfs: Removing testrac1 (node 0) from 
clustered device (8,0)
Jul 14 09:18:16 testrac2 kernel: ocfs: Adding testrac1 (node 0) to 
clustered device (8,0)
Jul 14 10:57:46 testrac2 kernel: ocfs: Removing testrac1 (node 0) from 
clustered device (8,0)
Jul 14 10:58:05 testrac2 kernel: ocfs: Adding testrac1 (node 0) to 
clustered device (8,0)
Jul 14 11:57:39 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 11:57:39 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 12:37:17 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 12:37:17 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 13:36:44 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 13:36:44 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 13:56:26 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 13:56:26 testrac2 kernel: Write (10) 00 00 00 7b 80 00 00 08 00
Jul 14 14:35:43 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 14:35:43 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00

These messages seem to imply that the cluster manager is constantly 
reconfiguring itself. I was previously on 9.2.0.4 (CM 9.2.0.2.0.47) 
and have since upgraded, but this did not resolve the situation. If the 
database attempts to write to disk during the brief time that either of 
these nodes has been removed from the cluster, then errors are written to 
the alert log and the database hangs/falls over.
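
In the excerpts above, each ocfs removal on one node appears to be 
followed within about 20 seconds by an sbp2 abort on the other node. 
One way to check whether that holds across the full logs is to pair the 
two kinds of event by timestamp. The following is only a minimal sketch 
in Python: the function names, the 30-second window and the assumed year 
are illustrative choices rather than anything taken from ocfs or the 
cluster manager, and it assumes the clocks on both nodes are reasonably 
in sync.

```python
import re
from datetime import datetime, timedelta

# Matches the syslog prefix seen in the excerpts, e.g.
# "Jul 14 08:58:31 testrac1 kernel: ieee1394: sbp2: aborting sbp2 command"
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+kernel:\s+(.*)$"
)


def parse_events(lines, year=2004):
    """Return (timestamp, host, kind) tuples for the messages of interest.

    syslog lines carry no year, so `year` is an assumption supplied by the
    caller. Wrapped continuation lines do not match LINE_RE and are skipped.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        stamp, host, msg = m.groups()
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if "aborting sbp2 command" in msg:
            events.append((ts, host, "abort"))
        elif msg.startswith("ocfs: Removing"):
            events.append((ts, host, "remove"))
        elif msg.startswith("ocfs: Adding"):
            events.append((ts, host, "add"))
    return events


def correlate(events, window=timedelta(seconds=30)):
    """Pair each removal with the first sbp2 abort that a different host
    logged no more than `window` after it (the window is a guess)."""
    aborts = sorted((ts, host) for ts, host, kind in events if kind == "abort")
    pairs = []
    for ts, host, kind in events:
        if kind != "remove":
            continue
        for a_ts, a_host in aborts:
            if a_host != host and timedelta(0) <= a_ts - ts <= window:
                pairs.append((ts, host, a_ts, a_host))
                break
    return pairs
```

To use it, read /var/log/messages from both nodes into one list of 
lines, pass that to parse_events(), and pass the result to correlate(). 
Removals that end up without a matching abort would suggest the two 
message types are not as tightly linked as the excerpts imply.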

Strangely, I have another test system set up using firewire which is 
almost identical, except that it utilises raw disks and NOT ocfs, and 
there have been no issues regarding the cluster manager. 
Whilst I realise that this is a firewire install and thus is not 
supported by Oracle, I was wondering if anyone else has seen this type of 
behaviour?

Your comments would be most appreciated.




