[Ocfs-users] Cluster Manager Issue on OCFS Firewire ?
Chris Robertson
Chris.Robertson at instill.com
Wed Jul 14 11:22:26 CDT 2004
We saw a lot of similar FireWire-related errors on our test setup as well. It turned out to be a combination of cabling and host port problems. After trying different cable/port combinations, all of our problems went away.
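If it helps, one simple way to compare combinations is to count the sbp2 aborts on each node before and after swapping a cable or port (plain grep against syslog; adjust the path if your logs live elsewhere):

# count low-level FireWire command aborts seen so far on this node
grep -c 'ieee1394: sbp2: aborting sbp2 command' /var/log/messages

# watch for new aborts live while exercising the shared disk
tail -f /var/log/messages | grep 'sbp2: aborting'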
HTH
Chris
-----Original Message-----
From: ocfs-users-bounces at oss.oracle.com
[mailto:ocfs-users-bounces at oss.oracle.com]On Behalf Of Darren Scott
Sent: Wednesday, July 14, 2004 8:31 AM
To: ocfs-users at oss.oracle.com
Subject: [Ocfs-users] Cluster Manager Issue on OCFS Firewire ?
I have set up an OCFS system using two Linux nodes connected via FireWire.
Details:
# uname -a
Linux testrac2 2.4.21-9.0.1.ELorafw1 #1 Tue Mar 2 14:42:46 PST 2004 i686 i686 i386 GNU/Linux
# cat /etc/issue
Red Hat Enterprise Linux ES release 3 (Taroon Update 1)
# rpm -qa |grep ocfs
ocfs-tools-1.1.2-1
ocfs-2.4.21-EL-1.0.12-1
ocfs-support-1.1.2-1
Oracle version: 9.2.0.5
Cluster manager version: 9.2.0.4.0.48
Everything appears to be fine: the cluster manager can be started on both nodes and remains running (no crashes have been seen), and the database starts correctly and performs as a RAC database. However, I have noticed the following messages appearing at random in the system message files.
On Node1
# tail -f /var/log/messages
Jul 14 08:58:31 testrac1 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 08:58:31 testrac1 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 09:18:15 testrac1 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 09:18:15 testrac1 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 10:58:04 testrac1 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 10:58:04 testrac1 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 11:57:20 testrac1 kernel: ocfs: Removing testrac2 (node 1) from clustered device (8,0)
Jul 14 11:57:40 testrac1 kernel: ocfs: Adding testrac2 (node 1) to clustered device (8,0)
Jul 14 12:36:58 testrac1 kernel: ocfs: Removing testrac2 (node 1) from clustered device (8,0)
Jul 14 12:37:17 testrac1 kernel: ocfs: Adding testrac2 (node 1) to clustered device (8,0)
Jul 14 13:36:25 testrac1 kernel: ocfs: Removing testrac2 (node 1) from clustered device (8,0)
Jul 14 13:36:45 testrac1 kernel: ocfs: Adding testrac2 (node 1) to clustered device (8,0)
Jul 14 14:35:24 testrac1 kernel: ocfs: Removing testrac2 (node 1) from clustered device (8,0)
Jul 14 14:35:43 testrac1 kernel: ocfs: Adding testrac2 (node 1) to clustered device (8,0)
On Node2
# tail -f /var/log/messages
Jul 14 08:58:13 testrac2 kernel: ocfs: Removing testrac1 (node 0) from clustered device (8,0)
Jul 14 08:58:32 testrac2 kernel: ocfs: Adding testrac1 (node 0) to clustered device (8,0)
Jul 14 09:17:56 testrac2 kernel: ocfs: Removing testrac1 (node 0) from clustered device (8,0)
Jul 14 09:18:16 testrac2 kernel: ocfs: Adding testrac1 (node 0) to clustered device (8,0)
Jul 14 10:57:46 testrac2 kernel: ocfs: Removing testrac1 (node 0) from clustered device (8,0)
Jul 14 10:58:05 testrac2 kernel: ocfs: Adding testrac1 (node 0) to clustered device (8,0)
Jul 14 11:57:39 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 11:57:39 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 12:37:17 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 12:37:17 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 13:36:44 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 13:36:44 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
Jul 14 13:56:26 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 13:56:26 testrac2 kernel: Write (10) 00 00 00 7b 80 00 00 08 00
Jul 14 14:35:43 testrac2 kernel: ieee1394: sbp2: aborting sbp2 command
Jul 14 14:35:43 testrac2 kernel: Read (10) 00 00 00 00 2a 00 00 01 00
These messages seem to imply that the cluster manager is constantly
reconfiguring itself. I was previously on 9.2.0.4 (CM 9.2.0.2.0.47)
and have since upgraded, but this did not resolve the situation. If the
database attempts to write to disk during the brief window in which either
node has been removed from the cluster, errors are written to the alert log
and the database hangs or falls over.
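To show the correlation I am describing, this is roughly how I pull the two kinds of events out of syslog on each node (the grep pattern is just my own, nothing OCFS-specific):

# on each node: FireWire aborts and OCFS membership changes, in time order
grep -E 'ieee1394: sbp2: aborting|ocfs: (Removing|Adding)' /var/log/messages

Lining the two nodes' output up side by side, nearly every sbp2 abort on one node lines up, to within a few seconds, with a Removing/Adding pair for that node logged by the other node.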
Strangely, I have another test system set up using FireWire which is
almost identical, except that it utilises raw disks and NOT OCFS, and
there have been no issues regarding the cluster manager.
Whilst I realise that this is a FireWire install and thus not
supported by Oracle, I was wondering if anyone else has seen this type of
behaviour?
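For what it is worth, I plan to run the same quick check on the raw-disk nodes to see whether the sbp2 aborts occur there too and are simply harmless without OCFS's disk heartbeat I/O (that last part is pure speculation on my part):

# does the raw-disk system see the same low-level aborts?
grep -c 'sbp2: aborting sbp2 command' /var/log/messages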
Your comments would be most appreciated.
_______________________________________________
Ocfs-users mailing list
Ocfs-users at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs-users