<br><font size=2 face="sans-serif">Hi Sunil,</font>

<br>

<br><font size=2 face="sans-serif">My Lotus Notes choked on the table from

Excel, so here are the node numbers for the two nodes:</font>

<br><font size=2 face="sans-serif">Node &nbsp; &nbsp; &nbsp; &nbsp;ocfs2

&nbsp; &nbsp; &nbsp; &nbsp;crs/css</font>

<br><font size=2 face="sans-serif">byaz05 &nbsp; &nbsp; &nbsp; &nbsp;0

&nbsp; &nbsp; &nbsp; &nbsp;2</font>

<br><font size=2 face="sans-serif">byaz10 &nbsp; &nbsp; &nbsp; &nbsp;1

&nbsp; &nbsp; &nbsp; &nbsp;1</font>

<br>

<br><font size=2 face="sans-serif">Greets,</font>

<br><font size=2 face="sans-serif">Alex</font>

<br>

<br>

<br><tt><font size=3>&gt;In such a situation, ocfs2 fences the higher node

number. afaik,<br>

&gt;css does the same. What are the css node numbers for the two nodes?<br>

<br>

</font></tt><a href="http://oss.oracle.com/mailman/listinfo/ocfs2-users"><tt><font size=3 color=blue><u>&gt;alexandra.strauss

at bayerbbs.com</u></font></tt></a><tt><font size=3> wrote:<br>

&gt;&gt;<i><br>

</i>&gt;&gt;<i> Hello,<br>

</i>&gt;&gt;<i><br>

</i>&gt;&gt;<i> I am writing in the hope that you can help me with my problem.

We have got <br>

</i>&gt;&gt;<i> an issue here and opened an SR at Metalink, but until now

we got no <br>

</i>&gt;&gt;<i> useful information in solving our problem. SR-Number is

6855815.994...<br>

</i>&gt;&gt;<i><br>

</i>&gt;&gt;<i> We wanted to protect 9i Single-Instance Databases with

10g Clusterware <br>

</i>&gt;&gt;<i> following the third-party-tool approach. There are no RAC-databases

<br>

</i>&gt;&gt;<i> involved. But we want to achieve high availability as the

databases <br>

</i>&gt;&gt;<i> are business critical systems. We want to make the systems

able to<br>

</i>&gt;&gt;<i> relocate to another machine in case of failure to keep

downtimes <br>

</i>&gt;&gt;<i> low... To achieve this we want to use OCFS2 for the filesystem.

<br>

</i>&gt;&gt;<i> Relocation is done by a script with the help of CRS.<br>

</i>&gt;&gt;<i><br>

</i>&gt;&gt;<i> So we took two systems (byaz05 and byaz10) and installed

the following <br>

</i>&gt;&gt;<i> software: 10g CRS (10.2.0.3) and Oracle Software 9.2.0.8

and OCFS2 1.2.8<br>

</i>&gt;&gt;<i><br>

</i>&gt;&gt;<i> We found the following Metalink notes and adjusted the heartbeat

and <br>

</i>&gt;&gt;<i> timeouts for OCFS2: Metalink Note 395878.1: Heartbeat/Voting/Quorum

<br>

</i>&gt;&gt;<i> Related Timeout Configuration for Linux, OCFS2, RAC Stack

to avoid <br>

</i>&gt;&gt;<i> unnecessary node fencing, panic and reboot<br>

</i>&gt;&gt;<i> Metalink Note 391771.1: OCFS2 - FREQUENTLY ASKED QUESTIONS

(in <br>

</i>&gt;&gt;<i> particular the section on fencing and quorum)<br>

</i>&gt;&gt;<i> Metalink Note 434255.1: Common reasons for OCFS2 Kernel

Panic or <br>

</i>&gt;&gt;<i> Reboot Issues<br>

</i>&gt;&gt;<i> Metalink Note 457423.1: OCFS2 Fencing, Network, and Disk

Heartbeat <br>

</i>&gt;&gt;<i> Timeout Configuration<br>

</i>&gt;&gt;<i><br>

</i>&gt;&gt;<i> We have not changed the CRS/CSS default settings so

far.<br>

</i>&gt;&gt;<i><br>

</i>&gt;&gt;<i> During HA testing we observed unexpected behaviour of the

system. We <br>

</i>&gt;&gt;<i> deactivated the bond for the private interconnect and expected

only one <br>

</i>&gt;&gt;<i> node to go down. Instead, both nodes went down. It seems to me

that <br>

</i>&gt;&gt;<i> one node was rebooted by OCFS2 and the other one by

CRS/CSS.<br>

</i>&gt;&gt;<i><br>

</i>&gt;&gt;<i> Timestamp<br>

</i>&gt;&gt;<i> --------------------------------------------------------------------------------------------------------------

<br>

</i>&gt;&gt;<i><br>

</i>&gt;&gt;<i> 10:21:06 bond1 disabled (eth1)<br>

</i>&gt;&gt;<i> */var/log/messages byaz05*<br>

</i>&gt;&gt;<i> Apr 25 10:21:06 byaz05 kernel: bonding: bond1: link status

definitely <br>

</i>&gt;&gt;<i> down for interface eth1, disabling it<br>

</i>&gt;&gt;<i> Apr 25 10:21:06 byaz05 kernel: bonding: bond1: making interface

eth5 <br>

</i>&gt;&gt;<i> the new active one.<br>

</i>&gt;&gt;<i><br>

</i>&gt;&gt;<i> 10:21:09 bond1 disabled (eth5)<br>

</i>&gt;&gt;<i> */var/log/messages byaz05*<br>

</i>&gt;&gt;<i> Apr 25 10:21:09 byaz05 kernel: bonding: bond1: link status

definitely <br>

</i>&gt;&gt;<i> down for interface eth5, disabling it<br>

</i>&gt;&gt;<i> Apr 25 10:21:09 byaz05 kernel: bonding: bond1: now running

without any <br>

</i>&gt;&gt;<i> active interface !<br>

</i>&gt;&gt;<i><br>

</i>&gt;&gt;<i> 10:21:23 o2net &#8211; no longer connected<br>

</i>&gt;&gt;<i> */var/log/messages byaz05*<br>

</i>&gt;&gt;<i> Apr 25 10:21:23 byaz05 kernel: o2net: no longer connected

to node <br>

</i>&gt;&gt;<i> byaz10.bayer-ag.com (num 1) at 10.190.59.6:7777<br>

</i>&gt;&gt;<i> */var/log/messages byaz10*<br>

</i>&gt;&gt;<i> Apr 25 10:21:23 byaz10 kernel: o2net: no longer connected

to node <br>

</i>&gt;&gt;<i> byaz05.bayer-ag.com (num 0) at 10.190.59.5:7777<br>

</i>&gt;&gt;<i><br>

</i>&gt;&gt;<i> 10:21:27 CSSD failure 134<br>

</i>&gt;&gt;<i> 10:21:29 Reboot initiated by CRS<br>

</i>&gt;&gt;<i> */var/log/messages byaz05*<br>

</i>&gt;&gt;<i> Apr 25 10:21:27 byaz05 logger: Oracle clsomon failed with

fatal status <br>

</i>&gt;&gt;<i> 12.<br>

</i>&gt;&gt;<i> Apr 25 10:21:27 byaz05 logger: Oracle CSSD failure 134.<br>

</i>&gt;&gt;<i> Apr 25 10:21:27 byaz05 su(pam_unix)[25839]: session closed

for user <br>

</i>&gt;&gt;<i> oracle<br>

</i>&gt;&gt;<i> Apr 25 10:21:27 byaz05 logger: Oracle CRS failure. Rebooting

for <br>

</i>&gt;&gt;<i> cluster integrity.<br>

</i>&gt;&gt;<i> Apr 25 10:21:27 byaz05 kernel: md: stopping all md devices.<br>

</i>&gt;&gt;<i> Apr 25 10:21:27 byaz05 kernel: md: md0 switched to read-only

mode.<br>

</i>&gt;&gt;<i> Apr 25 10:21:29 byaz05 logger: Oracle CRS failure. Rebooting

for <br>

</i>&gt;&gt;<i> cluster integrity.<br>

</i>&gt;&gt;<i> Apr 25 10:21:29 byaz05 kernel: e1000: eth2: e1000_watchdog_task:

NIC <br>

</i>&gt;&gt;<i> Link is Up 1000 Mbps Full Duplex<br>

</i>&gt;&gt;<i> Apr 25 10:21:29 byaz05 logger: Oracle init script ceding

reboot to <br>

</i>&gt;&gt;<i> sibling 27383.<br>

</i>&gt;<i>&gt;<br>

</i>&gt;<i>&gt; 10:21:58 Reboot initiated by OCFS2(?)<br>

</i>&gt;<i>&gt; */var/log/messages byaz10*<br>

</i>&gt;<i>&gt; Apr 25 10:21:58 byaz10 su(pam_unix)[4595]: session opened

for user <br>

</i>&gt;<i>&gt; oracle by (uid=0)<br>

</i>&gt;<i>&gt; Apr 25 10:21:58 byaz10 su(pam_unix)[4595]: session closed

for user oracle<br>

</i>&gt;<i>&gt; Apr 25 10:25:58 byaz10 syslogd 1.4.1: restart.<br>

</i>&gt;<i>&gt; Apr 25 10:25:58 byaz10 syslog: syslogd startup succeeded<br>

</i>&gt;<i>&gt; Apr 25 10:25:58 byaz10 kernel: klogd 1.4.1, log source

= /proc/kmsg <br>

</i>&gt;<i>&gt; started.<br>

</i>&gt;<i>&gt; Apr 25 10:25:58 byaz10 kernel: Bootdata ok (command line

is ro <br>

</i>&gt;<i>&gt; root=/dev/vgroot/_)<br>

</i>&gt;<i>&gt;<br>

</i>&gt;<i>&gt;<br>

</i>&gt;<i>&gt; We have assumed all along that this is a timing problem. But

we don't know <br>

</i>&gt;&gt;<i> which settings cause the problem and which steps to take

to correct <br>

</i>&gt;<i>&gt; them. Otherwise we'll have to rework the complete concept

for the <br>

</i>&gt;&gt;<i> business critical systems.<br>

</i>&gt;&gt;<i> Can anyone help me?<br>

</i>&gt;&gt;<i><br>

</i><br>

&gt;&gt;<i> Regards,<br>

</i>&gt;&gt;<i> Alexandra<br>

</i></font></tt>