<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">==> /var/log/messages from the node being rebooted doesn’t show anything, just the reboot shows the following
That's expected on Linux: a node that panics or resets usually cannot
write its final kernel messages to local disk. Configure netconsole or
netdump to capture the logs on another machine. Only then will we know
why the node is panicking.
</span></font></pre>
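<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">For illustration only, here is a minimal sketch of the receiving side.
netconsole sends kernel messages as UDP datagrams, so any machine that
listens on the target port can record them. The port (6666), the function
names and the log file name are assumptions made for this example, not part
of any OCFS2 or netconsole tooling. A syslog server or netcat works as well.
</span></font></pre>
<pre>
```python
import socket

# Assumption: netconsole on the cluster node targets this host on UDP
# port 6666. Change the port to match your own netconsole setup.
LISTEN_PORT = 6666


def open_listener(host="0.0.0.0", port=LISTEN_PORT):
    """Create a UDP socket bound to the address netconsole sends to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_lines(sock, max_packets, logfile=None):
    """Read up to max_packets datagrams and return them as text lines.

    Each line is prefixed with the sender's IP address so that output
    from several nodes can be told apart. If logfile is given, every
    line is appended to it and flushed immediately.
    """
    lines = []
    for _ in range(max_packets):
        data, (sender_ip, _port) = sock.recvfrom(65535)
        text = data.decode("utf-8", errors="replace").rstrip("\n")
        line = f"{sender_ip} {text}"
        lines.append(line)
        if logfile is not None:
            logfile.write(line + "\n")
            logfile.flush()
    return lines
```
</pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">To use it, call open_listener() on the log host, then call
receive_lines(listener, 1, log) in a loop with a file opened in append
mode (for example netconsole.log). The last lines received before a node
goes down are the ones to post here.
</span></font></pre>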
<br>
On 12/07/2010 03:27 PM, Neil Campbell wrote:
<blockquote
cite="mid:33F90FC53D0A7848BFC5B88CBF7C4F9C44ADCA@dc-exc03.deg.aus"
type="cite">
<o:smarttagtype
namespaceuri="urn:schemas-microsoft-com:office:smarttags"
name="PersonName">
<style>
<!--
/* Font Definitions */
@font-face
        {font-family:"Century Gothic";
        panose-1:2 11 5 2 2 2 2 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman";}
a:link, span.MsoHyperlink
        {color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {color:purple;
        text-decoration:underline;}
pre
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Century Gothic";
        color:black;
        font-weight:normal;
        font-style:normal;
        text-decoration:none none;}
@page Section1
        {size:595.3pt 841.9pt;
        margin:70.9pt 2.0cm 70.9pt 2.0cm;}
div.Section1
        {page:Section1;}
-->
</style>
<div class="Section1">
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Hi all,<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;"><o:p> </o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">I keep getting node reboots across my cluster. They seem random: the node being evicted changes,</span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">and the reboots happen only every now and then. I’m running RHEL 4 with kernel 2.6.9-89.0.26.ELsmp</span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">and OCFS2 1.2.9 Mon Jun 21 20:03:07 PDT 2010 (build 5e8325ec7f66b5189c65c7a8710fe8cb).</span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;"><o:p> </o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">I am using OCFS2 as a general-purpose filesystem (i.e. not for Oracle datafiles, OCR, etc.),</span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">with the following entry in /etc/fstab:</span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;"><o:p> </o:p></span></font></pre>
<pre style="margin-left: 36pt;"><font size="2" face="Courier New"><span style="font-size: 10pt;">/dev/emcpowera1 /u01/cfs ocfs2 _netdev 0 0<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;"><o:p> </o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">For a general-purpose filesystem, should I be using the nointr mount option?</span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;"><o:p> </o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Output of /etc/init.d/o2cb status:</span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;"><o:p> </o:p></span></font></pre>
<pre style="margin-left: 36pt;"><font size="2" face="Courier New"><span style="font-size: 10pt;">Module "configfs": Loaded<o:p></o:p></span></font></pre>
<pre style="margin-left: 36pt;"><font size="2" face="Courier New"><span style="font-size: 10pt;">Filesystem "configfs": Mounted<o:p></o:p></span></font></pre>
<pre style="margin-left: 36pt;"><font size="2" face="Courier New"><span style="font-size: 10pt;">Module "ocfs2_nodemanager": Loaded<o:p></o:p></span></font></pre>
<pre style="margin-left: 36pt;"><font size="2" face="Courier New"><span style="font-size: 10pt;">Module "ocfs2_dlm": Loaded<o:p></o:p></span></font></pre>
<pre style="margin-left: 36pt;"><font size="2" face="Courier New"><span style="font-size: 10pt;">Module "ocfs2_dlmfs": Loaded<o:p></o:p></span></font></pre>
<pre style="margin-left: 36pt;"><font size="2" face="Courier New"><span style="font-size: 10pt;">Filesystem "ocfs2_dlmfs": Mounted<o:p></o:p></span></font></pre>
<pre style="margin-left: 36pt;"><font size="2" face="Courier New"><span style="font-size: 10pt;">Checking O2CB cluster UATocfs2: Online<o:p></o:p></span></font></pre>
<pre style="margin-left: 36pt;"><font size="2" face="Courier New"><span style="font-size: 10pt;"> Heartbeat dead threshold: 61<o:p></o:p></span></font></pre>
<pre style="margin-left: 36pt;"><font size="2" face="Courier New"><span style="font-size: 10pt;"> Network idle timeout: 60000<o:p></o:p></span></font></pre>
<pre style="margin-left: 36pt;"><font size="2" face="Courier New"><span style="font-size: 10pt;"> Network keepalive delay: 2000<o:p></o:p></span></font></pre>
<pre style="margin-left: 36pt;"><font size="2" face="Courier New"><span style="font-size: 10pt;"> Network reconnect delay: 2000<o:p></o:p></span></font></pre>
<pre style="margin-left: 36pt;"><font size="2" face="Courier New"><span style="font-size: 10pt;">Checking O2CB heartbeat: Active<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;"><o:p> </o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">/var/log/messages on the node being rebooted doesn’t show anything, just the reboot:</span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;"><o:p> </o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;"> Dec 8 00:59:02 dcapp01 syslogd 1.4.1: restart.<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;"><o:p> </o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">On the other nodes, I see the following entries<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;"><o:p> </o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 01:56:01 dcapp02 kernel: o2net: connection to node dcapp01 (num 0) at 10.255.255.1:10007 has been idle for 60.0 seconds, shutting it down.<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 01:56:01 dcapp02 kernel: (0,3):o2net_idle_timer:1426 here are some times that might help debug the situation: (tmr 1291733701.691575 now 1291733761.692608 dr 1291733701.690949 adv 1291733701.696965:1291733701.696967 func (d399da91:500) 1291733701.691576:1291733701.696950)<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 01:56:01 dcapp02 kernel: o2net: no longer connected to node dcapp01 (num 0) at 10.255.255.1:10007<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 01:57:01 dcapp02 kernel: (16082,3):o2net_connect_expired:1585 ERROR: no connection established with node 0 after 60.0 seconds, giving up and returning errors.<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 01:57:01 dcapp02 kernel: (4215,2):dlm_send_remote_convert_request:398 ERROR: status = -107<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 01:57:01 dcapp02 kernel: (4215,2):dlm_wait_for_node_death:365 C5C06C9B675D41B99B60DE2EB28CE0F7: waiting 5000ms for notification of death of node 0<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 01:57:04 dcapp02 kernel: (16082,3):ocfs2_dlm_eviction_cb:119 device (120,1): dlm has evicted node 0<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 01:57:05 dcapp02 kernel: (4269,0):dlm_send_remote_convert_request:398 ERROR: status = -107<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 01:57:05 dcapp02 kernel: (4269,0):dlm_wait_for_node_death:365 D43AF814A25845F7B103EBBEA440BA18: waiting 5000ms for notification of death of node 0<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 01:57:05 dcapp02 kernel: (16082,3):ocfs2_dlm_eviction_cb:119 device (120,66): dlm has evicted node 0<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 01:57:06 dcapp02 kernel: (16082,3):ocfs2_dlm_eviction_cb:119 device (120,65): dlm has evicted node 0<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 02:00:15 dcapp02 kernel: o2net: connected to node dcapp01 (num 0) at 10.255.255.1:10007<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 02:00:29 dcapp02 kernel: ocfs2_dlm: Node 0 joins domain C5C06C9B675D41B99B60DE2EB28CE0F7<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 02:00:29 dcapp02 kernel: ocfs2_dlm: Nodes in domain ("C5C06C9B675D41B99B60DE2EB28CE0F7"): 0 1 2 3 6 7 8 9 10 11 14 15<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 02:00:35 dcapp02 kernel: ocfs2_dlm: Node 0 joins domain 97F22666B5A6494AAF38C53909275DB2<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 02:00:35 dcapp02 kernel: ocfs2_dlm: Nodes in domain ("97F22666B5A6494AAF38C53909275DB2"): 0 1 2 3<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 02:00:39 dcapp02 kernel: ocfs2_dlm: Node 0 joins domain D43AF814A25845F7B103EBBEA440BA18<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Dec 8 02:00:39 dcapp02 kernel: ocfs2_dlm: Nodes in domain ("D43AF814A25845F7B103EBBEA440BA18"): 0 1 2 3<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;"><o:p> </o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">I would really appreciate some help with this, as I’m not sure where to go from here.<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;"><o:p> </o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Thanks<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;">Neil<o:p></o:p></span></font></pre>
<pre><font size="2" face="Courier New"><span style="font-size: 10pt;"><o:p> </o:p></span></font></pre>
</div>
<pre wrap="">
_______________________________________________
Ocfs2-users mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Ocfs2-users@oss.oracle.com">Ocfs2-users@oss.oracle.com</a>
<a class="moz-txt-link-freetext" href="http://oss.oracle.com/mailman/listinfo/ocfs2-users">http://oss.oracle.com/mailman/listinfo/ocfs2-users</a></pre>
</o:smarttagtype></blockquote>
<br>
</body>
</html>