<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
ocfs2 uses a disk heartbeat to detect node liveness and a network
heartbeat<br>
to detect link liveness. Both must be working for the cluster to
function.<br>
If the network link between two nodes goes down, one of the two
nodes<br>
is fenced.<br>
<br>
The stack trace below indicates that the two nodes are unable to
communicate.<br>
Both nodes are waiting on quorum to decide which of them to fence.<br>
It appears you have raised the disk heartbeat timeout to more than
2 minutes. I would imagine<br>
one of the nodes reset after that timeout.<br>
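<br>
For reference, here is a rough sketch of how the disk heartbeat timeout
relates to O2CB_HEARTBEAT_THRESHOLD. It assumes the relation given in the
OCFS2 documentation (threshold = timeout in seconds / 2 + 1, with one
heartbeat written every 2 seconds). The function names are made up for
illustration and are not part of any ocfs2 tool.<br>
<pre>
```python
import math

# Assumption: the o2cb relation
#   O2CB_HEARTBEAT_THRESHOLD = (timeout_in_seconds / 2) + 1,
# with one disk heartbeat written every 2 seconds.
HEARTBEAT_INTERVAL_SECS = 2


def disk_timeout_secs(threshold: int) -> int:
    """Seconds of missed disk heartbeats before a node is declared dead."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def threshold_for(timeout_secs: int) -> int:
    """Smallest threshold whose timeout is at least timeout_secs."""
    return math.ceil(timeout_secs / HEARTBEAT_INTERVAL_SECS) + 1


print(disk_timeout_secs(31))  # 60
print(disk_timeout_secs(61))  # 120
print(threshold_for(120))     # 61
```
</pre>
Under that assumption, any threshold above 61 corresponds to a disk
heartbeat timeout of more than 2 minutes.<br>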
<br>
On 09/10/2011 08:54 PM, Hai Tao wrote:
<blockquote cite="mid:BAY156-W639393F983CE859CD6974CEB030@phx.gbl"
type="cite">
<div dir="ltr">
Is the ocfs2 heartbeat sent over the network, or does it just
update a file on the shared disk?<br>
<br>
If the heartbeat is lost, what should happen? What if only one node
is writing and the other is idle? Will it still cause any file
system issues?<br>
<br>
<br>
<div>Thanks.</div>
<div> </div>
<div>Hai Tao</div>
<br>
<br>
<div>
<hr id="stopSpelling">
From: <a class="moz-txt-link-abbreviated" href="mailto:taoh666@hotmail.com">taoh666@hotmail.com</a><br>
To: <a class="moz-txt-link-abbreviated" href="mailto:ocfs2-users@oss.oracle.com">ocfs2-users@oss.oracle.com</a><br>
Date: Sat, 10 Sep 2011 00:50:23 -0700<br>
Subject: [Ocfs2-users] disable heartbeat nic caused ocfs2
errors<br>
<br>
<div dir="ltr">I have a two-node ocfs2 cluster, and I
disabled the heartbeat NIC with "ifdown eth1". I got the
following odd logs on both nodes:<br>
<br>
Sep 7 10:45:49 dbtest-01 kernel: o2net: connection to node
dbtest-02 (num 1) at 10.194.59.65:7777 has been idle for
30.0 seconds, shutting it down.<br>
Sep 7 10:45:49 dbtest-01 kernel:
(swapper,0,3):o2net_idle_timer:1503 here are some times that
might help debug the situation: (tmr 1315417519.185025 now
1315417549.183798 dr 1315417519.185016 adv
1315417519.185032:1315417519.185032 func (b9bb7168:504)
1315417518.872227:1315417518.872268)<br>
Sep 7 10:45:49 dbtest-01 kernel: o2net: no longer connected
to node dbtest-02 (num 1) at 10.194.59.65:7777<br>
Sep 7 10:45:49 dbtest-01 kernel:
(dlm_thread,3781,2):dlm_send_proxy_ast_msg:457 ERROR: status
= -112<br>
Sep 7 10:45:49 dbtest-01 kernel:
(oracle,26129,1):dlm_do_master_request:1334 ERROR: link to 1
went down!<br>
Sep 7 10:45:49 dbtest-01 kernel:
(oracle,26129,1):dlm_get_lock_resource:917 ERROR: status =
-112<br>
Sep 7 10:45:49 dbtest-01 kernel:
(dlm_thread,4256,1):dlm_send_proxy_ast_msg:457 ERROR: status
= -112<br>
Sep 7 10:45:49 dbtest-01 kernel:
(dlm_thread,4256,1):dlm_flush_asts:604 ERROR: status = -112<br>
Sep 7 10:45:49 dbtest-01 kernel:
(dlm_thread,3781,2):dlm_flush_asts:604 ERROR: status = -112<br>
Sep 7 10:46:19 dbtest-01 kernel:
(o2net,3736,3):o2net_connect_expired:1664 ERROR: no
connection established with node 1 after 30.0 seconds,
giving up and returning errors.<br>
Sep 7 10:46:19 dbtest-01 kernel: o2net: accepted connection
from node dbtest-02 (num 1) at 10.194.59.65:7777<br>
Sep 7 10:48:37 dbtest-01 kernel: INFO: task events/0:10
blocked for more than 120 seconds.<br>
Sep 7 10:48:37 dbtest-01 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this
message.<br>
Sep 7 10:48:37 dbtest-01 kernel: events/0 D
ffff810001004420 0 10 1 11 9
(L-TLB)<br>
Sep 7 10:48:37 dbtest-01 kernel: ffff81083ffedc80
0000000000000046 ffffffff80333680 0000000000000001<br>
Sep 7 10:48:37 dbtest-01 kernel: 0000000000000400
000000000000000a ffff81083ffe1820 ffffffff80309b60<br>
Sep 7 10:48:37 dbtest-01 kernel: 0030b62498ce7b3f
000000000000416b ffff81083ffe1a08 0000000000000000<br>
Sep 7 10:48:37 dbtest-01 kernel: Call Trace:<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff80064167&gt;] wait_for_completion+0x79/0xa2<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff8008e16d&gt;] default_wake_function+0x0/0xe<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff884e64b7&gt;]
:ocfs2:ocfs2_wait_for_mask+0xd/0x19<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff884e78d8&gt;]
:ocfs2:ocfs2_cluster_lock+0x9ae/0x9d3<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff885013e5&gt;]
:ocfs2:ocfs2_orphan_scan_work+0x0/0x83<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff884ed1e4&gt;]
:ocfs2:ocfs2_orphan_scan_lock+0x55/0x84<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff884fc59b&gt;]
:ocfs2:ocfs2_queue_orphan_scan+0x32/0x147<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff885013ff&gt;]
:ocfs2:ocfs2_orphan_scan_work+0x1a/0x83<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff8004dc37&gt;] run_workqueue+0x94/0xe4<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff8004a472&gt;] worker_thread+0x0/0x122<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff8004a562&gt;] worker_thread+0xf0/0x122<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff8008e16d&gt;] default_wake_function+0x0/0xe<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff80032bdc&gt;] kthread+0xfe/0x132<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff8005efb1&gt;] child_rip+0xa/0x11<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff80032ade&gt;] kthread+0x0/0x132<br>
Sep 7 10:48:37 dbtest-01 kernel:
[&lt;ffffffff8005efa7&gt;] child_rip+0x0/0x11<br>
Sep 7 10:48:37 dbtest-01 kernel:<br>
<br>
Does anyone know why this happened?<br>
<br>
Thanks.<br>
</div>
<br>
_______________________________________________ Ocfs2-users
mailing list <a class="moz-txt-link-abbreviated" href="mailto:Ocfs2-users@oss.oracle.com">Ocfs2-users@oss.oracle.com</a>
<a class="moz-txt-link-freetext" href="http://oss.oracle.com/mailman/listinfo/ocfs2-users">http://oss.oracle.com/mailman/listinfo/ocfs2-users</a></div>
</div>
</blockquote>
<br>
</body>
</html>