<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<html><head><meta name="qrichtext" content="1" /><style type="text/css">
p, li { white-space: pre-wrap; }
</style></head><body style=" font-family:'Sans Serif'; font-size:10pt; font-weight:400; font-style:normal;">
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Hi Sunil,</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">another hang here:</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">- node2 (making heavy I/O operation at that time) was up, ping to it was ok (following your suggestion I started a ping from node1 to node2 some days ago and no packet was loss, so the switch are ok) but I cannot login via ssh or from the console (hang)</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">- node1 was up with low load but was unable to write to the ocfs2 partition (a simple touch <file> hang).</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">- after rebooting node1, node2 restarted to work and I was able to access to it,</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">I'm investigating with EMC for hardware problems too but this seems to me an ocfs2 problem.</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Here are some logs:</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"><span style=" font-weight:600;">node1:</span></p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0; font-weight:600;"></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: o2net: connection to node nvr2-rc.minint.it (num 0) at 1.1.1.6:7777 has been idle for 35.0 seconds, shutting it down.</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (0,7):o2net_idle_timer:1503 here are some times that might help debug the situation: (tmr 1264600738.986852 now 1264600773.986647 dr 1264600738.989741 adv 1264600738.989753:1264600738.989754 func (75554989:507) 1264600736.951519:1264600736.951521)</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: o2net: no longer connected to node nvr2-rc.minint.it (num 0) at 1.1.1.6:7777</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (4025,5):dlm_drop_lockres_ref:2211 ERROR: status = -112</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (4025,5):dlm_purge_lockres:206 ERROR: status = -112</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (4025,5):dlm_drop_lockres_ref:2211 ERROR: status = -107</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (4025,5):dlm_purge_lockres:206 ERROR: status = -107</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (4025,5):dlm_drop_lockres_ref:2211 ERROR: status = -107</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (4025,5):dlm_purge_lockres:206 ERROR: status = -107</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (4025,5):dlm_drop_lockres_ref:2211 ERROR: status = -107</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (4025,5):dlm_purge_lockres:206 ERROR: status = -107</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (4025,5):dlm_drop_lockres_ref:2211 ERROR: status = -107</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (4025,5):dlm_purge_lockres:206 ERROR: status = -107</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (4025,5):dlm_drop_lockres_ref:2211 ERROR: status = -107</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (4025,5):dlm_purge_lockres:206 ERROR: status = -107</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (4025,5):dlm_send_proxy_ast_msg:458 ERROR: status = -107</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: (4025,5):dlm_flush_asts:600 ERROR: status = -107</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:33 nvr1-rc kernel: o2net: connected to node nvr2-rc.minint.it (num 0) at 1.1.1.6:7777 </p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"><span style=" font-weight:600;">node2:</span></p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:34 nvr2-rc kernel: o2net: no longer connected to node nvr1-rc.minint.it (num 1) at 1.1.1.5:7777</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:34 nvr2-rc kernel: (4000,4):dlm_drop_lockres_ref:2211 ERROR: status = -112</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:34 nvr2-rc kernel: (6208,5):dlm_send_remote_unlock_request:359 ERROR: status = -112</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:34 nvr2-rc kernel: (3344,3):dlm_do_master_request:1334 ERROR: link to 1 went down!</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:34 nvr2-rc kernel: (6056,1):dlm_do_master_request:1334 ERROR: link to 1 went down!</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:34 nvr2-rc kernel: (6056,1):dlm_get_lock_resource:917 ERROR: status = -112</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:34 nvr2-rc kernel: (3344,3):dlm_get_lock_resource:917 ERROR: status = -112</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:34 nvr2-rc kernel: (4000,4):dlm_purge_lockres:206 ERROR: status = -112</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:34 nvr2-rc kernel: (5214,4):dlm_send_remote_unlock_request:359 ERROR: status = -112</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 14:59:34 nvr2-rc kernel: o2net: accepted connection from node nvr1-rc.minint.it (num 1) at 1.1.1.5:7777</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:29:30 nvr2-rc kernel: o2net: connection to node nvr1-rc.minint.it (num 1) at 1.1.1.5:7777 has been idle for 35.0 seconds, shutting it down.</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:29:30 nvr2-rc kernel: (0,5):o2net_idle_timer:1503 here are some times that might help debug the situation: (tmr 1264602535.751483 now 1264602570.751723 dr 1264602535.751469 adv 1264602535.751491:1264602535.751491 func (75554989:505) 1264602517.753992:1264602517.753999)</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:29:30 nvr2-rc kernel: o2net: no longer connected to node nvr1-rc.minint.it (num 1) at 1.1.1.5:7777</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:30:05 nvr2-rc kernel: (3943,1):o2net_connect_expired:1664 ERROR: no connection established with node 1 after 35.0 seconds, giving up and returning errors.</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:30:40 nvr2-rc kernel: (3943,5):o2net_connect_expired:1664 ERROR: no connection established with node 1 after 35.0 seconds, giving up and returning errors.</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:30:58 nvr2-rc kernel: (3985,1):ocfs2_dlm_eviction_cb:98 device (120,1): dlm has evicted node 1</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:30:58 nvr2-rc kernel: (6983,5):dlm_get_lock_resource:844 3AE0B7F3BAB749D09D37DAE16FA38042:M00000000000000000000122d757fa5: at least one node (1) to recover before lock mastery can begin</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:30:58 nvr2-rc kernel: (6056,1):dlm_restart_lock_mastery:1223 ERROR: node down! 1</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:30:58 nvr2-rc kernel: (6056,1):dlm_wait_for_lock_mastery:1040 ERROR: status = -11</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:30:58 nvr2-rc kernel: (3344,3):dlm_restart_lock_mastery:1223 ERROR: node down! 1</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:30:58 nvr2-rc kernel: (3344,3):dlm_wait_for_lock_mastery:1040 ERROR: status = -11</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:30:59 nvr2-rc kernel: (6983,5):dlm_get_lock_resource:898 3AE0B7F3BAB749D09D37DAE16FA38042:M00000000000000000000122d757fa5: at least one node (1) to recover before lock mastery can begin</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:30:59 nvr2-rc kernel: (6056,1):dlm_get_lock_resource:898 3AE0B7F3BAB749D09D37DAE16FA38042:N00000000000201de: at least one node (1) to recover before lock mastery can begin</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:30:59 nvr2-rc kernel: (3344,3):dlm_get_lock_resource:898 3AE0B7F3BAB749D09D37DAE16FA38042:M000000000000004dc281b900000000: at least one node (1) to recover before lock mastery can begin</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:31:01 nvr2-rc kernel: (4001,6):dlm_get_lock_resource:844 3AE0B7F3BAB749D09D37DAE16FA38042:$RECOVERY: at least one node (1) to recover before lock mastery can begin</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:31:01 nvr2-rc kernel: (4001,6):dlm_get_lock_resource:878 3AE0B7F3BAB749D09D37DAE16FA38042: recovery map is not empty, but must master $RECOVERY lock now</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:31:01 nvr2-rc kernel: (4001,6):dlm_do_recovery:524 (4001) Node 0 is the Recovery Master for the Dead Node 1 for Domain 3AE0B7F3BAB749D09D37DAE16FA38042</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Jan 27 15:31:10 nvr2-rc kernel: (6983,1):ocfs2_replay_journal:1183 Recovering node 1 from slot 0 on device (120,1)</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">thanks</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">Nicola</p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">In data giovedì 14 gennaio 2010 21:13:15, Sunil Mushran ha scritto:</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">: > Mailing List SVR wrote:</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > Hi,</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> ></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > periodically one of on my two nodes cluster is fenced here are the logs:</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> ></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > Jan 14 07:01:44 nvr1-rc kernel: o2net: no longer connected to node nvr2-</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > rc.minint.it (num 0) at 1.1.1.6:7777</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > Jan 14 07:01:44 nvr1-rc kernel: (21534,1):dlm_do_master_request:1334</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > ERROR: link to 0 went down!</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > Jan 14 07:01:44 nvr1-rc kernel: (4007,4):dlm_send_proxy_ast_msg:458</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > ERROR: status = -112</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > Jan 14 07:01:44 nvr1-rc kernel: (4007,4):dlm_flush_asts:600 ERROR: status</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > = -112</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > Jan 14 07:01:44 nvr1-rc kernel: (21534,1):dlm_get_lock_resource:917</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > ERROR: status = -112</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > Jan 14 07:02:19 nvr1-rc kernel: (3950,5):o2net_connect_expired:1664</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > ERROR: no connection established with node 0 after 35.0 seconds, giving</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > up and returning errors.</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > Jan 14 07:02:54 nvr1-rc kernel: (3950,5):o2net_connect_expired:1664</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > ERROR: no connection established with node 0 after 35.0 seconds, giving</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > up and returning errors.</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > Jan 14 07:03:10 nvr1-rc kernel: (4007,4):dlm_send_proxy_ast_msg:458</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > ERROR: status = -107</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > Jan 14 07:03:10 nvr1-rc kernel: (4007,4):dlm_flush_asts:600 ERROR: status</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > = -107</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > Jan 14 07:03:29 nvr1-rc kernel: (3950,5):o2net_connect_expired:1664</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > ERROR: no connection established with node 0 after 35.0 seconds, giving</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > up and returning errors.</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > Jan 14 07:03:50 nvr1-rc kernel: (31,5):o2quo_make_decision:146 ERROR:</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > fencing this node because it is connected to a half-quorum of 1 out of 2</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > nodes which doesn't include the lowest active node 0</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > Jan 14 07:03:50 nvr1-rc kernel: (31,5):o2hb_stop_all_regions:1967 ERROR:</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > stopping heartbeat on all active regions.</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> ></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > I'm sure there are no network connectivity problem but it is possible</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > that there are heavy IO loads, is this the intended behaviour? Why under</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > heavy load the loaded node is fenced?</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> ></p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> > I'm using ocfs2-1.4.4 on rhel5 kernel-2.6.18-164.6.1.el5</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> So the network connection snapped. What it means is that the nodes</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> could not ping each other for 35 seconds. In fact node 1 (this one),</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> tried to reconnect to node 0 but got no reply back. So the network</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> issue lasted for over 2 mins.</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> </p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> Switch could be one culprit. See if the switch logs say something. Other</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> possibility is that node 0 was paging heavily. Or kswapd was pegged</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> at 100%. This is hard to determine after the fact. Something to keep in</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> mind the next time you see the same issue. If that is the case, then that</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> needs to be fixed. Maybe add more memory. Or, if you are running the</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> database, ensure you are using hugepages. etc.</p>
<p style=" margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;">> </p>
<p style="-qt-paragraph-type:empty; margin-top:0px; margin-bottom:0px; margin-left:0px; margin-right:0px; -qt-block-indent:0; text-indent:0px; -qt-user-state:0;"></p></body></html>