Andy,

I have found it helpful, when diagnosing this kind of hang, to keep a highest-priority (nice -20) shell open on the server. Such a shell usually keeps working even during heavy swapping or in other situations where the system becomes unresponsive. You can start one with this command:

nice -n -20 bash

From that shell you can run top or vmstat to see what is happening while the server is unresponsive. Just be careful not to run any command that might generate large output or use much CPU, or you may hang the server yourself.

Regards,
Luis
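For example, a minimal sketch of such a session (the output file names, sample counts, and intervals here are arbitrary placeholders, adjust them to taste):

# Start the highest-priority shell (as root) and leave it open:
nice -n -20 bash

# From inside that shell, write snapshots to local files so the
# terminal never has to render a flood of output:
vmstat 5 720 > /tmp/vmstat.out 2>&1 &   # one sample every 5 s, bounded to an hour
top -b -n 1 > /tmp/top.out 2>&1         # a single batch-mode top snapshot
tail -n 20 /tmp/vmstat.out              # peek at only the latest samples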
Andy Kipp <AKIPP@velcro.com> wrote:

I checked bugzilla, and what is happening is almost identical to bug #819. However, the "dead" node continues to heartbeat, yet it is unresponsive, and no log output at all is generated on the "dead" node. This has been happening for a few months, but the frequency is increasing. Is there any information I can provide to help figure this out?

- Andy
--
Andrew Kipp
Network Administrator
Velcro USA Inc.
Email: akipp@velcro.com
Work: (603) 222-4844

CONFIDENTIALITY NOTICE: This email is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. If you are the intended recipient but do not wish to receive communications through this medium, please so advise immediately.

>>> On 3/9/2007 at 9:39 PM, in message <45F21A7F.5090802@oracle.com>, Sunil Mushran <SUNIL.MUSHRAN@ORACLE.COM> wrote:
> File a bugzilla with the messages from all three nodes. It appears
> node 2 went down but kept heartbeating. Strange. The messages
> from node 2 may shed more light.
>
> Andy Kipp wrote:
>> We are running OCFS2 on SLES9 machines using a FC SAN. Without warning, both nodes become unresponsive: we cannot access either machine via ssh or terminal (the login hangs after the username is typed), yet the machines still respond to pings. This continues until one node is rebooted, at which point the second node resumes normal operation.
>>
>> I am not entirely sure that this is an OCFS2 problem at all; however, the syslog shows it had issues. Here is the log from the node that was not rebooted (the node that was rebooted contained no log information). The system appears to have gone down at about 3 AM and stayed down until the node was rebooted at around 7:15.
>>
>> Mar 8 03:06:32 groupwise-1-mht kernel: o2net: connection to node groupwise-2-mht (num 2) at 192.168.1.3:7777 has been idle for 10 seconds, shutting it down.
>> Mar 8 03:06:32 groupwise-1-mht kernel: (0,2):o2net_idle_timer:1310 here are some times that might help debug the situation: (tmr 1173341182.367220 now 1173341192.367244 dr 1173341182.367213 adv 1173341182.367228:1173341182.367229 func (05ce6220:2) 1173341182.367221:1173341182.367224)
>> Mar 8 03:06:32 groupwise-1-mht kernel: o2net: no longer connected to node groupwise-2-mht (num 2) at 192.168.1.3:7777
>> Mar 8 03:06:32 groupwise-1-mht kernel: (499,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 03:06:32 groupwise-1-mht kernel: (499,0):dlm_get_lock_resource:914 ERROR: status = -112
>> Mar 8 03:13:02 groupwise-1-mht kernel: (8476,0):dlm_send_proxy_ast_msg:458 ERROR: status = -107
>> Mar 8 03:13:02 groupwise-1-mht kernel: (8476,0):dlm_flush_asts:607 ERROR: status = -107
>> Mar 8 03:19:54 groupwise-1-mht kernel: (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 03:19:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 03:19:55 groupwise-1-mht kernel: (873,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 03:19:55 groupwise-1-mht kernel: (873,0):dlm_get_lock_resource:914 ERROR: status = -107
>> Mar 8 03:19:55 groupwise-1-mht kernel: (901,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 03:19:55 groupwise-1-mht kernel: (901,0):dlm_get_lock_resource:914 ERROR: status = -107
>> Mar 8 03:19:56 groupwise-1-mht kernel: (929,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 03:19:56 groupwise-1-mht kernel: (929,0):dlm_get_lock_resource:914 ERROR: status = -107
>> Mar 8 03:45:29 groupwise-1-mht -- MARK --
>> Mar 8 04:15:02 groupwise-1-mht kernel: (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 04:15:03 groupwise-1-mht last message repeated 383 times
>> Mar 8 06:27:54 groupwise-1-mht kernel: (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 06:27:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 06:27:54 groupwise-1-mht kernel: (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 06:27:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 06:35:48 groupwise-1-mht kernel: (8872,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 06:35:48 groupwise-1-mht kernel: (8872,0):dlm_get_lock_resource:914 ERROR: status = -107
>> Mar 8 06:52:45 groupwise-1-mht kernel: (8861,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 06:52:45 groupwise-1-mht kernel: (8861,0):dlm_get_lock_resource:914 ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:09:41 groupwise-1-mht kernel: (4192,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 07:09:41 groupwise-1-mht kernel: (4192,0):dlm_get_lock_resource:914 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:18:57 groupwise-1-mht kernel: (4341,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 07:18:57 groupwise-1-mht kernel: (4341,0):dlm_get_lock_resource:914 ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:20:49 groupwise-1-mht sshd[4375]: Accepted publickey for root from 10.1.31.27 port 1752 ssh2
>> Mar 8 07:20:50 groupwise-1-mht kernel: (147,0):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 07:20:50 groupwise-1-mht last message repeated 255 times
>> Mar 8 07:20:53 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:20:53 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:20:58 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:20:58 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:03 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:03 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:08 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:08 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:13 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:13 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:19 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:19 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:24 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:24 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:29 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:29 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:34 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:34 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:39 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:39 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:44 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:44 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:49 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:49 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:54 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:54 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:59 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:59 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:04 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:04 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:10 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:10 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:15 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:20 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:20 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:25 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:25 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:30 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:30 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:35 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:35 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:40 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:40 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:45 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:45 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:50 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:50 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:55 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:55 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:01 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:01 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:06 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:06 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:11 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:11 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:16 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:16 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:21 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:21 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:26 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:26 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:31 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:31 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:36 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:36 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:40 groupwise-1-mht kernel: (28613,2):dlm_get_lock_resource:847 B6ECAF5A668A4573AF763908F26958DB:$RECOVERY: at least one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:40 groupwise-1-mht kernel: (28613,2):dlm_get_lock_resource:874 B6ECAF5A668A4573AF763908F26958DB: recovery map is not empty, but must master $RECOVERY lock now
>> Mar 8 07:23:41 groupwise-1-mht kernel: (4432,0):ocfs2_replay_journal:1176 Recovering node 2 from slot 1 on device (253,1)
>> Mar 8 07:23:41 groupwise-1-mht kernel: (4192,0):dlm_restart_lock_mastery:1214 ERROR: node down! 2
>> Mar 8 07:23:41 groupwise-1-mht kernel: (4192,0):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:41 groupwise-1-mht kernel: (929,1):dlm_restart_lock_mastery:1214 ERROR: node down! 2
>> Mar 8 07:23:41 groupwise-1-mht kernel: (929,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_restart_lock_mastery:1214 ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_restart_lock_mastery:1214 ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4192,0):dlm_get_lock_resource:895 2062CE05ABA246988E9CCCDAE253F458:D000000000000000037872ff59e2a10: at least one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:42 groupwise-1-mht kernel: (499,1):dlm_restart_lock_mastery:1214 ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel: (499,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (929,1):dlm_get_lock_resource:895 2062CE05ABA246988E9CCCDAE253F458:M0000000000000002d2ab960a02ee32: at least one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:43 groupwise-1-mht kernel: (4341,1):dlm_get_lock_resource:895

=== message truncated ===