Andy,

I have found it helpful, when diagnosing this kind of hang, to keep a highest-priority (nice -20) shell open on the server. Such a shell usually keeps working even during heavy swapping or in other situations where the system becomes unresponsive. You can start one with this command:

nice -n -20 bash

From that shell you can run top or vmstat to see what is happening while the server is unresponsive. Just be careful not to run any command that might generate large output or use much CPU, or you may hang the server yourself.

Regards,
Luis
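For example, a minimal sketch of such a session (the output file names, sample counts, and intervals here are arbitrary placeholders, adjust them to taste):

# Start the highest-priority shell (as root) and leave it open:
nice -n -20 bash

# From inside that shell, write snapshots to local files so the
# terminal never has to render a flood of output:
vmstat 5 720 > /tmp/vmstat.out 2>&1 &   # one sample every 5 s, bounded to an hour
top -b -n 1 > /tmp/top.out 2>&1         # a single batch-mode top snapshot
tail -n 20 /tmp/vmstat.out              # peek at only the latest samples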
Andy Kipp <AKIPP@velcro.com> wrote:

I checked bugzilla, and what is happening is almost identical to bug #819. However, the "dead" node continues to heartbeat, yet it is unresponsive, and no log output at all is generated on the "dead" node. This has been happening for a few months, but the frequency is increasing. Is there any information I can provide to help figure this out?

- Andy
--
Andrew Kipp
Network Administrator
Velcro USA Inc.
Email: akipp@velcro.com
Work: (603) 222-4844

CONFIDENTIALITY NOTICE: This email is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. If you are the intended recipient but do not wish to receive communications through this medium, please so advise immediately.

>>> On 3/9/2007 at 9:39 PM, in message <45F21A7F.5090802@oracle.com>, Sunil Mushran <SUNIL.MUSHRAN@ORACLE.COM> wrote:
> File a bugzilla with the messages from all three nodes. It appears
> node 2 went down but kept heartbeating. Strange. The messages
> from node 2 may shed more light.
>
> Andy Kipp wrote:
>> We are running OCFS2 on SLES9 machines using a FC SAN. Without warning, both nodes become unresponsive: we cannot access either machine via ssh or terminal (the login hangs after the username is typed), yet the machines still respond to pings. This continues until one node is rebooted, at which point the second node resumes normal operation.
>>
>> I am not entirely sure that this is an OCFS2 problem at all; however, the syslog shows it had issues. Here is the log from the node that was not rebooted (the node that was rebooted contained no log information). The system appears to have gone down at about 3 AM and stayed down until the node was rebooted at around 7:15.
>>
>> Mar 8 03:06:32 groupwise-1-mht kernel: o2net: connection to node groupwise-2-mht (num 2) at 192.168.1.3:7777 has been idle for 10 seconds, shutting it down.
>> Mar 8 03:06:32 groupwise-1-mht kernel: (0,2):o2net_idle_timer:1310 here are some times that might help debug the situation: (tmr 1173341182.367220 now 1173341192.367244 dr 1173341182.367213 adv 1173341182.367228:1173341182.367229 func (05ce6220:2) 1173341182.367221:1173341182.367224)
>> Mar 8 03:06:32 groupwise-1-mht kernel: o2net: no longer connected to node groupwise-2-mht (num 2) at 192.168.1.3:7777
>> Mar 8 03:06:32 groupwise-1-mht kernel: (499,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 03:06:32 groupwise-1-mht kernel: (499,0):dlm_get_lock_resource:914 ERROR: status = -112
>> Mar 8 03:13:02 groupwise-1-mht kernel: (8476,0):dlm_send_proxy_ast_msg:458 ERROR: status = -107
>> Mar 8 03:13:02 groupwise-1-mht kernel: (8476,0):dlm_flush_asts:607 ERROR: status = -107
>> Mar 8 03:19:54 groupwise-1-mht kernel: (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 03:19:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 03:19:55 groupwise-1-mht kernel: (873,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 03:19:55 groupwise-1-mht kernel: (873,0):dlm_get_lock_resource:914 ERROR: status = -107
>> Mar 8 03:19:55 groupwise-1-mht kernel: (901,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 03:19:55 groupwise-1-mht kernel: (901,0):dlm_get_lock_resource:914 ERROR: status = -107
>> Mar 8 03:19:56 groupwise-1-mht kernel: (929,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 03:19:56 groupwise-1-mht kernel: (929,0):dlm_get_lock_resource:914 ERROR: status = -107
>> Mar 8 03:45:29 groupwise-1-mht -- MARK --
>> Mar 8 04:15:02 groupwise-1-mht kernel: (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 04:15:03 groupwise-1-mht last message repeated 383 times
>> Mar 8 06:27:54 groupwise-1-mht kernel: (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 06:27:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 06:27:54 groupwise-1-mht kernel: (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 06:27:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 06:35:48 groupwise-1-mht kernel: (8872,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 06:35:48 groupwise-1-mht kernel: (8872,0):dlm_get_lock_resource:914 ERROR: status = -107
>> Mar 8 06:52:45 groupwise-1-mht kernel: (8861,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 06:52:45 groupwise-1-mht kernel: (8861,0):dlm_get_lock_resource:914 ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:09:41 groupwise-1-mht kernel: (4192,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 07:09:41 groupwise-1-mht kernel: (4192,0):dlm_get_lock_resource:914 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:18:57 groupwise-1-mht kernel: (4341,0):dlm_do_master_request:1330 ERROR: link to 2 went down!
>> Mar 8 07:18:57 groupwise-1-mht kernel: (4341,0):dlm_get_lock_resource:914 ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_broadcast_vote:725 ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_do_request_vote:798 ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:20:49 groupwise-1-mht sshd[4375]: Accepted publickey for root from 10.1.31.27 port 1752 ssh2
>> Mar 8 07:20:50 groupwise-1-mht kernel: (147,0):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 07:20:50 groupwise-1-mht last message repeated 255 times
>> Mar 8 07:20:53 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:20:53 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:20:58 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:20:58 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:03 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:03 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:08 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:08 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:13 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:13 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:19 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:19 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:24 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:24 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:29 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:29 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:34 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:34 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:39 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:39 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:44 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:44 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:49 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:49 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:54 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:54 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:21:59 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:59 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:04 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:04 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:10 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:10 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:15 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:20 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:20 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:25 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:25 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:30 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:30 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:35 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:35 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:40 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:40 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:45 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:45 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:50 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:50 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:22:55 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:55 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:01 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:01 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:06 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:06 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:11 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:11 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:16 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:16 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:21 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:21 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:26 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:26 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:31 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:31 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:36 groupwise-1-mht kernel: (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:36 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of node 2
>> Mar 8 07:23:40 groupwise-1-mht kernel: (28613,2):dlm_get_lock_resource:847 B6ECAF5A668A4573AF763908F26958DB:$RECOVERY: at least one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:40 groupwise-1-mht kernel: (28613,2):dlm_get_lock_resource:874 B6ECAF5A668A4573AF763908F26958DB: recovery map is not empty, but must master $RECOVERY lock now
>> Mar 8 07:23:41 groupwise-1-mht kernel: (4432,0):ocfs2_replay_journal:1176 Recovering node 2 from slot 1 on device (253,1)
>> Mar 8 07:23:41 groupwise-1-mht kernel: (4192,0):dlm_restart_lock_mastery:1214 ERROR: node down! 2
>> Mar 8 07:23:41 groupwise-1-mht kernel: (4192,0):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:41 groupwise-1-mht kernel: (929,1):dlm_restart_lock_mastery:1214 ERROR: node down! 2
>> Mar 8 07:23:41 groupwise-1-mht kernel: (929,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_restart_lock_mastery:1214 ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_restart_lock_mastery:1214 ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4192,0):dlm_get_lock_resource:895 2062CE05ABA246988E9CCCDAE253F458:D000000000000000037872ff59e2a10: at least one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:42 groupwise-1-mht kernel: (499,1):dlm_restart_lock_mastery:1214 ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel: (499,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (929,1):dlm_get_lock_resource:895 2062CE05ABA246988E9CCCDAE253F458:M0000000000000002d2ab960a02ee32: at least one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:43 groupwise-1-mht kernel: (4341,1):dlm_get_lock_resource:895

=== message truncated ===