[Ocfs2-users] ocfs2 cluster becomes unresponsive
Andy Kipp
AKIPP at velcro.com
Tue Mar 13 07:16:05 PDT 2007
I checked bugzilla, and what is happening is almost identical to bug #819. However, the "dead" node continues to heartbeat yet is unresponsive, and no log output at all is generated on it. This has been happening for a few months, but the frequency is increasing. Is there any information I can provide to help figure this out?
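For anyone reading the logs below: the negative status codes in the o2net/DLM messages are negated kernel errno values. A quick sketch to decode them (assuming a Linux box, since errno numbers are platform-specific):

```python
import errno
import os

# Decode the status codes seen in the OCFS2/DLM log messages below.
# -107, -112, and -11 are negated Linux errno values.
for code in (107, 112, 11):
    print(-code, errno.errorcode[code], '-', os.strerror(code))
```

On Linux this maps -107 to ENOTCONN ("Transport endpoint is not connected") and -112 to EHOSTDOWN, which lines up with the o2net connection loss in the logs.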
- Andy
--
Andrew Kipp
Network Administrator
Velcro USA Inc.
Email: akipp at velcro.com
Work: (603) 222-4844
CONFIDENTIALITY NOTICE: This email is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. If you are the intended recipient but do not wish to receive communications through this medium, please so advise immediately.
>>> On 3/9/2007 at 9:39 PM, in message <45F21A7F.5090802 at oracle.com>, Sunil Mushran
<Sunil.Mushran at oracle.com> wrote:
> File a bugzilla with the messages from all three nodes. Appears
> node 2 went down but kept heartbeating. Strange. The messages
> from node 2 may shed more light.
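To gather those messages for the bugzilla, something like this could pull the OCFS2-related lines out of each node's syslog (a sketch; the match patterns are assumptions, and on SLES9 the file to pass in is typically /var/log/messages):

```python
import re

# Filter OCFS2-related kernel messages out of a syslog file so the
# output from each node can be attached to the bugzilla.
PATTERN = re.compile(r'o2net|o2hb|dlm_|ocfs2', re.IGNORECASE)

def ocfs2_messages(path):
    """Return the log lines mentioning o2net, the heartbeat, the DLM, or ocfs2."""
    with open(path) as log:
        return [line.rstrip('\n') for line in log if PATTERN.search(line)]
```

Calling `ocfs2_messages('/var/log/messages')` on each node and attaching the three outputs would cover the request above.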
>
> Andy Kipp wrote:
>> We are running OCFS2 on SLES9 machines using a FC SAN. Without warning, both
> nodes become unresponsive. We cannot access either machine via ssh or the
> terminal (it hangs after typing in the username). However, the machines still
> respond to pings. This continues until one node is rebooted, at which time the
> second node resumes normal operation.
>>
>> I am not entirely sure that this is an OCFS2 problem at all; however, the
> syslog shows it had issues. Here is the log from the node that was not
> rebooted. The node that was rebooted contained no log information. The system
> appears to have gone down at about 3 AM and stayed that way until the node was
> rebooted at around 7:15.
>>
>> Mar 8 03:06:32 groupwise-1-mht kernel: o2net: connection to node
> groupwise-2-mht (num 2) at 192.168.1.3:7777 has been idle for 10 seconds,
> shutting it down.
>> Mar 8 03:06:32 groupwise-1-mht kernel: (0,2):o2net_idle_timer:1310 here are
> some times that might help debug the situation: (tmr 1173341182.367220 now
> 1173341192.367244 dr 1173341182.367213 adv
> 1173341182.367228:1173341182.367229 func (05ce6220:2)
> 1173341182.367221:1173341182.367224)
>> Mar 8 03:06:32 groupwise-1-mht kernel: o2net: no longer connected to node
> groupwise-2-mht (num 2) at 192.168.1.3:7777
>> Mar 8 03:06:32 groupwise-1-mht kernel: (499,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 03:06:32 groupwise-1-mht kernel: (499,0):dlm_get_lock_resource:914
> ERROR: status = -112
>> Mar 8 03:13:02 groupwise-1-mht kernel: (8476,0):dlm_send_proxy_ast_msg:458
> ERROR: status = -107
>> Mar 8 03:13:02 groupwise-1-mht kernel: (8476,0):dlm_flush_asts:607 ERROR:
> status = -107
>> Mar 8 03:19:54 groupwise-1-mht kernel:
> (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 03:19:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 03:19:55 groupwise-1-mht kernel: (873,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 03:19:55 groupwise-1-mht kernel: (873,0):dlm_get_lock_resource:914
> ERROR: status = -107
>> Mar 8 03:19:55 groupwise-1-mht kernel: (901,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 03:19:55 groupwise-1-mht kernel: (901,0):dlm_get_lock_resource:914
> ERROR: status = -107
>> Mar 8 03:19:56 groupwise-1-mht kernel: (929,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 03:19:56 groupwise-1-mht kernel: (929,0):dlm_get_lock_resource:914
> ERROR: status = -107
>> Mar 8 03:45:29 groupwise-1-mht -- MARK --
>> Mar 8 04:15:02 groupwise-1-mht kernel:
> (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 04:15:03 groupwise-1-mht last message repeated 383 times
>> Mar 8 06:27:54 groupwise-1-mht kernel:
> (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 06:27:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 06:27:54 groupwise-1-mht kernel:
> (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 06:27:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 06:35:48 groupwise-1-mht kernel: (8872,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 06:35:48 groupwise-1-mht kernel: (8872,0):dlm_get_lock_resource:914
> ERROR: status = -107
>> Mar 8 06:52:45 groupwise-1-mht kernel: (8861,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 06:52:45 groupwise-1-mht kernel: (8861,0):dlm_get_lock_resource:914
> ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:09:41 groupwise-1-mht kernel: (4192,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 07:09:41 groupwise-1-mht kernel: (4192,0):dlm_get_lock_resource:914
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:18:57 groupwise-1-mht kernel: (4341,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 07:18:57 groupwise-1-mht kernel: (4341,0):dlm_get_lock_resource:914
> ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:20:49 groupwise-1-mht sshd[4375]: Accepted publickey for root from
> 10.1.31.27 port 1752 ssh2
>> Mar 8 07:20:50 groupwise-1-mht kernel:
> (147,0):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 07:20:50 groupwise-1-mht last message repeated 255 times
>> Mar 8 07:20:53 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:20:53 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:20:58 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:20:58 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:03 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:03 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:08 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:08 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:13 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:13 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:19 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:19 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:24 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:24 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:29 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:29 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:34 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:34 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:39 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:39 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:44 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:44 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:49 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:49 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:54 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:54 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:59 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:59 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:04 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:04 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:10 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:10 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:15 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:20 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:20 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:25 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:25 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:30 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:30 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:35 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:35 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:40 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:40 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:45 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:45 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:50 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:50 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:55 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:55 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:01 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:01 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:06 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:06 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:11 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:11 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:16 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:16 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:21 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:21 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:26 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:26 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:31 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:31 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:36 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:36 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:40 groupwise-1-mht kernel: (28613,2):dlm_get_lock_resource:847
> B6ECAF5A668A4573AF763908F26958DB:$RECOVERY: at least one node (2) torecover
> before lock mastery can begin
>> Mar 8 07:23:40 groupwise-1-mht kernel: (28613,2):dlm_get_lock_resource:874
> B6ECAF5A668A4573AF763908F26958DB: recovery map is not empty, but must master
> $RECOVERY lock now
>> Mar 8 07:23:41 groupwise-1-mht kernel: (4432,0):ocfs2_replay_journal:1176
> Recovering node 2 from slot 1 on device (253,1)
>> Mar 8 07:23:41 groupwise-1-mht kernel: (4192,0):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:41 groupwise-1-mht kernel:
> (4192,0):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:41 groupwise-1-mht kernel: (929,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:41 groupwise-1-mht kernel: (929,1):dlm_wait_for_lock_mastery:1035
> ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel:
> (4341,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel:
> (4341,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4192,0):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:D000000000000000037872ff59e2a10: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:42 groupwise-1-mht kernel: (499,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel: (499,1):dlm_wait_for_lock_mastery:1035
> ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (929,1):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:M0000000000000002d2ab960a02ee32: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:43 groupwise-1-mht kernel: (4341,1):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:D00000000000000005ac8f593b44a80: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:43 groupwise-1-mht kernel: (8872,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:43 groupwise-1-mht kernel:
> (8872,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:43 groupwise-1-mht kernel: (499,1):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:D0000000000000000059e0c78635d25: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:43 groupwise-1-mht kernel: (8223,2):ocfs2_dlm_eviction_cb:119
> device (253,0): dlm has evicted node 2
>> Mar 8 07:23:43 groupwise-1-mht kernel: (4431,0):dlm_get_lock_resource:847
> 2062CE05ABA246988E9CCCDAE253F458:M000000000000000000001de83f8b74: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:44 groupwise-1-mht kernel: (8872,1):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:D0000000000000000ce315c7764670d: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:44 groupwise-1-mht kernel: (4431,0):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:M000000000000000000001de83f8b74: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:44 groupwise-1-mht kernel: (873,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:49 groupwise-1-mht kernel: (873,1):dlm_wait_for_lock_mastery:1035
> ERROR: status = -11
>> Mar 8 07:23:49 groupwise-1-mht kernel: (901,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:49 groupwise-1-mht kernel: (901,1):dlm_wait_for_lock_mastery:1035
> ERROR: status = -11
>> Mar 8 07:23:49 groupwise-1-mht kernel: (8861,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:49 groupwise-1-mht kernel:
> (8861,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:49 groupwise-1-mht kernel: (873,1):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:M0000000000000002fc058c0a084a80: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:49 groupwise-1-mht kernel: (901,1):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:M0000000000000002ff18686a1b86f4: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:49 groupwise-1-mht kernel: (8861,1):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:D0000000000000000b2f76e77647700: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:49 groupwise-1-mht kernel: kjournald starting. Commit interval 5
> seconds
>> Mar 8 07:23:49 groupwise-1-mht kernel: (4431,0):ocfs2_replay_journal:1176
> Recovering node 2 from slot 1 on device (253,0)
>> Mar 8 07:23:55 groupwise-1-mht kernel: (fs/jbd/recovery.c, 255):
> journal_recover: JBD: recovery, exit status 0, recovered transactions 599034
> to 599035
>> Mar 8 07:23:55 groupwise-1-mht kernel: (fs/jbd/recovery.c, 257):
> journal_recover: JBD: Replayed 8 and revoked 0/0 blocks
>> Mar 8 07:23:55 groupwise-1-mht kernel: kjournald starting. Commit interval 5
> seconds
>> Mar 8 07:25:51 groupwise-1-mht kernel: o2net: accepted connection from node
> groupwise-2-mht (num 2) at 192.168.1.3:7777
>> Mar 8 07:25:55 groupwise-1-mht kernel: ocfs2_dlm: Node 2 joins domain
> 2062CE05ABA246988E9CCCDAE253F458
>> Mar 8 07:25:55 groupwise-1-mht kernel: ocfs2_dlm: Nodes in domain
> ("2062CE05ABA246988E9CCCDAE253F458"): 0 1 2
>> Mar 8 07:25:59 groupwise-1-mht kernel: ocfs2_dlm: Node 2 joins domain
> B6ECAF5A668A4573AF763908F26958DB
>> Mar 8 07:25:59 groupwise-1-mht kernel: ocfs2_dlm: Nodes in domain
> ("B6ECAF5A668A4573AF763908F26958DB"): 0 1 2
>>
>>
>>
>>
>> Andy Kipp
>> Network Administrator
>> Velcro USA Inc.
>> 406 Brown Ave.
>> Manchester, NH 03103
>> Phone: (603) 222-4844
>> Email: akipp at velcro.com
>>
>>
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>