[Ocfs2-users] ocfs2 cluster becomes unresponsive

Luis Freitas lfreitas34 at yahoo.com
Tue Mar 13 16:29:31 PDT 2007


Andy,
   
     To diagnose this kind of hang, I have found it helpful to keep a high-priority shell open on the server. Such a shell usually keeps working even during heavy swapping or in other situations where the system becomes unresponsive. You can start one with this command:
   
  nice -n -20 bash
   
      From that shell you can run top or vmstat to see what is happening while the server is unresponsive. Just be careful not to run any command that generates a lot of output or uses a lot of CPU, as you might hang the server yourself.
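   
      For example, low-overhead commands like these keep the output small and bounded (the intervals and sample counts below are only an illustration, adjust them to taste):
   
  vmstat 5 5               # five samples, five seconds apart
  top -b -n 1 | head -40   # one batch-mode snapshot instead of interactive top
  dmesg | tail -50         # recent kernel messages without opening the syslog file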
   
  Regards,
  Luis

Andy Kipp <AKIPP at velcro.com> wrote:
  I checked bugzilla, and what is happening is almost identical to bug #819. However, the "dead" node continues to heartbeat, yet is unresponsive. No log output at all is generated on the "dead" node. This has been happening for a few months, and the frequency is increasing. Is there any information I can provide to help figure this out?

- Andy
-- 

Andrew Kipp
Network Administrator
Velcro USA Inc.
Email: akipp at velcro.com
Work: (603) 222-4844

CONFIDENTIALITY NOTICE: This email is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e mail and destroy all copies of the original message. If you are the intended recipient but do not wish to receive communications through this medium, please so advise immediately.


>>> On 3/9/2007 at 9:39 PM, in message <45F21A7F.5090802 at oracle.com>, Sunil Mushran wrote:
> File a bugzilla with the messages from all three nodes. Appears
> node 2 went down but kept heartbeating. Strange. The messages
> from node 2 may shed more light.
> 
> Andy Kipp wrote:
>> We are running OCFS2 on SLES9 machines using a FC SAN. Without warning, both
>> nodes become unresponsive. We cannot access either machine via ssh or the
>> terminal (it hangs after typing in the username), although the machines still
>> respond to pings. This continues until one node is rebooted, at which time the
>> second node resumes normal operation.
>>
>> I am not entirely sure that this is an OCFS2 problem at all, but the syslog
>> shows it had issues. Here is the log from the node that was not rebooted; the
>> node that was rebooted contained no log information. The system appears to
>> have gone down at about 3 AM, and the node was rebooted at around 7:15.
>>
>> Mar 8 03:06:32 groupwise-1-mht kernel: o2net: connection to node 
> groupwise-2-mht (num 2) at 192.168.1.3:7777 has been idle for 10 seconds, 
> shutting it down.
>> Mar 8 03:06:32 groupwise-1-mht kernel: (0,2):o2net_idle_timer:1310 here are 
> some times that might help debug the situation: (tmr 1173341182.367220 now 
> 1173341192.367244 dr 1173341182.367213 adv 
> 1173341182.367228:1173341182.367229 func (05ce6220:2) 
> 1173341182.367221:1173341182.367224)
>> Mar 8 03:06:32 groupwise-1-mht kernel: o2net: no longer connected to node 
> groupwise-2-mht (num 2) at 192.168.1.3:7777
>> Mar 8 03:06:32 groupwise-1-mht kernel: (499,0):dlm_do_master_request:1330 
> ERROR: link to 2 went down!
>> Mar 8 03:06:32 groupwise-1-mht kernel: (499,0):dlm_get_lock_resource:914 
> ERROR: status = -112
>> Mar 8 03:13:02 groupwise-1-mht kernel: (8476,0):dlm_send_proxy_ast_msg:458 
> ERROR: status = -107
>> Mar 8 03:13:02 groupwise-1-mht kernel: (8476,0):dlm_flush_asts:607 ERROR: 
> status = -107
>> Mar 8 03:19:54 groupwise-1-mht kernel: 
> (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 03:19:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 03:19:55 groupwise-1-mht kernel: (873,0):dlm_do_master_request:1330 
> ERROR: link to 2 went down!
>> Mar 8 03:19:55 groupwise-1-mht kernel: (873,0):dlm_get_lock_resource:914 
> ERROR: status = -107
>> Mar 8 03:19:55 groupwise-1-mht kernel: (901,0):dlm_do_master_request:1330 
> ERROR: link to 2 went down!
>> Mar 8 03:19:55 groupwise-1-mht kernel: (901,0):dlm_get_lock_resource:914 
> ERROR: status = -107
>> Mar 8 03:19:56 groupwise-1-mht kernel: (929,0):dlm_do_master_request:1330 
> ERROR: link to 2 went down!
>> Mar 8 03:19:56 groupwise-1-mht kernel: (929,0):dlm_get_lock_resource:914 
> ERROR: status = -107
>> Mar 8 03:45:29 groupwise-1-mht -- MARK --
>> Mar 8 04:15:02 groupwise-1-mht kernel: 
> (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 04:15:03 groupwise-1-mht last message repeated 383 times
>> Mar 8 06:27:54 groupwise-1-mht kernel: 
> (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 06:27:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 06:27:54 groupwise-1-mht kernel: 
> (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 06:27:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 06:35:48 groupwise-1-mht kernel: (8872,0):dlm_do_master_request:1330 
> ERROR: link to 2 went down!
>> Mar 8 06:35:48 groupwise-1-mht kernel: (8872,0):dlm_get_lock_resource:914 
> ERROR: status = -107
>> Mar 8 06:52:45 groupwise-1-mht kernel: (8861,0):dlm_do_master_request:1330 
> ERROR: link to 2 went down!
>> Mar 8 06:52:45 groupwise-1-mht kernel: (8861,0):dlm_get_lock_resource:914 
> ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_broadcast_vote:725 
> ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_do_request_vote:798 
> ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_unlink:840 ERROR: 
> status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_broadcast_vote:725 
> ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_do_request_vote:798 
> ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_unlink:840 ERROR: 
> status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_broadcast_vote:725 
> ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_do_request_vote:798 
> ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_unlink:840 ERROR: 
> status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_broadcast_vote:725 
> ERROR: status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_do_request_vote:798 
> ERROR: status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_unlink:840 ERROR: 
> status = -107
>> Mar 8 07:09:41 groupwise-1-mht kernel: (4192,0):dlm_do_master_request:1330 
> ERROR: link to 2 went down!
>> Mar 8 07:09:41 groupwise-1-mht kernel: (4192,0):dlm_get_lock_resource:914 
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725 
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798 
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR: 
> status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725 
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798 
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR: 
> status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725 
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798 
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR: 
> status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725 
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798 
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR: 
> status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_broadcast_vote:725 
> ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_do_request_vote:798 
> ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_unlink:840 ERROR: 
> status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_broadcast_vote:725 
> ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_do_request_vote:798 
> ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_unlink:840 ERROR: 
> status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_broadcast_vote:725 
> ERROR: status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_do_request_vote:798 
> ERROR: status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_unlink:840 ERROR: 
> status = -107
>> Mar 8 07:18:57 groupwise-1-mht kernel: (4341,0):dlm_do_master_request:1330 
> ERROR: link to 2 went down!
>> Mar 8 07:18:57 groupwise-1-mht kernel: (4341,0):dlm_get_lock_resource:914 
> ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_broadcast_vote:725 
> ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_do_request_vote:798 
> ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_unlink:840 ERROR: status = -107
>> Mar 8 07:20:49 groupwise-1-mht sshd[4375]: Accepted publickey for root from 
> 10.1.31.27 port 1752 ssh2
>> Mar 8 07:20:50 groupwise-1-mht kernel: 
> (147,0):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 07:20:50 groupwise-1-mht last message repeated 255 times
>> Mar 8 07:20:53 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:20:53 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:20:58 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:20:58 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:21:03 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:03 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:21:08 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:08 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:21:13 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:13 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:21:19 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:19 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:21:24 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:24 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:21:29 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:29 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:21:34 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:34 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:21:39 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:39 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:21:44 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:44 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:21:49 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:49 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:21:54 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:54 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:21:59 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:59 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:22:04 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:04 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:22:10 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:10 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:22:15 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:20 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:20 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:22:25 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:25 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:22:30 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:30 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:22:35 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:35 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:22:40 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:40 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:22:45 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:45 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:22:50 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:50 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:22:55 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:55 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:23:01 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:01 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:23:06 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:06 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:23:11 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:11 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:23:16 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:16 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:23:21 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:21 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:23:26 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:26 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:23:31 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:31 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:23:36 groupwise-1-mht kernel: 
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:36 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371 
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of 
> node 2
>> Mar 8 07:23:40 groupwise-1-mht kernel: (28613,2):dlm_get_lock_resource:847 
> B6ECAF5A668A4573AF763908F26958DB:$RECOVERY: at least one node (2) to recover 
> before lock mastery can begin
>> Mar 8 07:23:40 groupwise-1-mht kernel: (28613,2):dlm_get_lock_resource:874 
> B6ECAF5A668A4573AF763908F26958DB: recovery map is not empty, but must master 
> $RECOVERY lock now
>> Mar 8 07:23:41 groupwise-1-mht kernel: (4432,0):ocfs2_replay_journal:1176 
> Recovering node 2 from slot 1 on device (253,1)
>> Mar 8 07:23:41 groupwise-1-mht kernel: (4192,0):dlm_restart_lock_mastery:1214 
> ERROR: node down! 2
>> Mar 8 07:23:41 groupwise-1-mht kernel: 
> (4192,0):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:41 groupwise-1-mht kernel: (929,1):dlm_restart_lock_mastery:1214 
> ERROR: node down! 2
>> Mar 8 07:23:41 groupwise-1-mht kernel: (929,1):dlm_wait_for_lock_mastery:1035 
> ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_restart_lock_mastery:1214 
> ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel: 
> (4341,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_restart_lock_mastery:1214 
> ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel: 
> (4341,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4192,0):dlm_get_lock_resource:895 
> 2062CE05ABA246988E9CCCDAE253F458:D000000000000000037872ff59e2a10: at least 
> one node (2) to recover before lock mastery can begin
>> Mar 8 07:23:42 groupwise-1-mht kernel: (499,1):dlm_restart_lock_mastery:1214 
> ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel: (499,1):dlm_wait_for_lock_mastery:1035 
> ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (929,1):dlm_get_lock_resource:895 
> 2062CE05ABA246988E9CCCDAE253F458:M0000000000000002d2ab960a02ee32: at least 
> one node (2) to recover before lock mastery can begin
>> Mar 8 07:23:43 groupwise-1-mht kernel: (4341,1):dlm_get_lock_resource:895 

=== message truncated ===

 

