[Ocfs2-users] ocfs2 cluster becomes unresponsive
Andy Kipp
AKIPP at velcro.com
Tue Mar 13 07:16:05 PDT 2007
I checked bugzilla, and what is happening is almost identical to bug #819. However, the "dead" node continues to heartbeat yet is unresponsive, and no log output at all is generated on it. This has been happening for a few months, but the frequency is increasing. Is there any information I can provide to help figure this out?
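For anyone reading the logs below: the negative status codes in the o2net/DLM messages are negated kernel errno values. A quick sketch to decode them (assuming a Linux box, since errno numbers are platform-specific):

```python
import errno
import os

# Decode the status codes seen in the OCFS2/DLM log messages below.
# -107, -112, and -11 are negated Linux errno values.
for code in (107, 112, 11):
    print(-code, errno.errorcode[code], '-', os.strerror(code))
```

On Linux this maps -107 to ENOTCONN ("Transport endpoint is not connected") and -112 to EHOSTDOWN, which lines up with the o2net connection loss in the logs.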
- Andy
--
Andrew Kipp
Network Administrator
Velcro USA Inc.
Email: akipp at velcro.com
Work: (603) 222-4844
CONFIDENTIALITY NOTICE: This email is intended only for the person or entity to which it is addressed and may contain confidential and/or privileged material. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply e-mail and destroy all copies of the original message. If you are the intended recipient but do not wish to receive communications through this medium, please so advise immediately.
>>> On 3/9/2007 at 9:39 PM, in message <45F21A7F.5090802 at oracle.com>, Sunil Mushran
<Sunil.Mushran at oracle.com> wrote:
> File a bugzilla with the messages from all three nodes. Appears
> node 2 went down but kept heartbeating. Strange. The messages
> from node 2 may shed more light.
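To gather those messages for the bugzilla, something like this could pull the OCFS2-related lines out of each node's syslog (a sketch; the match patterns are assumptions, and on SLES9 the file to pass in is typically /var/log/messages):

```python
import re

# Filter OCFS2-related kernel messages out of a syslog file so the
# output from each node can be attached to the bugzilla.
PATTERN = re.compile(r'o2net|o2hb|dlm_|ocfs2', re.IGNORECASE)

def ocfs2_messages(path):
    """Return the log lines mentioning o2net, the heartbeat, the DLM, or ocfs2."""
    with open(path) as log:
        return [line.rstrip('\n') for line in log if PATTERN.search(line)]
```

Calling `ocfs2_messages('/var/log/messages')` on each node and attaching the three outputs would cover the request above.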
>
> Andy Kipp wrote:
>> We are running OCFS2 on SLES9 machines using a FC SAN. Without warning, both
> nodes become unresponsive. We cannot access either machine via ssh or the
> terminal (it hangs after typing in the username). However, the machines still
> respond to pings. This continues until one node is rebooted, at which time the
> second node resumes normal operation.
>>
>> I am not entirely sure that this is an OCFS2 problem at all; however, the
> syslog shows it had issues. Here is the log from the node that was not
> rebooted. The node that was rebooted contained no log information. The system
> appears to have gone down at about 3 AM and stayed that way until the node was
> rebooted at around 7:15.
>>
>> Mar 8 03:06:32 groupwise-1-mht kernel: o2net: connection to node
> groupwise-2-mht (num 2) at 192.168.1.3:7777 has been idle for 10 seconds,
> shutting it down.
>> Mar 8 03:06:32 groupwise-1-mht kernel: (0,2):o2net_idle_timer:1310 here are
> some times that might help debug the situation: (tmr 1173341182.367220 now
> 1173341192.367244 dr 1173341182.367213 adv
> 1173341182.367228:1173341182.367229 func (05ce6220:2)
> 1173341182.367221:1173341182.367224)
>> Mar 8 03:06:32 groupwise-1-mht kernel: o2net: no longer connected to node
> groupwise-2-mht (num 2) at 192.168.1.3:7777
>> Mar 8 03:06:32 groupwise-1-mht kernel: (499,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 03:06:32 groupwise-1-mht kernel: (499,0):dlm_get_lock_resource:914
> ERROR: status = -112
>> Mar 8 03:13:02 groupwise-1-mht kernel: (8476,0):dlm_send_proxy_ast_msg:458
> ERROR: status = -107
>> Mar 8 03:13:02 groupwise-1-mht kernel: (8476,0):dlm_flush_asts:607 ERROR:
> status = -107
>> Mar 8 03:19:54 groupwise-1-mht kernel:
> (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 03:19:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 03:19:55 groupwise-1-mht kernel: (873,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 03:19:55 groupwise-1-mht kernel: (873,0):dlm_get_lock_resource:914
> ERROR: status = -107
>> Mar 8 03:19:55 groupwise-1-mht kernel: (901,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 03:19:55 groupwise-1-mht kernel: (901,0):dlm_get_lock_resource:914
> ERROR: status = -107
>> Mar 8 03:19:56 groupwise-1-mht kernel: (929,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 03:19:56 groupwise-1-mht kernel: (929,0):dlm_get_lock_resource:914
> ERROR: status = -107
>> Mar 8 03:45:29 groupwise-1-mht -- MARK --
>> Mar 8 04:15:02 groupwise-1-mht kernel:
> (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 04:15:03 groupwise-1-mht last message repeated 383 times
>> Mar 8 06:27:54 groupwise-1-mht kernel:
> (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 06:27:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 06:27:54 groupwise-1-mht kernel:
> (147,1):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 06:27:54 groupwise-1-mht last message repeated 127 times
>> Mar 8 06:35:48 groupwise-1-mht kernel: (8872,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 06:35:48 groupwise-1-mht kernel: (8872,0):dlm_get_lock_resource:914
> ERROR: status = -107
>> Mar 8 06:52:45 groupwise-1-mht kernel: (8861,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 06:52:45 groupwise-1-mht kernel: (8861,0):dlm_get_lock_resource:914
> ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 06:54:11 groupwise-1-mht kernel: (8854,3):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 06:54:18 groupwise-1-mht kernel: (8855,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 06:54:58 groupwise-1-mht kernel: (8853,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:09:41 groupwise-1-mht kernel: (4192,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 07:09:41 groupwise-1-mht kernel: (4192,0):dlm_get_lock_resource:914
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:14:09 groupwise-1-mht kernel: (4236,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:15:50 groupwise-1-mht kernel: (4289,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:16:13 groupwise-1-mht kernel: (4253,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:18:57 groupwise-1-mht kernel: (4341,0):dlm_do_master_request:1330
> ERROR: link to 2 went down!
>> Mar 8 07:18:57 groupwise-1-mht kernel: (4341,0):dlm_get_lock_resource:914
> ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_broadcast_vote:725
> ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_do_request_vote:798
> ERROR: status = -107
>> Mar 8 07:19:24 groupwise-1-mht kernel: (4356,0):ocfs2_unlink:840 ERROR:
> status = -107
>> Mar 8 07:20:49 groupwise-1-mht sshd[4375]: Accepted publickey for root from
> 10.1.31.27 port 1752 ssh2
>> Mar 8 07:20:50 groupwise-1-mht kernel:
> (147,0):dlm_send_remote_unlock_request:356 ERROR: status = -107
>> Mar 8 07:20:50 groupwise-1-mht last message repeated 255 times
>> Mar 8 07:20:53 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:20:53 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:20:58 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:20:58 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:03 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:03 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:08 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:08 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:13 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:13 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:19 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:19 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:24 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:24 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:29 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:29 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:34 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:34 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:39 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:39 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:44 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:44 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:49 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:49 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:54 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:54 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:21:59 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:21:59 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:04 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:04 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:10 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:10 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:15 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:20 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:20 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:25 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:25 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:30 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:30 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:35 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:35 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:40 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:40 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:45 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:45 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:50 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:50 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:22:55 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:22:55 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:01 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:01 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:06 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:06 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:11 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:11 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:16 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:16 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:21 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:21 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:26 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:26 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:31 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:31 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:36 groupwise-1-mht kernel:
> (4377,0):dlm_send_remote_convert_request:398 ERROR: status = -107
>> Mar 8 07:23:36 groupwise-1-mht kernel: (4377,0):dlm_wait_for_node_death:371
> 2062CE05ABA246988E9CCCDAE253F458: waiting 5000ms for notification of death of
> node 2
>> Mar 8 07:23:40 groupwise-1-mht kernel: (28613,2):dlm_get_lock_resource:847
> B6ECAF5A668A4573AF763908F26958DB:$RECOVERY: at least one node (2) torecover
> before lock mastery can begin
>> Mar 8 07:23:40 groupwise-1-mht kernel: (28613,2):dlm_get_lock_resource:874
> B6ECAF5A668A4573AF763908F26958DB: recovery map is not empty, but must master
> $RECOVERY lock now
>> Mar 8 07:23:41 groupwise-1-mht kernel: (4432,0):ocfs2_replay_journal:1176
> Recovering node 2 from slot 1 on device (253,1)
>> Mar 8 07:23:41 groupwise-1-mht kernel: (4192,0):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:41 groupwise-1-mht kernel:
> (4192,0):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:41 groupwise-1-mht kernel: (929,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:41 groupwise-1-mht kernel: (929,1):dlm_wait_for_lock_mastery:1035
> ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel:
> (4341,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4341,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel:
> (4341,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (4192,0):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:D000000000000000037872ff59e2a10: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:42 groupwise-1-mht kernel: (499,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:42 groupwise-1-mht kernel: (499,1):dlm_wait_for_lock_mastery:1035
> ERROR: status = -11
>> Mar 8 07:23:42 groupwise-1-mht kernel: (929,1):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:M0000000000000002d2ab960a02ee32: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:43 groupwise-1-mht kernel: (4341,1):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:D00000000000000005ac8f593b44a80: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:43 groupwise-1-mht kernel: (8872,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:43 groupwise-1-mht kernel:
> (8872,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:43 groupwise-1-mht kernel: (499,1):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:D0000000000000000059e0c78635d25: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:43 groupwise-1-mht kernel: (8223,2):ocfs2_dlm_eviction_cb:119
> device (253,0): dlm has evicted node 2
>> Mar 8 07:23:43 groupwise-1-mht kernel: (4431,0):dlm_get_lock_resource:847
> 2062CE05ABA246988E9CCCDAE253F458:M000000000000000000001de83f8b74: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:44 groupwise-1-mht kernel: (8872,1):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:D0000000000000000ce315c7764670d: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:44 groupwise-1-mht kernel: (4431,0):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:M000000000000000000001de83f8b74: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:44 groupwise-1-mht kernel: (873,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:49 groupwise-1-mht kernel: (873,1):dlm_wait_for_lock_mastery:1035
> ERROR: status = -11
>> Mar 8 07:23:49 groupwise-1-mht kernel: (901,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:49 groupwise-1-mht kernel: (901,1):dlm_wait_for_lock_mastery:1035
> ERROR: status = -11
>> Mar 8 07:23:49 groupwise-1-mht kernel: (8861,1):dlm_restart_lock_mastery:1214
> ERROR: node down! 2
>> Mar 8 07:23:49 groupwise-1-mht kernel:
> (8861,1):dlm_wait_for_lock_mastery:1035 ERROR: status = -11
>> Mar 8 07:23:49 groupwise-1-mht kernel: (873,1):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:M0000000000000002fc058c0a084a80: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:49 groupwise-1-mht kernel: (901,1):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:M0000000000000002ff18686a1b86f4: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:49 groupwise-1-mht kernel: (8861,1):dlm_get_lock_resource:895
> 2062CE05ABA246988E9CCCDAE253F458:D0000000000000000b2f76e77647700: at least
> one node (2) torecover before lock mastery can begin
>> Mar 8 07:23:49 groupwise-1-mht kernel: kjournald starting. Commit interval 5
> seconds
>> Mar 8 07:23:49 groupwise-1-mht kernel: (4431,0):ocfs2_replay_journal:1176
> Recovering node 2 from slot 1 on device (253,0)
>> Mar 8 07:23:55 groupwise-1-mht kernel: (fs/jbd/recovery.c, 255):
> journal_recover: JBD: recovery, exit status 0, recovered transactions 599034
> to 599035
>> Mar 8 07:23:55 groupwise-1-mht kernel: (fs/jbd/recovery.c, 257):
> journal_recover: JBD: Replayed 8 and revoked 0/0 blocks
>> Mar 8 07:23:55 groupwise-1-mht kernel: kjournald starting. Commit interval 5
> seconds
>> Mar 8 07:25:51 groupwise-1-mht kernel: o2net: accepted connection from node
> groupwise-2-mht (num 2) at 192.168.1.3:7777
>> Mar 8 07:25:55 groupwise-1-mht kernel: ocfs2_dlm: Node 2 joins domain
> 2062CE05ABA246988E9CCCDAE253F458
>> Mar 8 07:25:55 groupwise-1-mht kernel: ocfs2_dlm: Nodes in domain
> ("2062CE05ABA246988E9CCCDAE253F458"): 0 1 2
>> Mar 8 07:25:59 groupwise-1-mht kernel: ocfs2_dlm: Node 2 joins domain
> B6ECAF5A668A4573AF763908F26958DB
>> Mar 8 07:25:59 groupwise-1-mht kernel: ocfs2_dlm: Nodes in domain
> ("B6ECAF5A668A4573AF763908F26958DB"): 0 1 2
>>
>>
>>
>>
>> Andy Kipp
>> Network Administrator
>> Velcro USA Inc.
>> 406 Brown Ave.
>> Manchester, NH 03103
>> Phone: (603) 222-4844
>> Email: akipp at velcro.com
>>
>>
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>