[Ocfs2-users] Is "node down!" related to SVN rev 3004?

Marcus Alves Grando marcus.grando at terra.com.br
Wed May 23 14:42:42 PDT 2007


Hi list,

Today I had a problem with ocfs2: one server stopped accessing the ocfs2 
disks, and the only messages in /var/log/messages were:

May 23 16:24:26 node3 kernel: (6956,3):dlm_restart_lock_mastery:1301 
ERROR: node down! 1
May 23 16:24:26 node3 kernel: (6956,3):dlm_wait_for_lock_mastery:1118 
ERROR: status = -11

I don't know what happened. Could this be related to the SVN rev 3004 
fix? Has anyone else seen this?
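
For reference, those negative status values look like negated Linux errno 
codes (-11 here, and -107/-112 in the node logs below). A quick Python 
sketch to decode them, assuming the usual Linux errno numbering:

import errno
import os

# The dlm/o2net "status" values appear to be negated Linux errno codes.
for status in (-11, -107, -112):
    e = -status
    print(status, errno.errorcode[e], "-", os.strerror(e))

# Output on Linux:
# -11 EAGAIN - Resource temporarily unavailable  (retryable)
# -107 ENOTCONN - Transport endpoint is not connected
# -112 EHOSTDOWN - Host is down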

Another strange detail: all nodes mount 13 SAN disks, but the "leaves 
domain" messages occur only nine times.

Also, node1 has been down for maintenance since 08:30.

The other servers logged these messages:

**** node2

May 23 16:48:31 node2 kernel: ocfs2_dlm: Node 3 leaves domain 
84407FC4A92E451DADEF260A2FE0E366
May 23 16:48:31 node2 kernel: ocfs2_dlm: Nodes in domain 
("84407FC4A92E451DADEF260A2FE0E366"): 2 4
May 23 16:48:37 node2 kernel: ocfs2_dlm: Node 3 leaves domain 
1FB62EB34D1F495A9F11F396E707588C
May 23 16:48:37 node2 kernel: ocfs2_dlm: Nodes in domain 
("1FB62EB34D1F495A9F11F396E707588C"): 2 4
May 23 16:48:42 node2 kernel: ocfs2_dlm: Node 3 leaves domain 
793ACD36E8CA4067AB99F9F4F2229634
May 23 16:48:42 node2 kernel: ocfs2_dlm: Nodes in domain 
("793ACD36E8CA4067AB99F9F4F2229634"): 2 4
May 23 16:48:48 node2 kernel: ocfs2_dlm: Node 3 leaves domain 
ECECF9980CBD44EFA7E8A950EDE40573
May 23 16:48:48 node2 kernel: ocfs2_dlm: Nodes in domain 
("ECECF9980CBD44EFA7E8A950EDE40573"): 2 4
May 23 16:48:53 node2 kernel: ocfs2_dlm: Node 3 leaves domain 
D8AFCBD0CF59404991FAB19916CEE08B
May 23 16:48:53 node2 kernel: ocfs2_dlm: Nodes in domain 
("D8AFCBD0CF59404991FAB19916CEE08B"): 2 4
May 23 16:48:58 node2 kernel: ocfs2_dlm: Node 3 leaves domain 
E8B0A018151943A28674662818529F0F
May 23 16:48:58 node2 kernel: ocfs2_dlm: Nodes in domain 
("E8B0A018151943A28674662818529F0F"): 2 4
May 23 16:49:03 node2 kernel: ocfs2_dlm: Node 3 leaves domain 
3D227E224D0D4D9F97B84B0BB7DE7E22
May 23 16:49:03 node2 kernel: ocfs2_dlm: Nodes in domain 
("3D227E224D0D4D9F97B84B0BB7DE7E22"): 2 4
May 23 16:49:09 node2 kernel: ocfs2_dlm: Node 3 leaves domain 
C40090C8D14D48C9AC0D1024A228EC59
May 23 16:49:09 node2 kernel: ocfs2_dlm: Nodes in domain 
("C40090C8D14D48C9AC0D1024A228EC59"): 2 4
May 23 16:49:14 node2 kernel: ocfs2_dlm: Node 3 leaves domain 
9CB224941DC64A39872A5012FBD12354
May 23 16:49:14 node2 kernel: ocfs2_dlm: Nodes in domain 
("9CB224941DC64A39872A5012FBD12354"): 2 4
May 23 16:50:04 node2 kernel: o2net: connection to node node3.hst.host 
(num 3) at 192.168.0.3:7777 has been idle for 30.0 seconds, shutting it 
down.
May 23 16:50:04 node2 kernel: (0,3):o2net_idle_timer:1418 here are some 
times that might help debug the situation: (tmr 1179949774.944117 now 
1179949804.944585 dr 1179949774.944109 adv 
1179949774.944119:1179949774.944121 func (d21ddb4d:513) 
1179949754.944260:1179949754.944271)
May 23 16:50:04 node2 kernel: o2net: no longer connected to node 
node3.hst.host (num 3) at 192.168.0.3:7777
May 23 16:52:31 node2 kernel: 
(23351,2):dlm_send_remote_convert_request:398 ERROR: status = -107
May 23 16:52:31 node2 kernel: (23351,2):dlm_wait_for_node_death:365 
BD2D6C1943FB4771B018EA2A7D056E8A: waiting 5000ms for notification of 
death of node 3
May 23 16:52:32 node2 kernel: (4379,3):ocfs2_dlm_eviction_cb:119 device 
(8,49): dlm has evicted node 3
May 23 16:52:32 node2 kernel: (4451,1):dlm_get_lock_resource:921 
BD2D6C1943FB4771B018EA2A7D056E8A:$RECOVERY: at least one node (3) 
torecover before lock mastery can begin
May 23 16:52:32 node2 kernel: (4451,1):dlm_get_lock_resource:955 
BD2D6C1943FB4771B018EA2A7D056E8A: recovery map is not empty, but must 
master $RECOVERY lock now
May 23 16:52:33 node2 kernel: (4441,2):dlm_get_lock_resource:921 
36D7DEC36FC44C53A6107B6A9CE863A2:$RECOVERY: at least one node (3) 
torecover before lock mastery can begin
May 23 16:52:33 node2 kernel: (4441,2):dlm_get_lock_resource:955 
36D7DEC36FC44C53A6107B6A9CE863A2: recovery map is not empty, but must 
master $RECOVERY lock now
May 23 16:52:34 node2 kernel: (4491,2):dlm_get_lock_resource:921 
9D0941F9B5B843E0B8F8C9FD7D514C35:$RECOVERY: at least one node (3) 
torecover before lock mastery can begin
May 23 16:52:34 node2 kernel: (4491,2):dlm_get_lock_resource:955 
9D0941F9B5B843E0B8F8C9FD7D514C35: recovery map is not empty, but must 
master $RECOVERY lock now
May 23 16:52:37 node2 kernel: (23351,2):ocfs2_replay_journal:1167 
Recovering node 3 from slot 1 on device (8,97)
May 23 16:52:42 node2 kernel: kjournald starting.  Commit interval 5 seconds
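
Side note: the o2net_idle_timer line prints raw sec.usec timestamps. 
Assuming "tmr" is when the idle timer was last reset and "now" is when it 
fired, the arithmetic matches the reported 30.0-second idle window:

# Timestamps copied from node2's o2net_idle_timer line above (sec.usec).
tmr = 1179949774.944117  # assumed: last reset of the idle timer
now = 1179949804.944585  # assumed: moment the timer fired
print(f"idle for {now - tmr:.6f} s")  # -> idle for 30.000468 s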

**** node4

May 23 16:48:31 node4 kernel: ocfs2_dlm: Node 3 leaves domain 
84407FC4A92E451DADEF260A2FE0E366
May 23 16:48:31 node4 kernel: ocfs2_dlm: Nodes in domain 
("84407FC4A92E451DADEF260A2FE0E366"): 2 4
May 23 16:48:37 node4 kernel: ocfs2_dlm: Node 3 leaves domain 
1FB62EB34D1F495A9F11F396E707588C
May 23 16:48:37 node4 kernel: ocfs2_dlm: Nodes in domain 
("1FB62EB34D1F495A9F11F396E707588C"): 2 4
May 23 16:48:42 node4 kernel: ocfs2_dlm: Node 3 leaves domain 
793ACD36E8CA4067AB99F9F4F2229634
May 23 16:48:42 node4 kernel: ocfs2_dlm: Nodes in domain 
("793ACD36E8CA4067AB99F9F4F2229634"): 2 4
May 23 16:48:48 node4 kernel: ocfs2_dlm: Node 3 leaves domain 
ECECF9980CBD44EFA7E8A950EDE40573
May 23 16:48:48 node4 kernel: ocfs2_dlm: Nodes in domain 
("ECECF9980CBD44EFA7E8A950EDE40573"): 2 4
May 23 16:48:53 node4 kernel: ocfs2_dlm: Node 3 leaves domain 
D8AFCBD0CF59404991FAB19916CEE08B
May 23 16:48:53 node4 kernel: ocfs2_dlm: Nodes in domain 
("D8AFCBD0CF59404991FAB19916CEE08B"): 2 4
May 23 16:48:58 node4 kernel: ocfs2_dlm: Node 3 leaves domain 
E8B0A018151943A28674662818529F0F
May 23 16:48:58 node4 kernel: ocfs2_dlm: Nodes in domain 
("E8B0A018151943A28674662818529F0F"): 2 4
May 23 16:49:03 node4 kernel: ocfs2_dlm: Node 3 leaves domain 
3D227E224D0D4D9F97B84B0BB7DE7E22
May 23 16:49:03 node4 kernel: ocfs2_dlm: Nodes in domain 
("3D227E224D0D4D9F97B84B0BB7DE7E22"): 2 4
May 23 16:49:09 node4 kernel: ocfs2_dlm: Node 3 leaves domain 
C40090C8D14D48C9AC0D1024A228EC59
May 23 16:49:09 node4 kernel: ocfs2_dlm: Nodes in domain 
("C40090C8D14D48C9AC0D1024A228EC59"): 2 4
May 23 16:49:14 node4 kernel: ocfs2_dlm: Node 3 leaves domain 
9CB224941DC64A39872A5012FBD12354
May 23 16:49:14 node4 kernel: ocfs2_dlm: Nodes in domain 
("9CB224941DC64A39872A5012FBD12354"): 2 4
May 23 16:50:04 node4 kernel: o2net: connection to node node3.hst.host 
(num 3) at 192.168.0.3:7777 has been idle for 30.0 seconds, shutting it 
down.
May 23 16:50:04 node4 kernel: (19355,0):o2net_idle_timer:1418 here are 
some times that might help debug the situation: (tmr 1179949774.943813 
now 1179949804.944242 dr 1179949774.943805 adv 
1179949774.943815:1179949774.943817 func (d21ddb4d:513) 
1179949754.944088:1179949754.944097)
May 23 16:50:04 node4 kernel: o2net: no longer connected to node 
node3.hst.host (num 3) at 192.168.0.3:7777
May 23 16:50:04 node4 kernel: (18902,0):dlm_do_master_request:1418 
ERROR: link to 3 went down!
May 23 16:50:04 node4 kernel: (18902,0):dlm_get_lock_resource:995 ERROR: 
status = -112
May 23 16:52:31 node4 kernel: 
(22785,3):dlm_send_remote_convert_request:398 ERROR: status = -107
May 23 16:52:31 node4 kernel: (22785,3):dlm_wait_for_node_death:365 
BD2D6C1943FB4771B018EA2A7D056E8A: waiting 5000ms for notification of 
death of node 3
May 23 16:52:31 node4 kernel: (22786,3):dlm_get_lock_resource:921 
9D0941F9B5B843E0B8F8C9FD7D514C35:M0000000000000000000215b9fa93cd: at 
least one node (3) torecover before lock mastery can begin
May 23 16:52:31 node4 kernel: (22784,3):dlm_get_lock_resource:921 
36D7DEC36FC44C53A6107B6A9CE863A2:M0000000000000000000215b39ab40a: at 
least one node (3) torecover before lock mastery can begin
May 23 16:52:31 node4 kernel: (22783,3):dlm_get_lock_resource:921 
A85D18C01AE747AC905343D919B60525:M000000000000000000021535d8e891: at 
least one node (3) torecover before lock mastery can begin
May 23 16:52:31 node4 kernel: (4525,3):dlm_get_lock_resource:921 
A85D18C01AE747AC905343D919B60525:$RECOVERY: at least one node (3) 
torecover before lock mastery can begin
May 23 16:52:31 node4 kernel: (4525,3):dlm_get_lock_resource:955 
A85D18C01AE747AC905343D919B60525: recovery map is not empty, but must 
master $RECOVERY lock now
May 23 16:52:32 node4 kernel: (22786,3):dlm_get_lock_resource:976 
9D0941F9B5B843E0B8F8C9FD7D514C35:M0000000000000000000215b9fa93cd: at 
least one node (3) torecover before lock mastery can begin
May 23 16:52:32 node4 kernel: (22784,3):dlm_get_lock_resource:976 
36D7DEC36FC44C53A6107B6A9CE863A2:M0000000000000000000215b39ab40a: at 
least one node (3) torecover before lock mastery can begin
May 23 16:52:32 node4 kernel: (4483,0):ocfs2_dlm_eviction_cb:119 device 
(8,97): dlm has evicted node 3
May 23 16:52:33 node4 kernel: (4483,0):ocfs2_dlm_eviction_cb:119 device 
(8,81): dlm has evicted node 3
May 23 16:52:34 node4 kernel: (4483,0):ocfs2_dlm_eviction_cb:119 device 
(8,161): dlm has evicted node 3
May 23 16:52:35 node4 kernel: (18902,0):dlm_restart_lock_mastery:1301 
ERROR: node down! 3
May 23 16:52:35 node4 kernel: (18902,0):dlm_wait_for_lock_mastery:1118 
ERROR: status = -11
May 23 16:52:36 node4 kernel: (18902,0):dlm_get_lock_resource:976 
9D0941F9B5B843E0B8F8C9FD7D514C35:D0000000000000000030b2be3ea1a0c: at 
least one node (3) torecover before lock mastery can begin
May 23 16:52:37 node4 kernel: (22783,3):ocfs2_replay_journal:1167 
Recovering node 3 from slot 1 on device (8,49)
May 23 16:52:39 node4 kernel: (22784,0):ocfs2_replay_journal:1167 
Recovering node 3 from slot 1 on device (8,81)
May 23 16:52:40 node4 kernel: (22786,0):ocfs2_replay_journal:1167 
Recovering node 3 from slot 1 on device (8,161)
May 23 16:52:44 node4 kernel: kjournald starting.  Commit interval 5 seconds
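
Also, the "(8,NN)" pairs in the eviction and journal-replay lines are 
(major,minor) device numbers. Assuming the standard Linux sd numbering 
(major 8, 16 minors per disk, sda starting at minor 0), they decode as 
below; the sd_name helper is just illustrative:

import string

def sd_name(major, minor):
    # Standard Linux SCSI disk numbering: major 8, 16 minors per disk,
    # minor % 16 is the partition number (0 means the whole disk).
    assert major == 8, "only plain sd major 8 handled in this sketch"
    disk, part = divmod(minor, 16)
    return f"/dev/sd{string.ascii_lowercase[disk]}{part or ''}"

for dev in ((8, 49), (8, 81), (8, 97), (8, 161)):
    print(dev, "->", sd_name(*dev))
# (8,49) -> /dev/sdd1, (8,81) -> /dev/sdf1,
# (8,97) -> /dev/sdg1, (8,161) -> /dev/sdk1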

-- 
Marcus Alves Grando <marcus.grando [] terra.com.br>
Engineering Support 1
Terra Networks Brasil S/A
Tel: 55 (51) 3284-4238

What's your Terra?


