[Ocfs2-users] "node down!" are related with SVN rev 3004?
Marcus Alves Grando
marcus.grando at terra.com.br
Wed May 23 14:42:42 PDT 2007
Hi list,
Today I had a problem with OCFS2: one server stopped accessing the OCFS2 disks,
and the only messages in /var/log/messages were:
May 23 16:24:26 node3 kernel: (6956,3):dlm_restart_lock_mastery:1301
ERROR: node down! 1
May 23 16:24:26 node3 kernel: (6956,3):dlm_wait_for_lock_mastery:1118
ERROR: status = -11
I don't know what happened. Could this be related to the SVN rev 3004 fix?
Has anyone seen this before?
Another strange fact: all nodes mount 13 SAN disks, but the "leaves"
messages occur only nine times.
Also worth noting: node1 has been down for maintenance since 08:30.
The other servers logged these messages:
**** node2
May 23 16:48:31 node2 kernel: ocfs2_dlm: Node 3 leaves domain
84407FC4A92E451DADEF260A2FE0E366
May 23 16:48:31 node2 kernel: ocfs2_dlm: Nodes in domain
("84407FC4A92E451DADEF260A2FE0E366"): 2 4
May 23 16:48:37 node2 kernel: ocfs2_dlm: Node 3 leaves domain
1FB62EB34D1F495A9F11F396E707588C
May 23 16:48:37 node2 kernel: ocfs2_dlm: Nodes in domain
("1FB62EB34D1F495A9F11F396E707588C"): 2 4
May 23 16:48:42 node2 kernel: ocfs2_dlm: Node 3 leaves domain
793ACD36E8CA4067AB99F9F4F2229634
May 23 16:48:42 node2 kernel: ocfs2_dlm: Nodes in domain
("793ACD36E8CA4067AB99F9F4F2229634"): 2 4
May 23 16:48:48 node2 kernel: ocfs2_dlm: Node 3 leaves domain
ECECF9980CBD44EFA7E8A950EDE40573
May 23 16:48:48 node2 kernel: ocfs2_dlm: Nodes in domain
("ECECF9980CBD44EFA7E8A950EDE40573"): 2 4
May 23 16:48:53 node2 kernel: ocfs2_dlm: Node 3 leaves domain
D8AFCBD0CF59404991FAB19916CEE08B
May 23 16:48:53 node2 kernel: ocfs2_dlm: Nodes in domain
("D8AFCBD0CF59404991FAB19916CEE08B"): 2 4
May 23 16:48:58 node2 kernel: ocfs2_dlm: Node 3 leaves domain
E8B0A018151943A28674662818529F0F
May 23 16:48:58 node2 kernel: ocfs2_dlm: Nodes in domain
("E8B0A018151943A28674662818529F0F"): 2 4
May 23 16:49:03 node2 kernel: ocfs2_dlm: Node 3 leaves domain
3D227E224D0D4D9F97B84B0BB7DE7E22
May 23 16:49:03 node2 kernel: ocfs2_dlm: Nodes in domain
("3D227E224D0D4D9F97B84B0BB7DE7E22"): 2 4
May 23 16:49:09 node2 kernel: ocfs2_dlm: Node 3 leaves domain
C40090C8D14D48C9AC0D1024A228EC59
May 23 16:49:09 node2 kernel: ocfs2_dlm: Nodes in domain
("C40090C8D14D48C9AC0D1024A228EC59"): 2 4
May 23 16:49:14 node2 kernel: ocfs2_dlm: Node 3 leaves domain
9CB224941DC64A39872A5012FBD12354
May 23 16:49:14 node2 kernel: ocfs2_dlm: Nodes in domain
("9CB224941DC64A39872A5012FBD12354"): 2 4
May 23 16:50:04 node2 kernel: o2net: connection to node node3.hst.host
(num 3) at 192.168.0.3:7777 has been idle for 30.0 seconds, shutting it
down.
May 23 16:50:04 node2 kernel: (0,3):o2net_idle_timer:1418 here are some
times that might help debug the situation: (tmr 1179949774.944117 now
1179949804.944585 dr 1179949774.944109 adv
1179949774.944119:1179949774.944121 func (d21ddb4d:513)
1179949754.944260:1179949754.944271)
May 23 16:50:04 node2 kernel: o2net: no longer connected to node
node3.hst.host (num 3) at 192.168.0.3:7777
May 23 16:52:31 node2 kernel:
(23351,2):dlm_send_remote_convert_request:398 ERROR: status = -107
May 23 16:52:31 node2 kernel: (23351,2):dlm_wait_for_node_death:365
BD2D6C1943FB4771B018EA2A7D056E8A: waiting 5000ms for notification of
death of node 3
May 23 16:52:32 node2 kernel: (4379,3):ocfs2_dlm_eviction_cb:119 device
(8,49): dlm has evicted node 3
May 23 16:52:32 node2 kernel: (4451,1):dlm_get_lock_resource:921
BD2D6C1943FB4771B018EA2A7D056E8A:$RECOVERY: at least one node (3)
torecover before lock mastery can begin
May 23 16:52:32 node2 kernel: (4451,1):dlm_get_lock_resource:955
BD2D6C1943FB4771B018EA2A7D056E8A: recovery map is not empty, but must
master $RECOVERY lock now
May 23 16:52:33 node2 kernel: (4441,2):dlm_get_lock_resource:921
36D7DEC36FC44C53A6107B6A9CE863A2:$RECOVERY: at least one node (3)
torecover before lock mastery can begin
May 23 16:52:33 node2 kernel: (4441,2):dlm_get_lock_resource:955
36D7DEC36FC44C53A6107B6A9CE863A2: recovery map is not empty, but must
master $RECOVERY lock now
May 23 16:52:34 node2 kernel: (4491,2):dlm_get_lock_resource:921
9D0941F9B5B843E0B8F8C9FD7D514C35:$RECOVERY: at least one node (3)
torecover before lock mastery can begin
May 23 16:52:34 node2 kernel: (4491,2):dlm_get_lock_resource:955
9D0941F9B5B843E0B8F8C9FD7D514C35: recovery map is not empty, but must
master $RECOVERY lock now
May 23 16:52:37 node2 kernel: (23351,2):ocfs2_replay_journal:1167
Recovering node 3 from slot 1 on device (8,97)
May 23 16:52:42 node2 kernel: kjournald starting. Commit interval 5 seconds
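For what it's worth, the timestamps in the o2net_idle_timer line above can be decoded to confirm the 30-second idle window; a minimal sketch, using values copied from node2's log (the field meanings are my reading of the debug line, not documented in the message itself):

```python
# Timestamps (epoch seconds) from node2's o2net_idle_timer debug line.
tmr = 1179949774.944117   # "tmr": when the idle timer was last armed
now = 1179949804.944585   # "now": when the idle timer fired
func = 1179949754.944260  # "func": when the last message handler ran

# Elapsed time since the timer was armed; should match the
# "idle for 30.0 seconds" text in the o2net shutdown message.
idle = now - tmr
print(f"idle for {idle:.1f} seconds")
```

So the connection really did sit idle for the full 30-second o2net timeout before node2 tore it down.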
**** node4
May 23 16:48:31 node4 kernel: ocfs2_dlm: Node 3 leaves domain
84407FC4A92E451DADEF260A2FE0E366
May 23 16:48:31 node4 kernel: ocfs2_dlm: Nodes in domain
("84407FC4A92E451DADEF260A2FE0E366"): 2 4
May 23 16:48:37 node4 kernel: ocfs2_dlm: Node 3 leaves domain
1FB62EB34D1F495A9F11F396E707588C
May 23 16:48:37 node4 kernel: ocfs2_dlm: Nodes in domain
("1FB62EB34D1F495A9F11F396E707588C"): 2 4
May 23 16:48:42 node4 kernel: ocfs2_dlm: Node 3 leaves domain
793ACD36E8CA4067AB99F9F4F2229634
May 23 16:48:42 node4 kernel: ocfs2_dlm: Nodes in domain
("793ACD36E8CA4067AB99F9F4F2229634"): 2 4
May 23 16:48:48 node4 kernel: ocfs2_dlm: Node 3 leaves domain
ECECF9980CBD44EFA7E8A950EDE40573
May 23 16:48:48 node4 kernel: ocfs2_dlm: Nodes in domain
("ECECF9980CBD44EFA7E8A950EDE40573"): 2 4
May 23 16:48:53 node4 kernel: ocfs2_dlm: Node 3 leaves domain
D8AFCBD0CF59404991FAB19916CEE08B
May 23 16:48:53 node4 kernel: ocfs2_dlm: Nodes in domain
("D8AFCBD0CF59404991FAB19916CEE08B"): 2 4
May 23 16:48:58 node4 kernel: ocfs2_dlm: Node 3 leaves domain
E8B0A018151943A28674662818529F0F
May 23 16:48:58 node4 kernel: ocfs2_dlm: Nodes in domain
("E8B0A018151943A28674662818529F0F"): 2 4
May 23 16:49:03 node4 kernel: ocfs2_dlm: Node 3 leaves domain
3D227E224D0D4D9F97B84B0BB7DE7E22
May 23 16:49:03 node4 kernel: ocfs2_dlm: Nodes in domain
("3D227E224D0D4D9F97B84B0BB7DE7E22"): 2 4
May 23 16:49:09 node4 kernel: ocfs2_dlm: Node 3 leaves domain
C40090C8D14D48C9AC0D1024A228EC59
May 23 16:49:09 node4 kernel: ocfs2_dlm: Nodes in domain
("C40090C8D14D48C9AC0D1024A228EC59"): 2 4
May 23 16:49:14 node4 kernel: ocfs2_dlm: Node 3 leaves domain
9CB224941DC64A39872A5012FBD12354
May 23 16:49:14 node4 kernel: ocfs2_dlm: Nodes in domain
("9CB224941DC64A39872A5012FBD12354"): 2 4
May 23 16:50:04 node4 kernel: o2net: connection to node node3.hst.host
(num 3) at 192.168.0.3:7777 has been idle for 30.0 seconds, shutting it
down.
May 23 16:50:04 node4 kernel: (19355,0):o2net_idle_timer:1418 here are
some times that might help debug the situation: (tmr 1179949774.943813
now 1179949804.944242 dr 1179949774.943805 adv
1179949774.943815:1179949774.943817 func (d21ddb4d:513)
1179949754.944088:1179949754.944097)
May 23 16:50:04 node4 kernel: o2net: no longer connected to node
node3.hst.host (num 3) at 192.168.0.3:7777
May 23 16:50:04 node4 kernel: (18902,0):dlm_do_master_request:1418
ERROR: link to 3 went down!
May 23 16:50:04 node4 kernel: (18902,0):dlm_get_lock_resource:995 ERROR:
status = -112
May 23 16:52:31 node4 kernel:
(22785,3):dlm_send_remote_convert_request:398 ERROR: status = -107
May 23 16:52:31 node4 kernel: (22785,3):dlm_wait_for_node_death:365
BD2D6C1943FB4771B018EA2A7D056E8A: waiting 5000ms for notification of
death of node 3
May 23 16:52:31 node4 kernel: (22786,3):dlm_get_lock_resource:921
9D0941F9B5B843E0B8F8C9FD7D514C35:M0000000000000000000215b9fa93cd: at
least one node (3) torecover before lock mastery can begin
May 23 16:52:31 node4 kernel: (22784,3):dlm_get_lock_resource:921
36D7DEC36FC44C53A6107B6A9CE863A2:M0000000000000000000215b39ab40a: at
least one node (3) torecover before lock mastery can begin
May 23 16:52:31 node4 kernel: (22783,3):dlm_get_lock_resource:921
A85D18C01AE747AC905343D919B60525:M000000000000000000021535d8e891: at
least one node (3) torecover before lock mastery can begin
May 23 16:52:31 node4 kernel: (4525,3):dlm_get_lock_resource:921
A85D18C01AE747AC905343D919B60525:$RECOVERY: at least one node (3)
torecover before lock mastery can begin
May 23 16:52:31 node4 kernel: (4525,3):dlm_get_lock_resource:955
A85D18C01AE747AC905343D919B60525: recovery map is not empty, but must
master $RECOVERY lock now
May 23 16:52:32 node4 kernel: (22786,3):dlm_get_lock_resource:976
9D0941F9B5B843E0B8F8C9FD7D514C35:M0000000000000000000215b9fa93cd: at
least one node (3) torecover before lock mastery can begin
May 23 16:52:32 node4 kernel: (22784,3):dlm_get_lock_resource:976
36D7DEC36FC44C53A6107B6A9CE863A2:M0000000000000000000215b39ab40a: at
least one node (3) torecover before lock mastery can begin
May 23 16:52:32 node4 kernel: (4483,0):ocfs2_dlm_eviction_cb:119 device
(8,97): dlm has evicted node 3
May 23 16:52:33 node4 kernel: (4483,0):ocfs2_dlm_eviction_cb:119 device
(8,81): dlm has evicted node 3
May 23 16:52:34 node4 kernel: (4483,0):ocfs2_dlm_eviction_cb:119 device
(8,161): dlm has evicted node 3
May 23 16:52:35 node4 kernel: (18902,0):dlm_restart_lock_mastery:1301
ERROR: node down! 3
May 23 16:52:35 node4 kernel: (18902,0):dlm_wait_for_lock_mastery:1118
ERROR: status = -11
May 23 16:52:36 node4 kernel: (18902,0):dlm_get_lock_resource:976
9D0941F9B5B843E0B8F8C9FD7D514C35:D0000000000000000030b2be3ea1a0c: at
least one node (3) torecover before lock mastery can begin
May 23 16:52:37 node4 kernel: (22783,3):ocfs2_replay_journal:1167
Recovering node 3 from slot 1 on device (8,49)
May 23 16:52:39 node4 kernel: (22784,0):ocfs2_replay_journal:1167
Recovering node 3 from slot 1 on device (8,81)
May 23 16:52:40 node4 kernel: (22786,0):ocfs2_replay_journal:1167
Recovering node 3 from slot 1 on device (8,161)
May 23 16:52:44 node4 kernel: kjournald starting. Commit interval 5 seconds
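The "device (8,NN)" pairs in the eviction and journal-replay lines are (major, minor) device numbers. Assuming the standard Linux sd layout for major 8 (16 minors per disk, sda..sdz), they can be mapped back to disk names with a small sketch; the helper below is illustrative, not something from the original logs:

```python
import string

def sd_name(major: int, minor: int) -> str:
    """Map a (major, minor) pair like the log's "device (8,49)" to a
    conventional SCSI disk name. Assumes major 8 and fewer than 26 disks."""
    assert major == 8, "only the first sd major is handled in this sketch"
    disk = string.ascii_lowercase[minor // 16]  # 16 minors per whole disk
    part = minor % 16                           # 0 means the whole disk
    return f"sd{disk}{part}" if part else f"sd{disk}"

# Devices mentioned in the recovery messages above.
for dev in [(8, 49), (8, 81), (8, 97), (8, 161)]:
    print(dev, "->", sd_name(*dev))
```

Under that assumption, (8,49) is sdd1, (8,81) is sdf1, (8,97) is sdg1, and (8,161) is sdk1, which may help correlate the evictions with specific SAN LUNs.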
--
Marcus Alves Grando <marcus.grando [] terra.com.br>
Engineering Support 1
Terra Networks Brasil S/A
Tel: 55 (51) 3284-4238
What is your Terra?