[Ocfs2-users] two nodes hang

Thomas Lau thomaslau at esun.com
Tue Apr 19 02:59:16 PDT 2011


We have a total of 6 nodes running OCFS2. All of a sudden, server1 and
server2 hung:

server1 log:
Apr 19 17:28:06 server1 kernel: o2net: connection to node server2 (num 8) at 10.10.10.11:7777 has been idle for 60.0 seconds, shutting it down.
Apr 19 17:28:06 server1 kernel: (swapper,0,2):o2net_idle_timer:1503 here are some times that might help debug the situation: (tmr 1303205226.698111 now 1303205286.697866 dr 1303205226.698364 adv 1303205226.698371:1303205226.698372 func (a53de746:506) 1303205226.698112:1303205226.698117)
Apr 19 17:28:06 server1 kernel: o2net: no longer connected to node server2 (num 8) at 10.10.10.11:7777
Apr 19 17:28:06 server1 kernel: (nfsd,5938,2):dlm_do_master_request:1334 ERROR: link to 8 went down!
Apr 19 17:28:06 server1 kernel: (nfsd,5938,2):dlm_get_lock_resource:917 ERROR: status = -112
Apr 19 17:28:06 server1 kernel: (httpd,983,2):dlm_send_remote_convert_request:395 ERROR: status = -112
Apr 19 17:28:06 server1 kernel: (httpd,983,2):dlm_wait_for_node_death:370 8A93E08BB47B4ABFBC4FD0AD1744EFC2: waiting 5000ms for notification of death of node 8
Apr 19 17:28:06 server1 kernel: (httpd,1061,2):dlm_do_master_request:1334 ERROR: link to 8 went down!
Apr 19 17:28:06 server1 kernel: (httpd,1061,2):dlm_get_lock_resource:917 ERROR: status = -112
Apr 19 17:28:06 server1 kernel: (httpd,1137,2):dlm_do_master_request:1334 ERROR: link to 8 went down!
Apr 19 17:28:06 server1 kernel: (httpd,1137,2):dlm_get_lock_resource:917 ERROR: status = -112
Apr 19 17:28:11 server1 kernel: (httpd,983,2):dlm_send_remote_convert_request:395 ERROR: status = -107
Apr 19 17:28:11 server1 kernel: (httpd,983,2):dlm_wait_for_node_death:370 8A93E08BB47B4ABFBC4FD0AD1744EFC2: waiting 5000ms for notification of death of node 8
Apr 19 17:28:16 server1 kernel: (httpd,983,2):dlm_send_remote_convert_request:395 ERROR: status = -107
Apr 19 17:28:16 server1 kernel: (httpd,983,2):dlm_wait_for_node_death:370 8A93E08BB47B4ABFBC4FD0AD1744EFC2: waiting 5000ms for notification of death of node 8
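The `tmr` and `now` fields printed by `o2net_idle_timer` look like Unix timestamps in seconds.microseconds; subtracting them gives the idle interval that made server1 drop the link. A minimal Python sketch, assuming that interpretation of the fields (`idle_interval` and `to_utc` are hypothetical helpers, not part of OCFS2):

```python
import re
from datetime import datetime, timezone
from decimal import Decimal

# The o2net_idle_timer line from server1's log, copied from above.
LINE = ("(swapper,0,2):o2net_idle_timer:1503 here are some times that might "
        "help debug the situation: (tmr 1303205226.698111 now 1303205286.697866 "
        "dr 1303205226.698364 adv 1303205226.698371:1303205226.698372 "
        "func (a53de746:506) 1303205226.698112:1303205226.698117)")


def idle_interval(line: str) -> Decimal:
    """Return now - tmr in seconds, assuming both are seconds.microseconds."""
    tmr = Decimal(re.search(r"tmr (\d+\.\d+)", line).group(1))
    now = Decimal(re.search(r"now (\d+\.\d+)", line).group(1))
    # Decimal avoids float rounding on the microsecond part.
    return now - tmr


def to_utc(stamp: str) -> datetime:
    """Interpret the whole-second part of a timestamp as Unix epoch time (UTC)."""
    return datetime.fromtimestamp(int(Decimal(stamp)), tz=timezone.utc)


if __name__ == "__main__":
    print(idle_interval(LINE))          # 59.999755
    print(to_utc("1303205226.698111"))  # last recorded activity on the link
```

With these values the interval is just under 60 seconds, matching the "idle for 60.0 seconds" message, and the `tmr` stamp converts to 09:27:06 UTC, which is 17:27:06 if the syslog times are UTC+8, as they appear to be.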



server2 log:
Apr 19 17:28:05 server2 kernel: o2net: no longer connected to node server1 (num 7) at 10.10.10.10:7777
Apr 19 17:28:05 server2 kernel: (dlm_thread,13293,3):dlm_drop_lockres_ref:2211 ERROR: status = -112
Apr 19 17:28:05 server2 kernel: (dlm_thread,13293,3):dlm_purge_lockres:206 ERROR: status = -112
Apr 19 17:28:05 server2 kernel: (httpd,11084,0):dlm_do_master_request:1334 ERROR: link to 7 went down!
Apr 19 17:28:05 server2 kernel: (httpd,11084,0):dlm_get_lock_resource:917 ERROR: status = -112
Apr 19 17:28:05 server2 kernel: (httpd,8376,2):dlm_do_master_request:1334 ERROR: link to 7 went down!
Apr 19 17:28:05 server2 kernel: (httpd,8376,2):dlm_get_lock_resource:917 ERROR: status = -112
Apr 19 17:28:05 server2 kernel: (crond,7757,2):dlm_send_remote_unlock_request:359 ERROR: status = -112
Apr 19 17:28:05 server2 kernel: (dlm_thread,13293,3):dlm_drop_lockres_ref:2211 ERROR: status = -107
Apr 19 17:28:05 server2 kernel: (dlm_thread,13293,3):dlm_purge_lockres:206 ERROR: status = -107
Apr 19 17:28:05 server2 kernel: (dlm_thread,13293,3):dlm_drop_lockres_ref:2211 ERROR: status = -107
Apr 19 17:28:05 server2 kernel: (dlm_thread,13293,3):dlm_purge_lockres:206 ERROR: status = -107
Apr 19 17:28:05 server2 kernel: (crond,7757,2):dlm_send_remote_unlock_request:359 ERROR: status = -107
Apr 19 17:28:05 server2 kernel: (dlm_thread,13293,3):dlm_drop_lockres_ref:2211 ERROR: status = -107
Apr 19 17:28:06 server2 kernel: m_send_remolm_send_remote_unlock_request:359 ERROR: status = -107
Apr 19 17:28:06 server2 kernel: (httpd,11282,0):dlm_send_remote_unlock_request:359 ERROR: status = -107
Apr 19 17:28:18 server2 last message repeated 7 times
Apr 19 17:28:18 server2 kernel: (sshd,7691,0):dlm_send_remote_unlock_request:359 ERROR: status = -107
Apr 19 17:28:19 server2 last message repeated 45 times
Apr 19 17:28:19 server2 kernel: (sshd,7691,0):dlm_send_remote_unlock_request:359 ERROR: status = -10OR: status = -107
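The negative `status` values in both logs look like negated Linux errno codes. A small sketch, assuming the standard Linux (asm-generic) errno numbering, that tallies the status values in a set of log lines and names them (`tally_status` and `describe` are hypothetical helpers):

```python
import re
from collections import Counter

# Linux errno values (asm-generic/errno.h). The numbers are Linux-specific,
# so they are hard-coded rather than read from the host's errno module.
LINUX_ERRNO = {
    107: ("ENOTCONN", "Transport endpoint is not connected"),
    112: ("EHOSTDOWN", "Host is down"),
}

# \b rejects interleaved fragments such as "-10OR" in the last server2 line.
STATUS_RE = re.compile(r"status = (-\d+)\b")


def tally_status(lines):
    """Count each negative status value that appears in the given log lines."""
    counts = Counter()
    for line in lines:
        for match in STATUS_RE.finditer(line):
            counts[int(match.group(1))] += 1
    return counts


def describe(status: int) -> str:
    """Render a negative status as its Linux errno name and message."""
    name, text = LINUX_ERRNO.get(-status, ("?", "unknown errno"))
    return f"{status} = -{name} ({text})"


if __name__ == "__main__":
    sample = [
        "(dlm_thread,13293,3):dlm_drop_lockres_ref:2211 ERROR: status = -112",
        "(crond,7757,2):dlm_send_remote_unlock_request:359 ERROR: status = -107",
        "(httpd,11282,0):dlm_send_remote_unlock_request:359 ERROR: status = -107",
    ]
    for status, n in sorted(tally_status(sample).items()):
        print(describe(status), "x", n)
```

On both nodes the first failures report -112 (EHOSTDOWN) and the later ones -107 (ENOTCONN), which is consistent with the o2net connection between node 7 and node 8 having already been torn down.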



Does anyone have an idea why?

-- 
Thomas Lau
Infrastructure Delivery Manager
eSun Holdings Limited
Mobile: 852-93239670
Office phone: 29058104


"always I strive to push the boundaries of what we know, and what seems possible to us at this moment in time. The walls between art and engineering exist only in our minds, and few have the imagination to see beyond them."

– Theo Jansen



