[Ocfs2-users] dlm timeouts and following errors -112

Sebastian Reitenbach reitenbach_pub at rapideye.de
Mon Feb 26 12:18:25 PST 2007


Hi list,

I am experimenting with ocfs2 (rpm package: 1.2.2-0.2), using linux-ha 2.0.8
(all running on SLES 10 x86-64, rpm packages from linux-ha.org) for the
heartbeat. The three nodes are connected to a gigabit switch. From time to
time I have problems unmounting a drive, and I have to reboot the whole
system to fix the problem. When these lockups occur, I see these messages
in /var/log/messages:


Feb 26 21:03:47 ppsbackup101 heartbeat: [5394]: ERROR: Irretrievably lost packet: node ppsdb102 seq 6
Feb 26 21:03:47 ppsbackup101 heartbeat: [5394]: ERROR: Irretrievably lost packet: node ppsdb102 seq 6
Feb 26 21:04:32 ppsbackup101 kernel: o2net: connection to node ppsnfs102 (num 3) at 192.168.102.32:7777 has been idle for 300.0 seconds, shutting it down.
Feb 26 21:04:32 ppsbackup101 kernel: (5394,1):o2net_idle_timer:1426 here are some times that might help debug the situation: (tmr 1172519972.626184 now 1172520272.653263 dr 1172519972.626167 adv 1172519972.626208:1172519972.626210 func (666c6172:510) 1172519972.626186:1172519972.626195)
Feb 26 21:04:32 ppsbackup101 kernel: o2net: no longer connected to node ppsnfs102 (num 3) at 192.168.102.32:7777
Feb 26 21:04:32 ppsbackup101 kernel: (8915,0):dlm_drop_lockres_ref:2283 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):dlm_request_join:899 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):dlm_try_to_join_domain:1048 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (8915,0):dlm_purge_lockres:189 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):dlm_join_domain:1321 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):dlm_register_domain:1514 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):ocfs2_dlm_init:2007 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11375,0):dlm_leave_domain:565 Error -112 sending domain exit message to node 3
Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):ocfs2_mount_volume:1093 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: ocfs2: Unmounting device (8,145) on (node 4)
Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):dlm_request_join:899 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):dlm_try_to_join_domain:1048 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):dlm_join_domain:1321 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):dlm_register_domain:1514 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):ocfs2_dlm_init:2007 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):ocfs2_mount_volume:1093 ERROR: status = -112
Feb 26 21:04:32 ppsbackup101 kernel: ocfs2: Unmounting device (8,97) on (node 4)
Feb 26 21:04:32 ppsbackup101 kernel: ocfs2: Unmounting device (8,129) on (node 4)
Feb 26 21:04:33 ppsbackup101 kernel: ocfs2: Unmounting device (8,113) on (node 4)


I think it is caused by the idle timeout at the beginning of the log, but I
don't know whether I am right or what I can do to prevent it from happening
again. Is there anything I can do to overcome these problems?
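To double-check the timeout theory, the two timestamps in the o2net_idle_timer line can be compared and the -112 status decoded. This is only a minimal Python sketch under my own assumptions: that `tmr` is when the idle timer was last armed and `now` is when it fired, and that -112 is a negated Linux errno value. The helper names are hypothetical.

```python
import errno
import os


def idle_seconds(tmr: float, now: float) -> float:
    """Seconds between the 'tmr' and 'now' stamps of an o2net_idle_timer line.

    Assumption: 'tmr' is when the idle timer was last armed and 'now' is
    when it fired.
    """
    return now - tmr


def decode_status(status: int) -> str:
    """Turn a negative kernel status such as -112 into an errno name and message.

    Assumption: the value is a negated errno. Names and messages come from
    the platform running this script, so run it on Linux to match the kernel.
    """
    code = -status
    name = errno.errorcode.get(code, "unknown")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # Values copied from the o2net_idle_timer line in the log above.
    print(f"idle for {idle_seconds(1172519972.626184, 1172520272.653263):.3f} s")
    print(decode_status(-112))
```

On Linux this reports an idle time of about 300.027 seconds, which matches the 300.0 seconds in the o2net message, and decodes -112 as EHOSTDOWN ("Host is down"). That would suggest the DLM errors are a consequence of node 3 already being considered unreachable, not a separate problem.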

kind regards
Sebastian



