[Ocfs2-users] dlm timeouts and following errors -112

Sunil Mushran Sunil.Mushran at oracle.com
Mon Feb 26 12:27:26 PST 2007


Yes, the messages are related. -112 is EHOSTDOWN.
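
The kernel reports failures as negated errno values, so "status = -112" is errno 112. As a quick check, here is a minimal Python sketch that looks the number up in the local errno table. The helper name decode_kernel_status is made up for this example, and errno numbering is platform specific: 112 is EHOSTDOWN on Linux, but other systems may number it differently.

```python
import errno
import os


def decode_kernel_status(status):
    """Map a negative kernel status such as -112 to (errno name, message).

    Hypothetical helper for illustration only. The lookup uses the errno
    table of the machine running the script, so run it on a Linux box to
    match the numbering used in the kernel log.
    """
    code = abs(status)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    name, message = decode_kernel_status(-112)
    # On Linux this should report EHOSTDOWN.
    print(name, "-", message)
```

The same lookup works for any other "status = -N" line in the log.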

Sebastian Reitenbach wrote:
> Hi list,
>
> I am experimenting with ocfs2 (rpm package: 1.2.2-0.2), using linux-ha 2.0.8 
> for the heartbeat (all running on SLES 10 x86-64, rpm packages from 
> linux-ha.org). The three nodes are connected to a gigabit switch. From time 
> to time I have problems unmounting a drive, and I have to reboot the whole 
> system to fix it. When these lockups occur, I see these messages 
> in /var/log/messages:
>
>
> Feb 26 21:03:47 ppsbackup101 heartbeat: [5394]: ERROR: Irretrievably lost packet: node ppsdb102 seq 6
> Feb 26 21:03:47 ppsbackup101 heartbeat: [5394]: ERROR: Irretrievably lost packet: node ppsdb102 seq 6
> Feb 26 21:04:32 ppsbackup101 kernel: o2net: connection to node ppsnfs102 (num 3) at 192.168.102.32:7777 has been idle for 300.0 seconds, shutting it down.
> Feb 26 21:04:32 ppsbackup101 kernel: (5394,1):o2net_idle_timer:1426 here are some times that might help debug the situation: (tmr 1172519972.626184 now 1172520272.653263 dr 1172519972.626167 adv 1172519972.626208:1172519972.626210 func (666c6172:510) 1172519972.626186:1172519972.626195)
> Feb 26 21:04:32 ppsbackup101 kernel: o2net: no longer connected to node ppsnfs102 (num 3) at 192.168.102.32:7777
> Feb 26 21:04:32 ppsbackup101 kernel: (8915,0):dlm_drop_lockres_ref:2283 ERROR: status = -112
> Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):dlm_request_join:899 ERROR: status = -112
> Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):dlm_try_to_join_domain:1048 ERROR: status = -112
> Feb 26 21:04:32 ppsbackup101 kernel: (8915,0):dlm_purge_lockres:189 ERROR: status = -112
> Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):dlm_join_domain:1321 ERROR: status = -112
> Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):dlm_register_domain:1514 ERROR: status = -112
> Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):ocfs2_dlm_init:2007 ERROR: status = -112
> Feb 26 21:04:32 ppsbackup101 kernel: (11375,0):dlm_leave_domain:565 Error -112 sending domain exit message to node 3
> Feb 26 21:04:32 ppsbackup101 kernel: (11534,2):ocfs2_mount_volume:1093 ERROR: status = -112
> Feb 26 21:04:32 ppsbackup101 kernel: ocfs2: Unmounting device (8,145) on (node 4)
> Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):dlm_request_join:899 ERROR: status = -112
> Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):dlm_try_to_join_domain:1048 ERROR: status = -112
> Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):dlm_join_domain:1321 ERROR: status = -112
> Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):dlm_register_domain:1514 ERROR: status = -112
> Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):ocfs2_dlm_init:2007 ERROR: status = -112
> Feb 26 21:04:32 ppsbackup101 kernel: (11449,3):ocfs2_mount_volume:1093 ERROR: status = -112
> Feb 26 21:04:32 ppsbackup101 kernel: ocfs2: Unmounting device (8,97) on (node 4)
> Feb 26 21:04:32 ppsbackup101 kernel: ocfs2: Unmounting device (8,129) on (node 4)
> Feb 26 21:04:33 ppsbackup101 kernel: ocfs2: Unmounting device (8,113) on (node 4)
>
>
> I think the -112 errors are caused by the o2net idle timeout near the 
> beginning of the logs, but I don't know whether I am right. Is there 
> anything I can do to keep this from happening?
>
> kind regards
> Sebastian
>
>


