[Ocfs2-users] Network 10 sec timeout setting?

Alexei_Roudnev Alexei_Roudnev at exigengroup.com
Wed Feb 7 10:14:57 PST 2007


This timeout is exactly the reason why I can't use OCFSv2 for anything
serious except lightly used file systems.
We had 2 such reboots during the last week (on a heavily loaded cluster which
we used for stress tests). With an Ethernet convergence time of 40 seconds,
no way to guarantee a 10-second timeout on a real network, no way to prevent
this reboot even when the file system is doing nothing, and no support for
multi-interface hosts in OCFSv2, I cannot use it for anything except games,
experiments and/or lightly loaded servers. (I set it up instead of NetApp
NFS for document storage on the test system; it survived only about a
month.)

(It all works for a week, even for months, until the server is really busy.
Then it panics. If a switch restarts, it panics. If someone reconnects a
cable, it panics. If the disk system restarts, it panics. No other cluster
panics like this; only OCFSv2 does!)


----- Original Message ----- 
From: "Randy Ramsdell" <rramsdell at livedatagroup.com>
To: "OCFS2 Users List" <ocfs2-users at oss.oracle.com>
Sent: Wednesday, February 07, 2007 9:12 AM
Subject: Re: [Ocfs2-users] Network 10 sec timeout setting?


> Hi,
>
> OK, I'll try this again, since there seem to be more people reading this
> list now.
>
> I don't quite understand the log messages regarding fencing. Should the
> other nodes in the cluster that lost network connectivity log
> something about quorum/fencing etc.?
> Is it true that the network timeout parameter can be set in 1.2.4? If
> not, can I change the setting myself before compiling?
> What will we see in the logs if a node cannot write to the cluster file
> system but the heartbeat still works?
>
>
> This node panicked last night, and this was the only log output.
>
> "Node 1"
>
> Feb  6 20:52:51 atl02010304 kernel: o2net: connection to node
> atl02010305 (num 1) at 192.168.3.105:7777 has been idle for 10 seconds,
> shutting it down.
> Feb  6 20:52:51 atl02010304 kernel: (15822,0):o2net_idle_timer:1309 here
> are some times that might help debug the situation: (tmr
> 1170813158.337779 now 1170813168.338726 dr 1170813163.339064 adv
> 1170813158.337780:1170813158.337780 func (ca3835ec:505)
> 1170813013.339584:1170813013.339601)
> Feb  6 20:52:51 atl02010304 kernel: o2net: connection to node
> atl02010310 (num 0) at 192.168.3.110:7777 has been idle for 10 seconds,
> shutting it down.
> Feb  6 20:52:51 atl02010304 kernel: (15486,0):o2net_idle_timer:1309 here
> are some times that might help debug the situation: (tmr
> 1170813161.826171 now 1170813171.827091 dr 1170813171.826723 adv
> 1170813161.826171:1170813161.826172 func (ca3835ec:506)
> 1170812821.832120:1170812821.832128)
> Feb  6 20:52:51 atl02010304 kernel: o2net: no longer connected to node
> atl02010305 (num 1) at 192.168.3.105:7777
> Feb  6 20:52:51 atl02010304 kernel: o2net: no longer connected to node
> atl02010310 (num 0) at 192.168.3.110:7777
>
> "Node 2"
>
>
> Jan 21 05:25:19 atl02010310 kernel: o2net: no longer connected to node
> atl02010304 (num 2) at 192.168.3.104:7777
> Jan 21 05:25:19 atl02010310 kernel: klogd 1.4.1, ---------- state change
> ----------
> Jan 21 05:25:21 atl02010310 kernel: (3716,1):dlm_get_lock_resource:847
> 32E007178FA24E87B45ECDDE6F7D5D52:$RECOVERY: at least one node (2)
> to recover before lock mastery can begin
> Jan 21 05:25:21 atl02010310 kernel: (3716,1):dlm_get_lock_resource:874
> 32E007178FA24E87B45ECDDE6F7D5D52: recovery map is not empty, but must
> master $RECOVERY lock now
>
> <snip>
>
> Jan 21 05:28:43 atl02010310 kernel: o2net: accepted connection from node
> atl02010304 (num 2) at 192.168.3.104:7777
> Jan 21 05:28:47 atl02010310 kernel: ocfs2_dlm: Node 2 joins domain
> 32E007178FA24E87B45ECDDE6F7D5D52
> Jan 21 05:28:47 atl02010310 kernel: ocfs2_dlm: Nodes in domain
> ("32E007178FA24E87B45ECDDE6F7D5D52"): 0 2
>
> "Node 3"
>
> Same as above
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>
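The timing block that o2net_idle_timer prints in the Node 1 log can be checked by hand. The short Python sketch below extracts the `tmr`, `now` and `dr` values from one such message and computes the elapsed seconds between them. Reading `tmr` as the moment the idle timer was armed, `now` as the moment it fired, and `dr` as the last data-ready event is an assumption about the field names, not something stated in this thread; the helper name `idle_seconds` is hypothetical.

```python
import re

# Matches the start of the timing block printed by o2net_idle_timer, e.g.
# "(tmr 1170813158.337779 now 1170813168.338726 dr 1170813163.339064 ...".
# Reading tmr/now/dr as "timer armed", "timer fired" and "last data ready"
# is an assumption about the field names, not documented OCFS2 behaviour.
TIMES_RE = re.compile(
    r"\(tmr (?P<tmr>\d+\.\d+) now (?P<now>\d+\.\d+) dr (?P<dr>\d+\.\d+)"
)


def idle_seconds(message: str) -> dict[str, float]:
    """Hypothetical helper: elapsed seconds between the logged timestamps."""
    match = TIMES_RE.search(message)
    if match is None:
        raise ValueError("no o2net timing block found in message")
    tmr, now, dr = (float(match.group(k)) for k in ("tmr", "now", "dr"))
    return {
        # Seconds between the (assumed) arming of the idle timer and its expiry.
        "now_minus_tmr": round(now - tmr, 3),
        # Seconds between the (assumed) last data-ready event and expiry.
        "now_minus_dr": round(now - dr, 3),
    }


if __name__ == "__main__":
    # First o2net_idle_timer message from the Node 1 log, unwrapped.
    line = (
        "(15822,0):o2net_idle_timer:1309 here are some times that might help "
        "debug the situation: (tmr 1170813158.337779 now 1170813168.338726 "
        "dr 1170813163.339064 adv 1170813158.337780:1170813158.337780 "
        "func (ca3835ec:505) 1170813013.339584:1170813013.339601)"
    )
    print(idle_seconds(line))  # prints {'now_minus_tmr': 10.001, 'now_minus_dr': 5.0}
```

For both o2net_idle_timer messages in the Node 1 log, `now_minus_tmr` comes out at about 10.001 s, which lines up with the "has been idle for 10 seconds" text printed just before each of them.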



