[Ocfs2-users] Network 10 sec timeout setting?

Sunil Mushran Sunil.Mushran at oracle.com
Wed Feb 7 10:13:18 PST 2007


This means there was a network hiccup that caused Node 1 to fence itself.
The problem is that our default network idle timeout (10 seconds) is too
low. We have already addressed this in mainline and are looking to add
that patch to 1.2.5.
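
As a rough way to check the idle period from the o2net_idle_timer line in
the quoted log, the sketch below subtracts the "tmr" value from the "now"
value. It assumes "tmr" is the time the idle timer was last armed and "now"
is the time it fired; that reading of the fields is an assumption, and the
helper name idle_seconds is made up for illustration.

```python
import re
from decimal import Decimal

# One o2net_idle_timer message from the quoted log, rejoined onto one line.
LOG_LINE = (
    "Feb  6 20:52:51 atl02010304 kernel: (15822,0):o2net_idle_timer:1309 here "
    "are some times that might help debug the situation: (tmr "
    "1170813158.337779 now 1170813168.338726 dr 1170813163.339064 adv "
    "1170813158.337780:1170813158.337780 func (ca3835ec:505) "
    "1170813013.339584:1170813013.339601)"
)

# Assumption: "tmr" = timer last armed, "now" = timer fired (epoch seconds).
TIMES_RE = re.compile(r"\(tmr (\d+\.\d+) now (\d+\.\d+)")


def idle_seconds(line):
    """Return now - tmr as a Decimal, or None if the fields are not found."""
    match = TIMES_RE.search(line)
    if match is None:
        return None
    tmr, now = (Decimal(value) for value in match.groups())
    return now - tmr


if __name__ == "__main__":
    # Decimal avoids float rounding on these large epoch values.
    print("idle for", idle_seconds(LOG_LINE), "seconds")
```

For the line above the difference comes out just over 10 seconds, which
matches the "has been idle for 10 seconds" message.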

I am unclear as to your last question.

Randy Ramsdell wrote:
> Hi,
>
> OK, I'll try this again since there seem to be more people reading this
> list.
>
> I don't quite understand the log messages regarding fencing. Should the
> other nodes in the cluster that lost network connectivity log
> something about quorum, fencing, etc.?
> Is it true that the network timeout parameter can be set in 1.2.4? If
> not, can I change the setting myself before compiling?
> What will we see in the logs if a node cannot write to the clusterfs but
> heartbeat still works?
>
>
> This node panicked last night with this as the only log.
>
> "Node 1"
>  
> Feb  6 20:52:51 atl02010304 kernel: o2net: connection to node
> atl02010305 (num 1) at 192.168.3.105:7777 has been idle for 10 seconds,
> shutting it down.
> Feb  6 20:52:51 atl02010304 kernel: (15822,0):o2net_idle_timer:1309 here
> are some times that might help debug the situation: (tmr
> 1170813158.337779 now 1170813168.338726 dr 1170813163.339064 adv
> 1170813158.337780:1170813158.337780 func (ca3835ec:505)
> 1170813013.339584:1170813013.339601)
> Feb  6 20:52:51 atl02010304 kernel: o2net: connection to node
> atl02010310 (num 0) at 192.168.3.110:7777 has been idle for 10 seconds,
> shutting it down.
> Feb  6 20:52:51 atl02010304 kernel: (15486,0):o2net_idle_timer:1309 here
> are some times that might help debug the situation: (tmr
> 1170813161.826171 now 1170813171.827091 dr 1170813171.826723 adv
> 1170813161.826171:1170813161.826172 func (ca3835ec:506)
> 1170812821.832120:1170812821.832128)
> Feb  6 20:52:51 atl02010304 kernel: o2net: no longer connected to node
> atl02010305 (num 1) at 192.168.3.105:7777
> Feb  6 20:52:51 atl02010304 kernel: o2net: no longer connected to node
> atl02010310 (num 0) at 192.168.3.110:7777
>
> "Node 2"
>
>
> Jan 21 05:25:19 atl02010310 kernel: o2net: no longer connected to node
> atl02010304 (num 2) at 192.168.3.104:7777
> Jan 21 05:25:19 atl02010310 kernel: klogd 1.4.1, ---------- state change
> ----------
> Jan 21 05:25:21 atl02010310 kernel: (3716,1):dlm_get_lock_resource:847
> 32E007178FA24E87B45ECDDE6F7D5D52:$RECOVERY: at least one node (2)
> to recover before lock mastery can begin
> Jan 21 05:25:21 atl02010310 kernel: (3716,1):dlm_get_lock_resource:874
> 32E007178FA24E87B45ECDDE6F7D5D52: recovery map is not empty, but must
> master $RECOVERY lock now
>
> <snip>
>
> Jan 21 05:28:43 atl02010310 kernel: o2net: accepted connection from node
> atl02010304 (num 2) at 192.168.3.104:7777
> Jan 21 05:28:47 atl02010310 kernel: ocfs2_dlm: Node 2 joins domain
> 32E007178FA24E87B45ECDDE6F7D5D52
> Jan 21 05:28:47 atl02010310 kernel: ocfs2_dlm: Nodes in domain
> ("32E007178FA24E87B45ECDDE6F7D5D52"): 0 2
>
> "Node 3"
>
> Same as above
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   


