[Ocfs2-users] Network 10 sec timeout setting?

Randy Ramsdell rramsdell at livedatagroup.com
Wed Feb 7 09:12:23 PST 2007


Hi,

OK, I'll try this again since there seem to be more people reading this
list.

I don't quite understand the log messages regarding fencing. Should the
other nodes in the cluster that lost network connectivity log
something about quorum, fencing, etc.?
Is it true that the network timeout parameter can be set in 1.2.4? If
not, can I change the setting myself before compiling?
What will we see in the logs if a node cannot write to the cluster
filesystem but the heartbeat still works?


This node panicked last night, with this as the only log output.

"Node 1"
 
Feb  6 20:52:51 atl02010304 kernel: o2net: connection to node
atl02010305 (num 1) at 192.168.3.105:7777 has been idle for 10 seconds,
shutting it down.
Feb  6 20:52:51 atl02010304 kernel: (15822,0):o2net_idle_timer:1309 here
are some times that might help debug the situation: (tmr
1170813158.337779 now 1170813168.338726 dr 1170813163.339064 adv
1170813158.337780:1170813158.337780 func (ca3835ec:505)
1170813013.339584:1170813013.339601)
Feb  6 20:52:51 atl02010304 kernel: o2net: connection to node
atl02010310 (num 0) at 192.168.3.110:7777 has been idle for 10 seconds,
shutting it down.
Feb  6 20:52:51 atl02010304 kernel: (15486,0):o2net_idle_timer:1309 here
are some times that might help debug the situation: (tmr
1170813161.826171 now 1170813171.827091 dr 1170813171.826723 adv
1170813161.826171:1170813161.826172 func (ca3835ec:506)
1170812821.832120:1170812821.832128)
Feb  6 20:52:51 atl02010304 kernel: o2net: no longer connected to node
atl02010305 (num 1) at 192.168.3.105:7777
Feb  6 20:52:51 atl02010304 kernel: o2net: no longer connected to node
atl02010310 (num 0) at 192.168.3.110:7777
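
As a cross-check on the "idle for 10 seconds" message, the timing fields
in the first o2net_idle_timer line can be subtracted directly. The sketch
below is only an illustration: the helper names are made up here, and the
reading of the fields (tmr as the time the idle timer was armed, dr as the
last time data was ready) is an assumption that the log itself does not
confirm.

```python
import re

# Timing fields copied from the first o2net_idle_timer message above,
# rejoined onto one line (the mail client wrapped the original).
LINE = ("(tmr 1170813158.337779 now 1170813168.338726 dr 1170813163.339064 "
        "adv 1170813158.337780:1170813158.337780 func (ca3835ec:505) "
        "1170813013.339584:1170813013.339601)")


def parse_times(line):
    """Return the tmr, now and dr timestamps as integer microseconds.

    Integers are used so the subtraction is exact; the field meanings
    are assumed, not documented in the log.
    """
    times = {}
    for key in ("tmr", "now", "dr"):
        match = re.search(r"\b%s (\d+)\.(\d{6})\b" % key, line)
        if match is None:
            raise ValueError("field %r not found" % key)
        times[key] = int(match.group(1)) * 1_000_000 + int(match.group(2))
    return times


def idle_seconds(line):
    """Seconds between the tmr and now fields of one timing line."""
    t = parse_times(line)
    return (t["now"] - t["tmr"]) / 1_000_000


if __name__ == "__main__":
    t = parse_times(LINE)
    print("now - tmr = %.6f s" % idle_seconds(LINE))
    print("now - dr  = %.6f s" % ((t["now"] - t["dr"]) / 1_000_000))
```

For the line above, now - tmr comes out at 10.000947 s, which matches the
10 second idle limit reported by o2net, and now - dr at 4.999662 s.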

"Node 2"


Jan 21 05:25:19 atl02010310 kernel: o2net: no longer connected to node
atl02010304 (num 2) at 192.168.3.104:7777
Jan 21 05:25:19 atl02010310 kernel: klogd 1.4.1, ---------- state change
----------
Jan 21 05:25:21 atl02010310 kernel: (3716,1):dlm_get_lock_resource:847
32E007178FA24E87B45ECDDE6F7D5D52:$RECOVERY: at least one node (2)
torecover before lock mastery can begin
Jan 21 05:25:21 atl02010310 kernel: (3716,1):dlm_get_lock_resource:874
32E007178FA24E87B45ECDDE6F7D5D52: recovery map is not empty, but must
master $RECOVERY lock now

<snip>

Jan 21 05:28:43 atl02010310 kernel: o2net: accepted connection from node
atl02010304 (num 2) at 192.168.3.104:7777
Jan 21 05:28:47 atl02010310 kernel: ocfs2_dlm: Node 2 joins domain
32E007178FA24E87B45ECDDE6F7D5D52
Jan 21 05:28:47 atl02010310 kernel: ocfs2_dlm: Nodes in domain
("32E007178FA24E87B45ECDDE6F7D5D52"): 0 2

"Node 3"

Same as above



