[Ocfs2-users] ocfs2 fencing and half quorum issue

Joseph Blackburn jblackbu01 at gmail.com
Tue Nov 7 12:41:10 PST 2006


One of the nodes in a two-node system keeps crashing every couple of
weeks. Each time, I get the following error messages in /var/log/messages.

>Nov  7 14:21:40 cib-sim-wec-04 kernel: (0,0):o2net_idle_timer:1306 connection to node cib-sim-wec-03 (num 0) at 162.111.10.230:7777 has been idle for 10 seconds, shutting it down.
>Nov  7 14:21:40 cib-sim-wec-04 kernel: (0,0):o2net_idle_timer:1317 here are some times that might help debug the situation: (tmr 1162927290.922909 now 1162927300.921182 dr 1162927290.922894 adv 1162927290.922916:1162927290.922917 func (06c6e508:504) 1162914724.648412:1162914724.648415)
>Nov  7 14:21:40 cib-sim-wec-04 kernel: (9397,0):o2net_set_nn_state:407 no longer connected to node cib-sim-wec-03 (num 0) at 162.111.10.230:7777
>Nov  7 14:23:16 cib-sim-wec-04 kernel: (6,0):o2quo_make_decision:144 ERROR: fencing this node because it is connected to a half-quorum of 1 out of 2 nodes which doesn't include the lowest active node 0
>Nov  7 14:23:16 cib-sim-wec-04 kernel: (6,0):o2hb_stop_all_regions:1728 ERROR: stopping heartbeat on all active regions.
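
For reference, the decision reported by the o2quo_make_decision line can be sketched roughly as below. This is a simplified illustration based on my reading of the quorum rule, not the kernel code; the function name and parameters are hypothetical, and both counts are assumed to include the local node.

```python
def should_fence(heartbeating: int, connected: int, lowest_reachable: bool) -> bool:
    """Return True if the local node should fence itself.

    heartbeating     -- nodes currently seen heartbeating (including this one)
    connected        -- nodes reachable over the network (including this one)
    lowest_reachable -- True if the lowest-numbered heartbeating node is
                        this node or is still reachable over the network
    """
    if heartbeating % 2 == 1:
        # Odd-sized cluster: survive only when connected to a strict majority.
        quorum = (heartbeating + 1) // 2
        return connected < quorum

    # Even-sized cluster: the two halves may each still talk among themselves.
    quorum = heartbeating // 2
    if connected < quorum:
        return True
    # Exactly half: only the half holding the lowest-numbered node survives.
    return connected == quorum and not lowest_reachable


# The situation in the log: 2 nodes heartbeating, this node connected only
# to itself, and node 0 (the lowest active node) not reachable.
print(should_fence(heartbeating=2, connected=1, lowest_reachable=False))  # True
# The same split as seen from node 0, which is the lowest node itself.
print(should_fence(heartbeating=2, connected=1, lowest_reachable=True))   # False
```

Under this reading, whenever the network link between the two nodes drops while both are still heartbeating, the node that is not node 0 is the one that fences.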

I've already changed my threshold settings to the following:

>cib-sim-wec-04:/proc/fs/ocfs2_nodemanager # more hb_dead_threshold
>46
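
As a rough check of what that value means, here is a small sketch of the conversion from threshold to seconds. It assumes the relationship given in the OCFS2 FAQ (timeout = (threshold - 1) * 2 seconds, with a 2-second heartbeat interval); the function name is hypothetical.

```python
def heartbeat_timeout_seconds(dead_threshold: int, interval_s: int = 2) -> int:
    """Seconds of missed disk heartbeats before a node is declared dead.

    Assumes timeout = (threshold - 1) * interval, per the OCFS2 FAQ.
    """
    return (dead_threshold - 1) * interval_s


print(heartbeat_timeout_seconds(46))  # 90
```

Note that this threshold governs the disk heartbeat. The 10-second idle timeout reported by o2net_idle_timer in the log above appears to be a separate network setting, so raising hb_dead_threshold may not affect it.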

Here's a little background: this is a two-node setup on DL580s, with
crossover cables for the heartbeat, running SLES9-SP3 and OCFS2
version 1.1.7.

Any help would be greatly appreciated.
Thanks


