[Ocfs2-users] 2-node configuration ?

Sunil Mushran Sunil.Mushran at oracle.com
Fri Feb 29 11:43:34 PST 2008


Laurent Neiger wrote:
> We could check at regular intervals (<10s of ocfs2 timeout, let's say every 5
> seconds for example) if the network comm between the 2 nodes is up. If not, on
> maq2, if network comm is still OK (checking ifconfig status, or pinging a third
> party such as a router), then maq2 is OK, and comm is lost between the 2 nodes
> because of maq1.
> So on maq2, stop the ocfs2 heartbeat for avoiding self-fence, by using
> ocfs2_hb_ctl -K -d /dev/drbd0 (please tell me if I misunderstood this command)
> and remote fence maq1 (if not a power supply failure, but a network card one
> for example, we power off the bad node).
>
> So our cluster will still continue to work in degraded mode, until we repair
> and power up maq1, and restart o2cb and ocfs2 on both nodes.
>
> So do you think doing that could be efficient for having a strong cluster or do
> you have a better idea ?
>
Each of those pings will require a timeout, and a short one. So short that you
may not even be able to distinguish between errors and an overloaded run-queue,
transmit queue, router, etc. You will need external hardware probes to
distinguish between slowdowns and errors.
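
To make the timeout problem concrete, here is a minimal sketch of such a check
in Python. It is only an illustration, not something ocfs2 or o2cb provides:
the function name probe, the use of a TCP connect instead of an ICMP ping, and
the 0.5 second default are all assumptions.

```python
import socket
import time


def probe(host, port, timeout_s=0.5):
    """Attempt one TCP connect and return (outcome, elapsed_seconds).

    Hypothetical helper for illustration only.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            outcome = "ok"
    except socket.timeout:
        # No answer within timeout_s. The peer may be dead, or the
        # run-queue, transmit queue, or router may just be overloaded.
        outcome = "timeout"
    except OSError:
        # Definite failure, e.g. connection refused or no route to host.
        outcome = "error"
    return outcome, time.monotonic() - start
```

The "timeout" outcome is the troublesome one: from software alone it says only
that no reply arrived in time, not why.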

The easy solution for your problem is to use net-bonding.

But then I guess you can rephrase the issue with some other precise hardware
error that allows the node to run as a single node but not in a cluster. And
what if that node is the lower-numbered one?

In the end, you have to have shutdown windows: windows in which you can recycle
the cluster. There is a reason people talk about 99.999% uptime and not 100%. ;)
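
For a sense of scale, the downtime budget behind an uptime figure is simple
arithmetic. A small sketch, assuming a 365-day year (the function name is made
up for the example):

```python
def downtime_minutes_per_year(uptime_percent, days_per_year=365):
    """Minutes per year a service may be down at the given uptime level."""
    minutes_per_year = days_per_year * 24 * 60
    return minutes_per_year * (1 - uptime_percent / 100)


# Five nines leaves roughly five minutes a year for everything,
# shutdown windows included.
print(round(downtime_minutes_per_year(99.999), 2))
```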




