[Ocfs2-users] 2-node configuration ?

Luis Freitas lfreitas34 at yahoo.com
Wed Mar 5 10:17:52 PST 2008


Laurent,

  I am not a developer and not very familiar with the inner workings of OCFS2, so some of the points below are assumptions based on generic cluster design.

  There are two heartbeats, the network heartbeat and the disk heartbeat.

   If I am not mistaken, the disk heartbeat is done on the block device that is mounted as an OCFS2 filesystem. So you can decide which node gets fenced by cutting its access to the disk device.
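
   As an illustration (assuming the standard OCFS2 tools are installed; the device name is only an example), the heartbeat region on the device can be inspected with something like:

       # List the OCFS2 system files on the device; the "heartbeat" entry
       # is the file the disk heartbeat writes to.
       debugfs.ocfs2 -R "ls -l //" /dev/drbd0

       # Show which nodes are currently heartbeating on the device.
       mounted.ocfs2 -f /dev/drbd0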

   When using a SAN this is kind of simple, since there is an external disk device: one node eventually locks the device and forces the other node to be evicted.

    Since you are using DRBD, you need to make sure that the node your cluster manager evicts can no longer access the DRBD device. As there are two paths to the DRBD data on each node (one local device and one remote device), I am not exactly sure how you will accomplish this, or whether DRBD already has this kind of control to prevent a split brain, but what you need to do is block access to the shared disk device on the evicted node before the OCFS2 timeout.
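
    Just as a sketch of what I mean (untested, and the resource name "r0" is only a placeholder), on the node being evicted one could try to take the DRBD resource out of service before the OCFS2 timeout expires:

        # On the evicted node: stop replication and drop the local backing
        # disk, so that further I/O on the DRBD device fails.
        drbdadm disconnect r0
        drbdadm detach r0

        # Alternatively, block the DRBD replication traffic to/from the peer
        # (TCP port 7788 is the conventional default, check your drbd.conf).
        iptables -A INPUT  -p tcp --dport 7788 -j DROP
        iptables -A OUTPUT -p tcp --dport 7788 -j DROP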

Regards,
Luis

Laurent Neiger <Laurent.Neiger at grenoble.cnrs.fr> wrote:
 Hi guys,
 
 I'm keeping you posted on my issue...
 
 
 Luis Freitas wrote:
 Laurent,
   
     What you need to be able to decide is which node still has network connectivity. If both have network connectivity, you could fence either of them. If both lost connectivity (someone turned the switch off), then you are in trouble.
   
    You will need to plug the backend network into a switch and monitor the interface status, so that when one machine is shut down or you disconnect its network cable, you still get an "up" status on the other machine. If you don't want to use two switches, plug them into the same switch and use different VLANs.
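
    For example (just a sketch; "eth1" as the backend interface is only an assumption about the setup), a cluster manager script could poll the link state of the backend NIC:

        # 1 = carrier/link up, 0 = link down ("eth1" is only an example name)
        cat /sys/class/net/eth1/carrier

        # or, via ethtool:
        ethtool eth1 | grep "Link detected"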
  
 Yes, I managed to do that. My cluster manager can tell which node is still up before the OCFS2 timers fence
 all nodes but the lowest-numbered one, even if it is node0 that is off the network while node1 is still connected.
 
    To deal with OCFS2, I think the easiest approach is to increase its timeouts to let your cluster manager decide which node will survive before the OCFS2 heartbeat fences the node. I wouldn't be messing with its inner workings, YMMV...
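
  (For reference, and to the best of my knowledge, the O2CB timeouts live in /etc/sysconfig/o2cb and are normally set with "service o2cb configure"; the values below are the common defaults I have seen, shown only as an example, not a recommendation:)

        # /etc/sysconfig/o2cb (example values only)
        O2CB_HEARTBEAT_THRESHOLD=31    # disk heartbeat: fences after (threshold - 1) * 2 seconds
        O2CB_IDLE_TIMEOUT_MS=30000     # network idle timeout
        O2CB_KEEPALIVE_DELAY_MS=2000   # network keepalive delay
        O2CB_RECONNECT_DELAY_MS=2000   # network reconnect delay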
  
 I think I managed to give my cluster manager enough time to decide without having to increase the OCFS2 timeouts.
 
 But my problem is not there.
 It's _HOW_ to cancel OCFS2 self-fencing on node1 once I have worked out that node0 has to be fenced and not node1.
 
 I tried this:
 node0 and node1 are OK, both in the OCFS2 cluster, the shared disk is mounted, all is fine.
 I assume both of them are writing their timestamps every two seconds to their blocks in the "heartbeat system file",
 as mentioned in the FAQ.
 
 But what/where is this "heartbeat system file", by the way?
 
 When I unplug node0's network link, both nodes report losing network communication with their peer.
 Within the first five seconds, my cluster manager works out that node0 is off the network
 and node1 is OK. So the decision is taken to fence node0 and cancel fencing for node1
 (whereas node1 would be the one fenced under the OCFS2 policy of fencing the
 higher-numbered node and leaving the lowest one alive).
 
 So the cluster manager runs "ocfs2_hb_ctl -K -d /dev/drbd0", which stops the heartbeat on node1.
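
 (As a side note, and if I am reading the tool right, the heartbeat reference count on the region can be checked with something like the following; /dev/drbd0 as in this setup:)

     # Show the heartbeat reference count for the region on the device.
     ocfs2_hb_ctl -I -d /dev/drbd0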
 
 But this doesn't prevent node1 from self-fencing 28 seconds after the network communication is lost, nor
 node0 from staying alive with its dead network card. My entire cluster is down: no service
 and no data access available any more.
 
 Logical, in hindsight: the heartbeat was stopped, but the timers kept counting down and nothing reset them.
 
 Sunil Mushran <Sunil.Mushran at oracle.com> wrote:
 Each of those pings will require a timeout - short timeouts. So short that you
 may not even be able to distinguish between errors and overloaded run-queue,
 transmit queue, router, etc.
 Once more, I think I have achieved that. My problem is cancelling the self-fencing of node1,
 not deciding to do so.
 
 
 I'm sorry to bother you; you might find this trivial, but I have probably missed something.
 
 You wrote "one does not have to have 3 nodes when one only wants 2 nodes".
 Great, this is fine for me as I don't (and can't) have SANs and drbd allows max 2 nodes
 for disk-sharing.
 
 I also read that fencing all nodes but the lowest-numbered one is the intended behavior
 of OCFS2.
 
 So let me rephrase my question:
 
 How can I make a 2-node cluster work with high availability, i.e. still have access to
 the remaining node in the event of _ANY_ node failure? The cluster will be degraded,
 with only one node remaining until we repair and power up the failed node, but with no
 loss of service.
 Even if node0 fails, node1 should take over the work rather than self-fence.
 
 Once more thanks a lot for your help.
 
 Have a good day,
 
 best regards,
 
 Laurent.
 
 


       

