[Ocfs2-users] I have questions regarding Fencing

Joel Becker Joel.Becker at oracle.com
Mon Mar 1 18:40:35 PST 2010


On Mon, Mar 01, 2010 at 03:34:59PM -0500, Enrique Sanchez wrote:
> Whenever there is a split brain scenario, the node with the lowest
> number survives. I am sold on that and have no argument against it, but
> when Node0 crashes, Node1 also takes a nose dive. May I know why?

	Two nodes is a special and difficult case.  If node0 is still
heartbeating, node1 considers node0 alive; by the lowest number rule,
node1 resets itself.  If node0 is not heartbeating (a full crash),
node1 will stay alive.  As long as node0 is heartbeating, there is no
way for node1 to know that node0 is having trouble.
	If this case presents a significant problem, just add a third
node.  Once there are three nodes, you always have a majority, which
takes precedence over the lowest number.
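
	As a rough illustration of the rule described above (majority
first, lowest surviving node number as the tie-breaker), here is a
minimal Python sketch.  It is not the actual ocfs2 quorum code; the
function name should_fence and its two set arguments are hypothetical,
chosen only for this example.

```python
def should_fence(heartbeating, reachable):
    """Decide whether this node should reset itself (illustrative only).

    heartbeating: set of node numbers seen heartbeating, including this node
    reachable:    the subset of those this node can still talk to,
                  including this node
    """
    connected = len(reachable & heartbeating)
    total = len(heartbeating)
    if 2 * connected > total:
        return False   # we are in the majority: stay up
    if 2 * connected < total:
        return True    # we are in the minority: reset
    # Exact tie: only the half holding the lowest surviving node stays up.
    return min(heartbeating) not in reachable


# Two nodes, split brain, node0 still heartbeating.
print(should_fence({0, 1}, {1}))        # node1's view -> True (resets)
print(should_fence({0, 1}, {0}))        # node0's view -> False (survives)
# Node0 fully crashed: it no longer heartbeats, so node1 is alone.
print(should_fence({1}, {1}))           # -> False (survives)
# Three nodes, node2 cut off from the other two.
print(should_fence({0, 1, 2}, {0, 1}))  # -> False (majority survives)
print(should_fence({0, 1, 2}, {2}))     # -> True (minority resets)
```

	The first case is the one asked about: node1 sees a tie and is
not on the side holding the lowest surviving node, so it resets.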

> What is the node with the lowest number? Does it have to be Node0? Or
> does it mean connectivity to the lowest surviving Node?

	Here it is specifically talking about surviving nodes; these are
the nodes visible via heartbeat.  Any node not heartbeating is
considered dead.  So if node0 is turned off, and node1 is heartbeating,
node1 is considered the lowest surviving node.

> I set up a test scenario with 4 nodes, 2 nodes mounting the filesystems
> and 2 other nodes just participating as network members:

	For the purposes of ocfs2, nodes that have not mounted the
filesystem are invisible.  Only once they mount the filesystem and
start heartbeating do they take part in quorum.

> Node0 and Node1 have network connectivity and mount the filesystems
> Node3 and Node4 are alive & on the network.

	For your scenario, you essentially have a two-node quorum as
described above.  Nodes 3&4 don't participate.

> During my test (take Node0 down cold turkey)  Node1 hung pretty badly,
> is this something expected??

	What did you do to take it down?  Power off?  Node1 should take
around 90 seconds to notice (depending on your heartbeat timeout
settings), and then it should start recovery.
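
	To relate a heartbeat threshold setting to a timeout in seconds,
the sketch below assumes the commonly documented o2cb relation: a
two-second disk heartbeat interval, with a node declared dead after
(threshold - 1) intervals.  Both helper names are hypothetical, and the
interval and formula are assumptions that may not match a given
install, so check them against your own configuration.

```python
HEARTBEAT_INTERVAL_SECS = 2  # assumed disk heartbeat interval


def dead_timeout_secs(threshold):
    """Seconds without a heartbeat before a node is considered dead,
    under the assumed (threshold - 1) * interval relation."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def threshold_for(timeout_secs):
    """Inverse of dead_timeout_secs: threshold for a desired timeout."""
    return timeout_secs // HEARTBEAT_INTERVAL_SECS + 1


print(dead_timeout_secs(31))  # 60
print(threshold_for(90))      # 46
```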

Joel

-- 

"Too much walking shoes worn thin.
 Too much trippin' and my soul's worn thin.
 Time to catch a ride it leaves today
 Her name is what it means.
 Too much walking shoes worn thin."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127


