[Ocfs2-users] I have questions regarding Fencing

Joel Becker Joel.Becker at oracle.com
Tue Mar 2 15:05:37 PST 2010


On Tue, Mar 02, 2010 at 05:09:21AM -0500, Enrique Sanchez wrote:
> >> During my test (take Node0 down cold turkey)  Node1 hung pretty badly,
> >> is this something expected??
> >
> >        What did you do to take it down?  Power off?  Node1 should take
> > around 90 seconds to notice (depending on your heartbeat timeout
> > settings), and then it should start recovery.
> >
> 
> I flip the power off; in almost every test Node1 crashes as well.

	I would expect node1 to survive, but perhaps I'm being too
hopeful?  Maybe someone else knows?

> I don't understand why you don't have plans to add a referential IP
> address to find who's on the network and who isn't, while you got a

	Once you're off the disk, who is on the network is irrelevant.
The only thing that is interesting is which nodes can access the
filesystem.
	Quorum is interesting when you are still talking to the disk but
cannot be reached via network.  Quorum is deciding which group of nodes
can stay on the disk and which group must leave.  It definitely takes
into account who a node can see via the network.
	In your case above, where you powered off node0 and node1 still
crashed, I would expect node1 to notice node0 is off the disk as well as
off the network and determine that it is the only surviving node.
That's why I'm wondering if any of my colleagues have some input.
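
	To make the quorum reasoning above concrete, here is a minimal
sketch of the decision as a plain function.  It is not the o2cb code.
The name has_quorum and its arguments are made up for illustration, and
the exact rule (a majority wins, and an even split goes to the group
holding the lowest node number) is an assumption about how the tie is
broken, not something stated in this thread.

```python
def has_quorum(me, on_disk, reachable):
    """Return True if node `me` may stay on the disk (hypothetical helper).

    on_disk:   node numbers still heartbeating on the shared disk
    reachable: node numbers `me` can currently talk to over the network
    """
    # Only nodes that are still on the disk matter for the decision.
    alive = set(on_disk) | {me}
    # My group: myself plus the on-disk nodes I can reach via network.
    group = (set(reachable) & alive) | {me}

    if 2 * len(group) > len(alive):
        return True                      # clear majority
    if 2 * len(group) == len(alive):
        return min(alive) in group       # assumed tie-break: lowest node number
    return False                         # minority group must leave the disk


# node0 powered off: node1 is the only node left on the disk.
print(has_quorum(1, on_disk={1}, reachable=set()))

# Both nodes on the disk, network cable pulled between them.
print(has_quorum(0, on_disk={0, 1}, reachable=set()))
print(has_quorum(1, on_disk={0, 1}, reachable=set()))
```

	Under these assumptions the powered-off case gives node1 quorum,
because node0 no longer counts once it is off the disk.  Only in the
second case, where both nodes are still heartbeating on the disk but
cannot reach each other, does one of them have to leave.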

Joel

-- 

Life's Little Instruction Book #173

	"Be kinder than necessary."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127


