[Ocfs2-users] I have questions regarding Fencing
Enrique Sanchez
esanchezvela.redhatcluster at gmail.com
Tue Mar 2 02:09:21 PST 2010
On Mon, Mar 1, 2010 at 9:40 PM, Joel Becker <Joel.Becker at oracle.com> wrote:
>
> Two nodes is a special and difficult case. If node0 is still
> heartbeating, node1 thinks it is alive; by the lowest number rule, node1
> resets. If node0 is not heartbeating (a full crash), node1 will stay
> alive. As long as node0 is heartbeating, there is no way for node1 to
> know that node0 is having trouble.
> If this case presents a significant problem, just add a third
> node. Once there are three nodes, you always have a majority, which
> takes precedence over the lowest number.
>
>> What is the node with the lowest number? does it have to be Node0? or
>> does it mean connectivity to the lowest surviving Node?
>
> Here it is specifically talking about surviving nodes; these are
> the nodes visible via heartbeat. Any node not heartbeating is
> considered dead. So if node0 is turned off, and node1 is heartbeating,
> node1 is considered the lowest surviving node.
>
>> I setup a test scenario with 4 nodes, 2 nodes mounting the filesystems
>> and 2 other nodes just participating as network members:
>
> For the purposes of ocfs2, nodes that are not mounted are
> invisible. Only once they mount the filesystem and start heartbeating
> do they take part in quorum.
>
>
> For your scenario, you essentially have a two-node quorum as
> described above. Nodes 3&4 don't participate.
Then I believe the Quorum rules in the documentation/FAQ should be
updated with this info.
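To make sure I have the rule right, here is a rough sketch of it as I understand it from the explanation above. It is not the actual ocfs2 quorum code; the function name and its arguments are made up for illustration, and the exact tie-break behaviour is my assumption based on the "lowest number" rule.

```python
def keeps_quorum(me, heartbeating, reachable):
    """Decide whether node `me` stays up (True) or resets (False).

    heartbeating: node numbers currently seen heartbeating (mounted nodes only;
                  unmounted nodes are invisible and must not be listed).
    reachable:    node numbers `me` can still talk to over the network.
    Both arguments are hypothetical inputs for this sketch.
    """
    alive = set(heartbeating) | {me}
    connected = (set(reachable) & alive) | {me}

    if 2 * len(connected) > len(alive):
        # A strict majority of the heartbeating nodes is on our side.
        return True
    if 2 * len(connected) == len(alive):
        # Even split: the side holding the lowest surviving node wins.
        return min(alive) in connected
    return False


# Two mounted nodes, network link broken, both still heartbeating:
print(keeps_quorum(0, {0, 1}, {0}))     # node0 stays up
print(keeps_quorum(1, {0, 1}, {1}))     # node1 resets

# Node0 fully crashed (no heartbeat): node1 is the lowest survivor.
print(keeps_quorum(1, {1}, {1}))

# Three mounted nodes, node0 cut off: the majority beats the lowest number.
print(keeps_quorum(0, {0, 1, 2}, {0}))
print(keeps_quorum(1, {0, 1, 2}, {1, 2}))
```

In my 4-node test only two nodes were mounted, so the other two never show up in `heartbeating` and the first case applies.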
>
>> During my test (take Node0 down cold turkey) Node1 hung pretty badly,
>> is this something expected??
>
> What did you do to take it down? Power off? Node1 should take
> around 90 seconds to notice (depending on your heartbeat timeout
> settings), and then it should start recovery.
>
I flip the power off; in almost every test, Node1 crashes as well.
I don't understand why you have no plans to add a reference IP
address for working out who is on the network and who isn't. You do
have a point, though: adding a third node won't break the bank if we're
already running RAC/SAP, unless we're required to get a license for
that node as well. Running a node in idle mode seems a little wasteful,
but if that solves the problem, good. I'll give it a shot today.
thanks,
esv.