[Ocfs2-users] Catatonic nodes under SLES10

Eckenfels. Bernd B.Eckenfels at seeburger.de
Tue Apr 10 10:33:01 PDT 2007


> By the time you determine you need a node to fence, you do not
> know what I/O it has in its pipeline.  Any I/O that is below the
> request queue can't reliably be stopped in Linux.  If that I/O goes out
> after other nodes have decided the node is gone, it is corruption.
> This is why fencing has to be absolute.

Fencing does not help here: you have a race condition, and the I/O queue
is only the smallest part of the path where an I/O request might get
stuck. The panic won't stop BH interrupts, nor will it stop the
controllers and switches. In fact, the other node is in a much better
position to "pull the plug" with a SCSI reservation. And the panic
itself might not work in the worst case, when the node is broken.
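
To illustrate why cutting access at the storage side closes the race, here
is a minimal sketch. It is a toy model only, loosely inspired by how
persistent reservations behave, not the actual SCSI command set; the class
SharedDisk and its methods are names made up for this example.

```python
class SharedDisk:
    """Toy model of a target that only accepts writes from registered keys."""

    def __init__(self):
        self.registered = set()  # keys of nodes currently allowed to write
        self.blocks = {}         # block number -> data

    def register(self, key: str) -> None:
        self.registered.add(key)

    def preempt(self, by_key: str, victim_key: str) -> None:
        """A surviving node removes the victim's registration."""
        if by_key not in self.registered:
            raise PermissionError("preempting node is not registered")
        self.registered.discard(victim_key)

    def write(self, key: str, block: int, data: bytes) -> bool:
        """Return False (think: reservation conflict) for unregistered keys."""
        if key not in self.registered:
            return False
        self.blocks[block] = data
        return True


disk = SharedDisk()
disk.register("node-a")
disk.register("node-b")
disk.write("node-a", 7, b"old")

# node-b decides node-a is gone and revokes its access at the target.
disk.preempt("node-b", "node-a")

# A request from node-a that was still in flight arrives late and is refused.
late_write_accepted = disk.write("node-a", 7, b"stale")
```

In this model it does not matter where the late request was stuck (queue,
controller, switch), because the check happens at the device, after all of
those.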

There are essentially multiple problems:

A) The node is well behaved and just loses its connection to part of the
storage. No fencing is needed: you just don't issue more requests and
remember the new node state, and the other nodes might recognize that
recovery/replay is needed. Nobody (especially not the fencing node) can
know which part of the last I/O transaction will reach the device (or
not) anyway. That's why you have to have I/O transactions with atomic
changes and integrity. For exactly that reason there is also no need to
dequeue the last I/O request: it must not be harmful anyway.
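
As a rough sketch of what "atomic changes and integrity" can mean in
practice: each transaction is framed with a length and a checksum, and
replay stops at the first record that is incomplete or does not verify.
This is not how OCFS2's journal is laid out; the record format and the
function names encode_record and replay are assumptions made for the
example.

```python
import struct
import zlib
from typing import List, Tuple

# Header: payload length, sequence number, CRC32 of the payload.
HEADER = struct.Struct("<IQI")


def encode_record(seq: int, payload: bytes) -> bytes:
    """Frame one transaction so that a partial write is detectable."""
    return HEADER.pack(len(payload), seq, zlib.crc32(payload)) + payload


def replay(log: bytes) -> List[Tuple[int, bytes]]:
    """Return every complete record; stop at the first torn or corrupt one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, seq, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # torn or corrupt write: treat it as if it never happened
        records.append((seq, payload))
        offset = start + length
    return records


log = encode_record(1, b"alloc inode 42") + encode_record(2, b"extend file")
torn = log[:-3]  # the last request only partially reached the device
recovered = replay(torn)
```

Here the second record is cut short, so replay keeps only the first one.
Whether the last request reached the device or not, the recovering node
ends up with a consistent state.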

B) The node is totally broken (memory overwrite, interrupt deadlocks,
etc.). In that case the node might not notice the failure, or it might
not be able to self-fence. This is a typical case for STONITH.

C) The node is fine and just the heartbeat is delayed, or the nodes
overall are fine and the cache is in sync, and only a single storage
device fails. Those cases should never occur (larger timeouts are only
part of the solution; a smarter quorum algorithm, like the one provided
with heartbeat or other cluster managers, is needed). That case was
happening quite often. I guess increased timeouts make this a bit better,
but it can only be solved reliably with a cluster framework which
considers more than the state of a single network connection.
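
A minimal sketch of the kind of decision I mean, where more than one signal
is looked at before anybody is fenced. It is not the algorithm of heartbeat
or any other cluster manager; the names PeerView, Action and decide, and
the rule itself, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "nothing to do"
    WAIT = "peer alive on one path, keep waiting"
    STOP_IO = "no quorum, stop issuing I/O"
    FENCE_PEER = "fence peer"


@dataclass
class PeerView:
    net_heartbeat_ok: bool   # peer answered on the interconnect
    disk_heartbeat_ok: bool  # peer's heartbeat on shared storage is advancing


def decide(view: PeerView, votes_seen: int, cluster_size: int) -> Action:
    """Pick an action for one peer; votes_seen includes the local node."""
    if view.net_heartbeat_ok and view.disk_heartbeat_ok:
        return Action.NONE
    if votes_seen * 2 <= cluster_size:
        # We might be the isolated side, so we must not fence anybody.
        return Action.STOP_IO
    if view.net_heartbeat_ok or view.disk_heartbeat_ok:
        # Only one path is lost: the peer is demonstrably alive.
        return Action.WAIT
    return Action.FENCE_PEER


# Network heartbeat delayed, disk heartbeat still advancing: no fencing.
delayed = decide(PeerView(False, True), votes_seen=2, cluster_size=3)
# Both paths silent and we hold a majority: fencing is justified.
dead = decide(PeerView(False, False), votes_seen=2, cluster_size=3)
```

With a rule like this, a delayed network heartbeat alone never leads to a
fence, and a node that has lost the majority stops its own I/O without
panicking.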

One of the problems OCFSv2 has is that condition C) happened too
often and that condition A) is not recognizable. And, of course, that
all those conditions are handled in the simplest way, with a panic, which
is IMHO not really helpful for the reasons mentioned above.

Regards
Bernd




