[Ocfs-users] Hard system restart when DRBD connection fails while in use

Henri Cook ocfs at theplayboymansion.net
Sun Sep 7 17:48:40 PDT 2008


Dear Sunil,

It is OCFS2. I found the code: it's the self-fencing mechanism, which
simply reboots the node. If I alter the OCFS2 timeout, the reboot is
delayed by that many seconds. It's a real shame; I'm going to have to
try to work with it, probably by extending the node timeout to 2 days
or so. With DRBD I don't see the need for OCFS2 to reboot the node (or
do anything, really), as DRBD takes care of block device
synchronisation. I just wish this behaviour were configurable!

Henri
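Henri's plan to extend the node timeout comes down to a little arithmetic. The sketch below assumes the relation given in the OCFS2 FAQ for the disk heartbeat. Under that relation, O2CB_HEARTBEAT_THRESHOLD (set in /etc/default/o2cb on Debian/Ubuntu) counts 2-second heartbeat iterations, and timeout = (threshold - 1) * 2 seconds. The helper names are hypothetical. Check the relation against the ocfs2-tools and kernel versions actually in use before relying on it.

```python
# Hypothetical helpers. Assumes the OCFS2 FAQ relation
#   timeout_seconds = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2
HEARTBEAT_INTERVAL_S = 2  # assumed length of one disk-heartbeat iteration


def timeout_for_threshold(threshold: int) -> int:
    """Seconds of silence before a node is declared dead."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_S


def threshold_for_timeout(timeout_s: int) -> int:
    """Smallest threshold whose timeout is at least timeout_s seconds."""
    if timeout_s <= 0:
        raise ValueError("timeout must be positive")
    iterations = -(-timeout_s // HEARTBEAT_INTERVAL_S)  # ceiling division
    return iterations + 1


if __name__ == "__main__":
    print(timeout_for_threshold(31))  # a threshold of 31 gives 60 seconds
    two_days = 2 * 24 * 60 * 60
    print(f"O2CB_HEARTBEAT_THRESHOLD={threshold_for_timeout(two_days)}")
```

Under these assumptions, a 2-day timeout works out to a threshold of 86401. A value that large also means a node that has genuinely died would likely not be declared dead, and its cluster locks not recovered, for about that long. The surviving node could therefore block on the filesystem in the meantime.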

Sunil Mushran wrote:
> Repeat the test. This time run the following on Node A
> after you have killed Node B.
>
> $ ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN
>
> If we are lucky we'll get to see where that process is waiting.
>
> Henri Cook wrote:
>> Hi all,
>>
>> I have two nodes (A+B) running an OCFS2 file system on a shared DRBD
>> device, mounted on /shared.
>>
>> If I start, say, an FTP file transfer to my DRBD /shared directory on
>> node A, and then reboot node B (the other machine in a Primary-Primary
>> DRBD configuration) while the transfer is in progress, node A stops at
>> roughly the same time that DRBD notices the connection with node B has
>> been lost. This cripples both machines for as long as the reboot
>> takes. If the drive is inactive (i.e. nothing is being written to it),
>> this does not occur.
>>
>> My question, then: could the OCFS2 tools be the source of these
>> reboots? Is there any such default action configured? If so, how
>> would I go about investigating or altering it? There are no log
>> entries about the reboot to speak of.
>>
>> OS is Ubuntu Hardy (Server) 8.04 and ocfs2-tools 1.3.9-0ubuntu1
>>
>> Thanks in advance,
>>
>> Henri
>>
>>
>> _______________________________________________
>> Ocfs-users mailing list
>> Ocfs-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs-users
>>   
>
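Sunil's `ps` invocation lists every process. In a hang like this, the interesting rows are usually the processes in uninterruptible sleep (STAT beginning with `D`), together with the kernel function shown in the wchan column. The sketch below runs the same command and keeps only those rows. It assumes procps-style output with whitespace-separated columns in the requested order (pid, stat, comm, wchan). The names `parse_ps`, `blocked` and `main` are hypothetical.

```python
import subprocess
import sys
from typing import List, NamedTuple

# Same invocation Sunil suggests; the long header label appears to be
# there to widen the wchan column so kernel function names are not cut off.
PS_CMD = ["ps", "-e", "-o", "pid,stat,comm,wchan=WIDE-WCHAN-COLUMN"]


class Proc(NamedTuple):
    pid: int
    stat: str
    comm: str
    wchan: str


def parse_ps(output: str) -> List[Proc]:
    """Parse pid/stat/comm/wchan rows, skipping the header line."""
    procs: List[Proc] = []
    for line in output.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or truncated row
        # comm may contain spaces: pid and stat come from the left,
        # wchan from the right, and the rest is the command name.
        pid, stat, *comm, wchan = fields
        procs.append(Proc(int(pid), stat, " ".join(comm), wchan))
    return procs


def blocked(procs: List[Proc]) -> List[Proc]:
    """Keep processes in uninterruptible sleep (STAT begins with 'D')."""
    return [p for p in procs if p.stat.startswith("D")]


def main() -> None:
    try:
        result = subprocess.run(PS_CMD, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not run ps: {exc}", file=sys.stderr)
        return
    for p in blocked(parse_ps(result.stdout)):
        print(f"{p.pid:>6} {p.stat:<5} {p.comm:<16} {p.wchan}")


if __name__ == "__main__":
    main()
```

Run it on node A right after node B has been killed. An empty result only means that nothing was in `D` state at the instant `ps` sampled the process table.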


