[Ocfs-users] Hard system restart when DRBD connection fails while in use

Sunil Mushran sunil.mushran at oracle.com
Mon Sep 8 11:05:52 PDT 2008


OK, so I think I know what the issue was. Normally, in the test you
described, ocfs2 would not block if another node was being shut down.
However, the fs will block if any layer below it does. In this case,
I am assuming drbd was blocking. That's fine as long as drbd's timeout
is shorter than the o2cb heartbeat timeout. But that was probably not
the case, because you were running with the original low cluster
timeouts. Increasing the heartbeat timeout to 60 secs allowed drbd to
do its detection without triggering o2hb's timeout.
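
For reference, that timeout maps to O2CB_HEARTBEAT_THRESHOLD in the o2cb
init config; the timeout in seconds is roughly (threshold - 1) * 2, so a
threshold of 31 gives about 60 secs. Assuming the stock Debian/Ubuntu
packaging and the usual configfs mount point (both may differ on your
build), something along these lines will show and change it:

$ grep O2CB_HEARTBEAT_THRESHOLD /etc/default/o2cb
$ cat /sys/kernel/config/cluster/*/heartbeat/dead_threshold
$ sudo /etc/init.d/o2cb configure

The grep shows what the init script will load at startup, the configfs
read shows what the running cluster is actually using, and the configure
step should prompt you for a new dead threshold (you will need to restart
o2cb/ocfs2 for it to take effect).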

No, setting it to indefinite is not a solution. In this case you are
shutting down the system. A shutdown (or reboot) is a clean operation,
very different from a hard reset. A hard reset forces the file system
to clean up the dead node's journal and locks, and for that it needs to
detect node death. If the timeout is set to indefinite, it will wait
indefinitely to mark that node as dead. That's not what you want.

Stick to the default value or something in that ballpark.
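
The other half is making sure drbd gives up on an unreachable peer well
inside that window. I don't have your drbd.conf in front of me, but the
knobs I mean are the net-section timeouts. A rough sketch, with option
names from drbd 8.x and a made-up resource name, so double-check the
units against your version's drbd.conf man page:

resource r0 {
        net {
                timeout       60;   # 6 secs; unit is 0.1 sec
                ping-int      10;   # secs between keep-alive pings
                ping-timeout   5;   # 0.5 sec; unit is 0.1 sec
                ko-count       4;   # drop a peer that keeps timing out writes
        }
}

As long as drbd detects the dead peer and resumes local io before the
o2cb heartbeat threshold expires, the fs has no reason to fence.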

Henri Cook wrote:
> Cool - I really appreciate all your help. Assuming I can't change this
> crash behaviour - can I just extend the timeout indefinitely? I haven't
> seen any evidence of the hang using a two-minute timeout and recreating
> this crash - the shared filesystem on node A can still be written to,
> etc., and obviously the two sides resync on reconnection via DRBD.
>
> Sunil Mushran wrote:
>   
>> 60 secs is the current default for the heartbeat timeout. It's been
>> like that for a long time now.
>>
>> Henri Cook wrote:
>>     
>>> So my timeout was 7 seconds before, which means node A shuts down very
>>> quickly after node B. It's now 30 seconds, so after I've shut down B,
>>> once A has noticed that node B is gone - that's when I'd run that
>>> command? i.e. within the 30-second timeout?
>>>
>>> It's interesting to note that if I simply reboot node B with a long
>>> timeout (e.g. 30 seconds), normal operation resumes when it comes back -
>>> which is what led me to believe we could extend this to a couple of days
>>> or more.
>>>
>>> Sunil Mushran wrote:
>>>
>>>> What's the ps output?
>>>>
>>>> My suspicion is that drbd is blocking the I/Os, including the
>>>> disk heartbeat I/O, leading to the fence.
>>>>
>>>> Henri Cook wrote:
>>>>
>>>>> I realise the timeout is configurable - how will the cluster hang for
>>>>> two days? I don't understand.
>>>>>
>>>>> If one node in the (2-node) cluster dies, the other one should just
>>>>> be able to continue, surely? When the other node comes back, its
>>>>> shared block device (the ocfs2 drive) will be overwritten with the
>>>>> contents of the active host by DRBD.
>>>>>
>>>>> Sunil Mushran wrote:
>>>>>
>>>>>> The fencing mechanism is meant to avoid disk corruption. If you
>>>>>> extend the disk heartbeat to 2 days, then if a node dies, the
>>>>>> cluster will hang for 2 days. The timeout is configurable. Details
>>>>>> are in the 1.2 FAQ and the 1.4 user's guide.
>>>>>>
>>>>>> Henri Cook wrote:
>>>>>>
>>>>>>> Dear Sunil,
>>>>>>>
>>>>>>> It is OCFS2 - I found the code; it's the self-fencing mechanism that
>>>>>>> simply reboots the node. If I alter the OCFS2 timeout, the reboot is
>>>>>>> delayed by that many seconds. It's a real shame; I'm going to have to
>>>>>>> try to work with it - probably by extending the node timeout to 2 days
>>>>>>> or something. With DRBD I don't see the need for OCFS2 to be rebooting
>>>>>>> or anything, really, as DRBD takes care of block device
>>>>>>> synchronisation - I just wish this behaviour were configurable!
>>>>>>>
>>>>>>> Henri
>>>>>>>
>>>>>>> Sunil Mushran wrote:
>>>>>>>
>>>>>>>> Repeat the test. This time run the following on Node A
>>>>>>>> after you have killed Node B.
>>>>>>>>
>>>>>>>> $ ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN
>>>>>>>>
>>>>>>>> If we are lucky, we'll get to see where that process is waiting.
>>>>>>>>
>>>>>>>> Henri Cook wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I have two nodes (A+B) running a DRBD-backed file system (OCFS2)
>>>>>>>>> mounted on /shared.
>>>>>>>>>
>>>>>>>>> If I start, say, an FTP file transfer to my DRBD-backed /shared
>>>>>>>>> directory on node A, then reboot node B (the other machine in the
>>>>>>>>> Primary-Primary DRBD configuration) while the transfer is in
>>>>>>>>> progress, node A stops at about the same time that DRBD notices
>>>>>>>>> the connection with node B has been lost (hence crippling both
>>>>>>>>> machines for the time it takes to reboot). If the drive is
>>>>>>>>> inactive (i.e. nothing is being written to it), this does not
>>>>>>>>> occur.
>>>>>>>>>
>>>>>>>>> My question, then, is: could the OCFS2 tools be the source of
>>>>>>>>> these reboots? Is there any such default action configured? If so,
>>>>>>>>> how would I go about investigating/altering it? There are no log
>>>>>>>>> entries about the reboot to speak of.
>>>>>>>>>
>>>>>>>>> OS is Ubuntu Hardy (Server) 8.04 and ocfs2-tools 1.3.9-0ubuntu1
>>>>>>>>>
>>>>>>>>> Thanks in advance,
>>>>>>>>>
>>>>>>>>> Henri
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Ocfs-users mailing list
>>>>>>>>> Ocfs-users at oss.oracle.com
>>>>>>>>> http://oss.oracle.com/mailman/listinfo/ocfs-users



