[Ocfs-users] Hard system restart when DRBD connection fails while in use

Henri Cook ocfs at theplayboymansion.net
Mon Sep 8 11:35:13 PDT 2008


Many thanks Sunil, this is really helpful, as was your response to the bug
report.

Another thought just occurred to me - the default start/stop settings in
Ubuntu have DRBD stopping before OCFS2 and O2CB. That is effectively like
removing /dev/drbd0 while OCFS2 is still mounted on it (so it looks like a
crash to the filesystem) - so this could also have contributed to, or even
been, the actual cause all along.

If anyone else is reading this with a similar problem, try checking
/etc/rc3.d and /etc/rc6.d and making sure DRBD stops *after* OCFS2 and
O2CB (the stop order follows the script numbering you see with ls -l,
which I didn't know before).
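
A rough sketch of what to check - the sequence numbers below are only
illustrative, not the actual Hardy defaults, so look at what your packages
actually installed:

# K (stop) scripts run in ascending order, so lower numbers stop first;
# rc6.d covers reboot, rc0.d covers halt
$ ls -l /etc/rc6.d /etc/rc0.d | grep -iE 'drbd|o2cb|ocfs2'

# If drbd's K number is lower than the o2cb/ocfs2 ones, push it later, e.g.:
$ sudo update-rc.d -f drbd remove
$ sudo update-rc.d drbd start 70 2 3 4 5 . stop 75 0 1 6 .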

Again Sunil, you've been amazing - many thanks for your help solving
this issue, I'm a much happier person now it's sorted! I am still a
little concerned that if one machine crashes the other one will reboot
after <time out> - that does cripple the cluster, which is not what I'd
call high availability - but from what I gather on IRC this has been
addressed with new tools in SLES 10, so happy days :)

Henri

Sunil Mushran wrote:
> OK, so I think I know what the issue was. Normally, in the test
> you described, ocfs2 would not block if another node was being
> shut down. However, the fs will block if any layer below it does.
> In this case, I am assuming drbd was blocking. That's fine as long
> as drbd's timeout is shorter than o2cb's hb timeout. But that probably
> was not the case because you were running with the original low cluster
> timeouts. Increasing that to 60 secs allowed drbd to do its detection
> without triggering o2hb's timeout.
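
For anyone following along: on Ubuntu those knobs live roughly as shown
below (values purely illustrative - check your own config). The disk
heartbeat timeout works out to (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds,
so a threshold of 31 gives the 60 seconds Sunil mentions:

# /etc/default/o2cb (or re-run '/etc/init.d/o2cb configure')
O2CB_HEARTBEAT_THRESHOLD=31   # (31 - 1) * 2 = 60 second disk hb timeout

# /etc/drbd.conf, net section - keep drbd's failure detection shorter
net {
  timeout   60;   # 6.0 seconds (unit is 0.1 s)
  ping-int  10;   # seconds between keep-alive pings
}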
>
> No, setting it to indefinite is not a solution. In this case you are
> shutting down the system. A shutdown (or reboot) is a clean operation,
> very different from a hard reset. A hard reset forces the file system
> to clean up the dead node's journal and locks. For that it needs to
> detect node death. If the timeout is set to indefinite, it will wait
> indefinitely to mark that node as dead. That's not what you want.
>
> Stick to the default value or something in that ballpark.
>
> Henri Cook wrote:
>> Cool - I really appreciate all your help. Assuming I can't change this
>> crash behaviour, can I just extend the timeout indefinitely? I haven't
>> seen any evidence of the hang using a two minute timeout and recreating
>> this crash - the shared filesystem on node A can still be written to,
>> etc., and obviously they resync via DRBD once the connection comes back.
>>
>> Sunil Mushran wrote:
>>  
>>> 60 secs is the current default for hb timeout. It's been like that for
>>> a long time now.
>>>
>>> Henri Cook wrote:
>>>    
>>>> So my timeout was 7 seconds before, which means node A shuts down
>>>> very quickly after node B - it's now 30 seconds, so after I've shut
>>>> down B, once A has noticed that node B is gone - that's when I'd run
>>>> that command? i.e. within the 30 second timeout?
>>>>
>>>> It's interesting to note that if I simply reboot node B with a long
>>>> timeout (e.g. 30 seconds), normal operation resumes when it comes
>>>> back - which is what led me to believe we could extend this to a
>>>> couple of days or more.
>>>>
>>>> Sunil Mushran wrote:
>>>>
>>>>> What's the ps output?
>>>>>
>>>>> My suspicion is that drbd is blocking the I/Os, including the
>>>>> disk heartbeat I/O, leading to the fence.
>>>>>
>>>>> Henri Cook wrote:
>>>>>           
>>>>>> I realise the timeout is configurable - how will the cluster hang
>>>>>> for two days? I don't understand.
>>>>>>
>>>>>> If one node in the (2 node) cluster dies, surely the other one
>>>>>> should just be able to continue? When the dead node comes back, its
>>>>>> shared block device (the ocfs2 drive) will be overwritten by DRBD
>>>>>> with the contents of the active host.
>>>>>>
>>>>>> Sunil Mushran wrote:
>>>>>>
>>>>>>> The fencing mechanism is meant to avoid disk corruption. If you
>>>>>>> extend the disk heartbeat to 2 days, then if a node dies, the
>>>>>>> cluster will hang for 2 days. The timeout is configurable; details
>>>>>>> are in the 1.2 FAQ and the 1.4 user's guide.
>>>>>>>
>>>>>>> Henri Cook wrote:
>>>>>>>                      
>>>>>>>> Dear Sunil,
>>>>>>>>
>>>>>>>> It is OCFS2 - I found the code. It's the self-fencing mechanism
>>>>>>>> that simply reboots the node; if I alter the OCFS2 timeout, the
>>>>>>>> reboot is delayed by that many seconds. It's a real shame. I'm
>>>>>>>> going to have to try to work with it - probably by extending the
>>>>>>>> node timeout to 2 days or something - since with DRBD taking care
>>>>>>>> of block device synchronisation I don't see the need for OCFS2 to
>>>>>>>> be rebooting at all. I just wish this behaviour was configurable!
>>>>>>>>
>>>>>>>> Henri
>>>>>>>>
>>>>>>>> Sunil Mushran wrote:
>>>>>>>>
>>>>>>>>> Repeat the test. This time run the following on Node A
>>>>>>>>> after you have killed Node B.
>>>>>>>>>
>>>>>>>>> $ ps -e -o pid,stat,comm,wchan=WIDE-WCHAN-COLUMN
>>>>>>>>>
>>>>>>>>> If we are lucky we'll get to see where that process is waiting.
>>>>>>>>>
>>>>>>>>> Henri Cook wrote:
>>>>>>>>>                                     
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I have two nodes (A+B) running a DRBD-backed file system (using
>>>>>>>>>> OCFS2) mounted on /shared.
>>>>>>>>>>
>>>>>>>>>> If I start, say, an FTP file transfer to the DRBD /shared
>>>>>>>>>> directory on node A, then reboot node B (the other machine in the
>>>>>>>>>> Primary-Primary DRBD configuration) while the transfer is in
>>>>>>>>>> progress, node A stops at around the same time that DRBD notices
>>>>>>>>>> the connection with node B has been lost (hence crippling both
>>>>>>>>>> machines for the time it takes node B to reboot). If the drive is
>>>>>>>>>> inactive (i.e. nothing is being written to it) this does not occur.
>>>>>>>>>>
>>>>>>>>>> My question then is: could the OCFS2 tools be the source of these
>>>>>>>>>> reboots - is there any such default action configured? If so, how
>>>>>>>>>> would I go about investigating/altering it? There are no log
>>>>>>>>>> entries about the reboot to speak of.
>>>>>>>>>>
>>>>>>>>>> The OS is Ubuntu Hardy (Server) 8.04 with ocfs2-tools 1.3.9-0ubuntu1.
>>>>>>>>>>
>>>>>>>>>> Thanks in advance,
>>>>>>>>>>
>>>>>>>>>> Henri
>>>>>>>>>>
>


