[Ocfs2-users] Failover testing problem and a heartbeat question

Daniel McDonald wasade at gmail.com
Wed May 26 12:53:46 PDT 2010


We have a setup with 15 hosts, fibre-attached via a switch to a common SAN. Each host has a single fibre port; the SAN has two controllers, each with two ports. The SAN is exposing four OCFS2 v1.4.2 volumes. While performing a failover test, we observed 8 hosts fence and 2 reboot _without_ fencing. The OCFS2 FAQ recommends a disk heartbeat threshold of between 31 and 61 loops for multipath I/O users. Our initial thought was to increase the threshold from the default of 31 to 61.
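For reference, here is a rough sketch of what that change means in wall-clock terms. It assumes the relationship given in the OCFS2 FAQ, where the threshold equals (timeout in seconds / 2) + 1 because heartbeat iterations are 2 seconds apart. The function and constant names are made up for illustration and are not part of any OCFS2 tool.

```python
# Assumption (from the OCFS2 FAQ): disk heartbeat iterations are 2 seconds
# apart, and O2CB_HEARTBEAT_THRESHOLD = (timeout_secs / 2) + 1.
HEARTBEAT_INTERVAL_SECS = 2  # illustrative name, not an OCFS2 identifier


def threshold_to_timeout_secs(threshold: int) -> int:
    """Seconds a node may miss disk heartbeats before it is considered dead."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def timeout_secs_to_threshold(timeout_secs: int) -> int:
    """Threshold value to configure for a desired timeout in seconds."""
    return timeout_secs // HEARTBEAT_INTERVAL_SECS + 1


if __name__ == "__main__":
    for threshold in (31, 61):
        secs = threshold_to_timeout_secs(threshold)
        print(f"threshold {threshold} -> {secs} s")
```

Under that assumption, a threshold of 31 gives a node 60 seconds to complete a heartbeat write, and 61 gives it 120 seconds, so the value should comfortably exceed the time multipath needs to fail over to a surviving path.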

I have two hopefully simple questions. First, is there any reason, performance or otherwise, why we would not want to increase the threshold to 61?

Second, is there any reason why an OCFS2 node that is performing I/O and experiences the failure of a single fibre path (out of 4) would reset itself without _any_ kernel log message?

Thank you for your time
-Daniel

