[Ocfs2-users] problem with 2 host cluster - about network heartbeat again -:)

Alexei_Roudnev Alexei_Roudnev at exigengroup.com
Mon Sep 18 10:13:11 PDT 2006


This problem repeats on different installations again and again (many
people report it).

What I wonder is this: if it is SO SIMPLE to allow O2CB to have several
heartbeat channels (at least several IP addresses), and it is so safe to do
(such a change is 100% safe; it is nothing like _don't self-fence if the FS
is in a consistent state and we have no outstanding IO, even if we lost all
connections_), why has the OCFS team not made this improvement in several
years of development?

It is 100% obvious, and 100% proven by many different clusters, that a
clustered system MUST (not SHOULD) have several heartbeat channels (and not
rely on the L2 layer, such as interface bonding). In most cases, such a
change is 100% safe and simple. I can understand Oracle, which has its
heartbeat channel merged with the data exchange channel; in that case, using
several channels can cause data to arrive out of order (but Oracle supports
multiple interfaces for CSS synchronization, at least in theory). The O2CB
heartbeat does not have this problem, at least for the heartbeat itself. So
why does this simple thing take so much time even to be proposed?
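
To illustrate the idea of several heartbeat channels, here is a minimal
sketch in Python. It is not O2CB code; the names (PeerLiveness, simulate,
eth0, eth1) and the 10 second timeout are assumptions made up for the
example. A peer is treated as alive as long as ANY one channel has delivered
a heartbeat within the timeout, so a 40 second outage on one interface does
not have to end in fencing while a second interface still works.

```python
from dataclasses import dataclass, field


@dataclass
class PeerLiveness:
    """Tracks the last heartbeat seen from one peer on each channel."""
    timeout: float                                 # tolerated silence, seconds (assumed value)
    last_seen: dict = field(default_factory=dict)  # channel name -> timestamp

    def record(self, channel: str, now: float) -> None:
        # Remember when this channel last delivered a heartbeat.
        self.last_seen[channel] = now

    def is_alive(self, now: float) -> bool:
        # Alive if at least one channel was heard from within the timeout.
        return any(now - t <= self.timeout for t in self.last_seen.values())


def simulate(channels, outage, timeout=10.0, duration=60):
    """Return the first second at which the peer is declared dead, or None.

    channels: names of the heartbeat channels in use (hypothetical)
    outage:   {channel: (start, end)}, seconds during which that channel
              drops every heartbeat (start inclusive, end exclusive)
    """
    peer = PeerLiveness(timeout=timeout)
    for ch in channels:
        peer.record(ch, 0.0)
    for now in range(1, duration + 1):
        for ch in channels:
            start, end = outage.get(ch, (None, None))
            down = start is not None and start <= now < end
            if not down:
                peer.record(ch, float(now))
        if not peer.is_alive(float(now)):
            return now
    return None


# One interface with a 40 second outage, versus the same outage
# with a second, unaffected interface also carrying heartbeats.
single = simulate(["eth0"], {"eth0": (5, 45)})
dual = simulate(["eth0", "eth1"], {"eth0": (5, 45)})
print("single channel declared dead at:", single)
print("two channels declared dead at:", dual)
```

With these assumed numbers, the single-channel run declares the peer dead in
the middle of the outage, while the two-channel run never does.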

I can bet that one of 2 things will happen in the next, say, 2 years:
- OCFSv2 will have a multi-interface feature
OR
- OCFS2 will die as a product (to be correct, O2CB, because OCFSv2 can work
without it).

PS. TCP/IP over Ethernet has a 30 - 45 second convergence time, which means
that at any moment you can have a 30 - 45 second service interruption. Most
TCP/IP protocols, even iSCSI, are well adjusted for this. OCFSv2 has had this
problem from the very first days. By now, this problem has become one of the
OCFSv2 stoppers (if not killers). There is a very simple fix
(multi-interface). Conclusion?







