[Ocfs2-users] Catatonic nodes under SLES10

Jeff Mahoney jeffm at suse.com
Tue Apr 10 12:10:57 PDT 2007



Alexei_Roudnev wrote:
> Luis,
>  
> Things can be worse, because we can run three cluster stacks at the same
> time on the same Linux system:
>  
> - CRS (oracle RAC)
> - O2CB
> - Heartbeat2
>  
> The problem is that each system makes its own decisions, selects its own
> masters and slaves, and decides _to fence_ or _to commit suicide_
> independently.
>  
> This leads to a common case: if we have a short SAN service interruption
> or IP network interruption, different components make different
> decisions and fence themselves or each other (btw, in the case of CRS,
> fencing is a feature of CSS, not of CRS).
>  
> Of these three cluster stacks, only heartbeat (or heartbeat2) is reliable.
> Both o2cb and CRS use a very primitive heartbeat without redundancy and
> with poor initial parameters, and both easily make wrong decisions.

Andrew will be pleased to hear your opinion on heartbeat2. You might be
pleased to know that he's becoming more involved with OCFS2 development.
We all know that there are shortcomings with the o2cb heartbeat method,
and it wouldn't be that hard to extend it to do multiple heartbeats. The
thing is, expanding o2cb isn't really something we should work on long-term.

o2cb isn't really well suited for your environment, as you've
discovered. However, your environment isn't really what o2cb is
targeted at. Personally, I think it's a huge ease-of-use win for someone
who just wants to set up a tandem web server (for example) with ocfs2
mounted between the servers. o2cb is extremely easy to set up, so going
from 0 to 100% is literally a three-minute operation.

So, I think we should keep o2cb around for the truly simple deployments.
We don't really need to expand it much, it's already suited for that.
Long term development, AFAIK, will focus more on getting the userspace
cluster interface more solid. Like any kernel<->userspace interaction
involving file systems, it could find itself prone to memory deadlocks
if the system is very low on memory and swaps out the cluster manager.
That's something that definitely needs to be worked on, and I'm working
with Andrew to ensure that happens in the future.

-Jeff
--
Jeff Mahoney
SUSE Labs


