[Ocfs2-users] Catatonic nodes under SLES10

Alexei_Roudnev Alexei_Roudnev at exigengroup.com
Tue Apr 10 14:35:28 PDT 2007


Even now, there is a simple way to set up a reliable heartbeat (but it's a
hacker's way, and you must understand TCP/IP routing protocols):

- set up an additional loopback interface;
- run an OSPF router;
- set short OSPF hello and idle timeouts (there are 3 of them);
- add all interfaces to it, including the loopback.

Now configure the loopback address in OCFSv2 - OSPF will help with the
routing (if eth0 fails, OSPF will reroute the loopback through eth1 in
5 - 10 seconds, because it is a link-state protocol with almost zero waiting
time). I have always wondered whether someone could use the same method with
Oracle cluster.
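
To make the failover figure concrete, here is a rough sketch of the timing
arithmetic in Python. It assumes the failure is noticed only through missed
hellos (no carrier-loss event on the local interface), and the short timer
values (1 s hello, 4 s dead interval, 1 s SPF delay) are assumptions picked
for illustration, not values from a tested setup. The function name is
hypothetical.

```python
def ospf_failover_window(hello_s, dead_s, spf_delay_s=0.0):
    """Return (best, worst) seconds from a silent link failure to reroute.

    Assumption: the neighbor is declared down dead_s seconds after the last
    hello that was received, and the route is recomputed spf_delay_s later.
    The link can fail anywhere between two hellos, hence the window.
    """
    if dead_s <= hello_s:
        raise ValueError("dead interval must exceed hello interval")
    best = dead_s - hello_s + spf_delay_s   # link died just before the next hello
    worst = dead_s + spf_delay_s            # link died right after a hello
    return best, worst


if __name__ == "__main__":
    # Assumed short timers versus the usual 10 s / 40 s broadcast defaults.
    print(ospf_failover_window(1, 4, 1))
    print(ospf_failover_window(10, 40, 1))
```

With the usual 10 s hello and 40 s dead interval the window is far longer
than a cluster heartbeat can tolerate, which is why the timers have to be
shortened.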

The same goes for o2cb - I am sure that a simple 'round robin try' scheme can
be implemented for o2cb, at least with a fast 'if link failed, try next one'.
It has its own problems (so it should be driven by the heartbeat timeout and
not by TCP events such as loss of connection - TCP timeouts are too long even
if you are active and sending something), but it should not be too difficult
to implement.
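
A minimal sketch of the 'round robin try' idea, in Python for brevity. This
is not o2cb code and does not reflect its internals; the class name, the
addresses and the timeout value are all hypothetical. The point it shows is
that the switch is driven by the heartbeat timeout alone, never by TCP
connection events.

```python
import time


class RoundRobinLinks:
    """Track the active link; rotate to the next one when heartbeats stop."""

    def __init__(self, addresses, timeout_s, clock=time.monotonic):
        if not addresses:
            raise ValueError("need at least one address")
        self.addresses = list(addresses)
        self.timeout_s = timeout_s
        self.clock = clock            # injectable so the logic can be tested
        self.index = 0
        self.last_seen = clock()

    @property
    def active(self):
        return self.addresses[self.index]

    def heartbeat_received(self):
        """Record that a heartbeat arrived on the active link."""
        self.last_seen = self.clock()

    def poll(self):
        """Call periodically; rotates to the next address on timeout."""
        now = self.clock()
        if now - self.last_seen > self.timeout_s:
            self.index = (self.index + 1) % len(self.addresses)
            self.last_seen = now      # give the new link a full timeout window
        return self.active


if __name__ == "__main__":
    links = RoundRobinLinks(["192.168.1.10", "192.168.2.10"], timeout_s=2.0)
    print(links.poll())
```

Resetting the timer after each switch means a dead second link is also
abandoned after one timeout, so the selector keeps cycling until some link
carries heartbeats again.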


----- Original Message ----- 
From: "Jeff Mahoney" <jeffm at suse.com>
To: "Alexei_Roudnev" <Alexei_Roudnev at exigengroup.com>
Cc: "Luis Freitas" <lfreitas34 at yahoo.com>; "Sunil Mushran"
<Sunil.Mushran at oracle.com>; <ocfs2-users at oss.oracle.com>
Sent: Tuesday, April 10, 2007 1:21 PM
Subject: Re: [Ocfs2-users] Catatonic nodes under SLES10


> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Alexei_Roudnev wrote:
> > Absolutely. 100% agree! I think that the only significant (and easy,
> > safe) improvement for o2cb which should be done is _multiple IP for
> > heartbeat_ and _configurable timeout_.
>
> Configurable timeout is in the upstream kernel already. It's also part
> of SLES10 SP1.
>
> I think multiple IPs would be more difficult. The infrastructure is
> really only set up for one IP at a time. It wouldn't be just for
> heartbeat. Handling keepalive messages from multiple ip addresses would
> be fairly trivial but entirely useless. We'll know the other node is up,
> but won't actually be able to handle DLM requests. I suppose it's
> possible to handle this with sort of a master-fallback scheme, but there
> are quite a list of features that are higher priority right now.
>
> - -Jeff
>
> - --
> Jeff Mahoney
> SUSE Labs
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.5 (GNU/Linux)
> Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org
>
> iD8DBQFGG/GxLPWxlyuTD7IRAiaTAJ9/8ekq34DO+jIMlV5wUDDUDpNxtwCfTEBN
> rOmoyMR8eYehyom/a9fjfts=
> =5JQP
> -----END PGP SIGNATURE-----
>



