[Ocfs2-users] Re: FW: Use of OCFS2 file systems.

Galan Merchan, Martin martin.galan at t-systems.es
Wed Oct 4 00:37:10 PDT 2006


Hello,

I'm working with OCFS2 on Red Hat Advanced Server 4 Patch 3, and I have had kernel panics too. I use OCFS2 only for RAC archive logs and RMAN backups.

I'm testing a workaround, and so far it seems to work:

In /etc/ocfs2/cluster.conf, I replaced the public IP addresses with the heartbeat IP addresses (the ip_address parameter), but kept the node names.
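To make the change concrete, here is a small Python sketch that renders a cluster.conf in the stanza layout OCFS2 uses (one `node:` stanza per node plus one `cluster:` stanza), with ip_address pointing at the private heartbeat network while each node keeps its name. The host names, addresses, the cluster name `ocfs2` and the helper `render_cluster_conf` are hypothetical; 7777 is the usual default o2net port, and the node name normally has to match the host name, which is why it stays unchanged.

```python
# Hypothetical values: node names stay the public host names, but ip_address
# is the private heartbeat/interconnect address instead of the public one.
NODES = [
    {"name": "racnode1", "number": 0, "ip_address": "10.0.0.1"},
    {"name": "racnode2", "number": 1, "ip_address": "10.0.0.2"},
]


def render_cluster_conf(cluster_name: str, nodes: list[dict], ip_port: int = 7777) -> str:
    """Render cluster.conf-style text: one node: stanza per node, then cluster:."""
    lines = []
    for node in nodes:
        lines.append("node:")
        lines.append(f"\tip_port = {ip_port}")
        lines.append(f"\tip_address = {node['ip_address']}")  # heartbeat network
        lines.append(f"\tnumber = {node['number']}")
        lines.append(f"\tname = {node['name']}")  # unchanged host name
        lines.append(f"\tcluster = {cluster_name}")
        lines.append("")
    lines.append("cluster:")
    lines.append(f"\tnode_count = {len(nodes)}")
    lines.append(f"\tname = {cluster_name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_cluster_conf("ocfs2", NODES), end="")
```

The same file has to be present and identical on every node, and o2cb must be restarted for the new addresses to take effect.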



Does anyone know this workaround, and has anyone tested it under failure conditions?

Regards from Spain,

MARTÍN



-----Original Message-----
From: ocfs2-users-bounces at oss.oracle.com [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Alexei_Roudnev
Sent: Wednesday, October 04, 2006 0:49
To: Sunil Mushran; ocfs2-users
Subject: Re: [Ocfs2-users] Re: FW: Use of OCFS2 file systems.



Unfortunately, it MAKES THE CLUSTER LESS STABLE. It works as long as the network and SAN systems are fine, but it does not behave well in failure situations.

Even if we use OCFSv2 for idle file systems (which do nothing 90% of the time), o2cb reboots nodes when they lose the heartbeat, or (worse) the network, or (worse still) both, instead of trying to recover without a reboot (as I said, the FS is in a consistent state, with no activity at all).



It is not just an OCFSv2 problem. Oracle CSS behaves similarly (but is much more stable in practice), and so does the Linux HA cluster (but it can use several different heartbeat connections, so it can be configured to be very reliable).



You are right in saying that _cluster software always has a tendency to fence or kill neighbours to keep internal consistency_. But OCFSv2 is one of the worst examples of such software.



What can be done _relatively easily_:



(1) As we have said many times: redundancy and better timeout control in the heartbeat. (Of course, long timeouts mean _long recovery_, but that is OK for 90% of installations.) Typical network recovery takes 1 minute, not 10 seconds.



(2) The system should not do drastic things IF it is in a consistent state. In many cases, if the system has no outstanding I/O requests, it can recover without a server reboot (or at least try to) even if it has lost heartbeats and suspects that other systems could have taken control away from it. _How to do this safely_ is a serious theoretical challenge, but it is very desirable for such systems.



(3) In some configurations, the FS can be treated as _not so important_. That means it is safer to switch to read-only and try to recover online rather than panic. A good example: you have a production Oracle database that uses ASM, and you use OCFSv2 for backup storage. It is safer to return I/O failures on this storage than to reboot the system for no reason.
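Proposals (2) and (3) amount to a decision policy applied when a node loses its heartbeat. The Python sketch below is a hypothetical illustration of that policy, not a description of how o2cb or OCFS2 behaves: `VolumeState`, its fields, `Action` and `on_heartbeat_loss` are invented names. It also deliberately leaves out the hard part mentioned in (2), namely knowing whether other nodes have already replayed this node's journal and taken over its locks, in which case rejoining, or even continuing read-only, may not be safe.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    FENCE = "fence (reboot or panic the node)"
    REJOIN = "stay up and try to rejoin the cluster"
    READ_ONLY = "switch the volume to read-only and fail writes"


@dataclass
class VolumeState:
    # Hypothetical inputs; OCFS2 exposes no such structure.
    outstanding_io: int         # I/O requests still in flight to the shared disk
    holds_dirty_metadata: bool  # e.g. unflushed journal or dirty lock state
    critical: bool              # False for "not so important" volumes such as backups


def on_heartbeat_loss(vol: VolumeState) -> Action:
    """Hypothetical policy for proposals (2) and (3): fence only when the
    node could still modify shared data on a critical volume."""
    if vol.outstanding_io == 0 and not vol.holds_dirty_metadata:
        # Proposal (2): nothing in flight, so try to recover without a reboot.
        return Action.REJOIN
    if not vol.critical:
        # Proposal (3): degrade a non-critical volume instead of panicking.
        return Action.READ_ONLY
    # Otherwise keep today's behaviour and fence.
    return Action.FENCE
```

For the backup-storage example in (3), a volume with I/O in flight but `critical=False` would go read-only, while the production ASM database on other storage keeps running.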



PS. I had 2 network outages in the lab today because of a bad UPS, and in all cases ALL OCFSv2 servers (in 2 different clusters) rebooted. None of them survived a short (30-second) loss of the Ethernet connection (including iSCSI). In some cases, one server was rebooted by OCFS and another by a different part of the cluster stack (HA or RAC), but the result is exactly this: _all_ OCFSv2 nodes panic on a short network/SAN outage, in all cases.
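The PS can be checked with simple arithmetic. The sketch assumes the disk heartbeat formula documented for the OCFS2 1.2 series, timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, with the threshold set in /etc/sysconfig/o2cb; a default of 7 (12 seconds) for the releases current at the time is an assumption worth verifying against the installed version. Under those assumptions, a 30-second loss of iSCSI connectivity exceeds the default disk timeout, which would be consistent with every node fencing, while the one-minute window argued for in (1) corresponds to a threshold of 31. The network idle timeout is a separate setting and is not modelled here.

```python
def disk_heartbeat_timeout(threshold: int) -> int:
    """Seconds of missed disk heartbeats before a node self-fences,
    assuming timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2."""
    return (threshold - 1) * 2


def threshold_for_timeout(seconds: int) -> int:
    """Smallest threshold whose timeout is at least `seconds` (same assumption)."""
    return -(-seconds // 2) + 1  # ceiling of seconds / 2, plus one


for threshold in (7, 31):
    print(f"O2CB_HEARTBEAT_THRESHOLD={threshold}: {disk_heartbeat_timeout(threshold)} s")
print("threshold for a timeout of at least 30 s:", threshold_for_timeout(30))
print("threshold for a timeout of at least 60 s:", threshold_for_timeout(60))
```

Raising the threshold is exactly the trade-off mentioned in (1): a node that really has failed is also detected later, so recovery of the surviving nodes takes longer.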









----- Original Message -----
From: "Sunil Mushran" <Sunil.Mushran at oracle.com>
To: "ocfs2-users" <ocfs2-users at oss.oracle.com>
Sent: Tuesday, October 03, 2006 1:51 PM
Subject: [Ocfs2-users] Re: FW: Use of OCFS2 file systems.





> I try to avoid responding to such emails because I am not sure how
> much credibility a partisan has in such debates. After all, I have been
> working on OCFS/OCFS2 for the last 4 or 5 years.
>
> Having said that, I have some issues with the statements. While it is true
> that we can improve on the disk/net heartbeat, it is wrong to say that it
> does not work or makes the cluster unstable.
>
> We have OCFS2 running on lots of clusters in Oracle that are testing each
> new revision of the database. While these machines are test boxes, they are
> all running loads designed to break Oracle. I am rarely pinged about them
> hitting an OCFS2 issue.
>
> We also have internal production databases, as well as Oracle customers,
> using OCFS2 with much success.
>
> However, we do have room for improvement and we are working on it.
>
> For the list of ongoing projects, you can peruse the OCFS2 Development
> Wiki at http://oss.oracle.com/osswiki/OCFS2.
>
> If you wish to contribute code, as this is an open source project, feel free
> to ping me or the ocfs2-devel at oss.oracle.com mailing list.
>
> Thanks
> Sunil Mushran
>

> >
> > Hi Sunil,
> >
> > What are your thoughts about this message on the mailing lists?
> >
> > Thanks!
> > Sanjeet
> >
> >
> > ------------------------------------------------------------------------
> >
> > *From:* ocfs2-users-bounces at oss.oracle.com
> > [mailto:ocfs2-users-bounces at oss.oracle.com] *On Behalf Of* Alexei_Roudnev
> > *Sent:* Friday, September 29, 2006 11:50 PM
> > *To:* Bill Wells; Sunil Mushran
> > *Cc:* ocfs2-users at oss.oracle.com
> > *Subject:* Re: [Ocfs2-users] Use of OCFS2 file systems.

> >
> > If you can avoid OCFSv2 on a RAC server, do so. Any cluster
> > (RAC and OCFS) has its own sources of instability (OCFSv2 has a poor
> > heartbeat algorithm and so tends to self-fence without a real failure),
> > and OCFSv2 is, in addition, relatively new. It works well enough to be used
> > when you really need file sharing (such as database files, backups,
> > or even archive logs), but the less you use it, the better. Oracle
> > home files do fine without sharing.
> >
> > // I don't see problems with OCFSv2 on an updated SLES9 SP3, but I avoid
> > using it for mission-critical or heavy-duty file systems,
> >
> > // and I still have a failure scenario in which the RAC cluster could keep
> > working but OCFS causes a full-cluster failure.
> >
> > // If you have a network problem, a SAN system restart, a disk I/O error,
> > etc., you can end up with a system panic or reboot caused by OCFS,
> >
> > // so the less OCFS you have, the better your system stability.
> >

>

> _______________________________________________

> Ocfs2-users mailing list

> Ocfs2-users at oss.oracle.com

> http://oss.oracle.com/mailman/listinfo/ocfs2-users

>








This e-mail may contain confidential or privileged information. Any unauthorised
copying, use or distribution of this information is strictly prohibited.
