[Ocfs2-users] Re: FW: Use of OCFS2 file systems.

Alexei_Roudnev Alexei_Roudnev at exigengroup.com
Wed Oct 4 10:43:58 PDT 2006


It's a possible option - I even built a cluster at home (taking an FC disk system
from a garbage bin -:)) but have not had enough time to proceed with it yet.


----- Original Message ----- 
From: "Sunil Mushran" <sunil.mushran at oracle.com>
To: "Alexei_Roudnev" <Alexei_Roudnev at exigengroup.com>
Cc: "ocfs2-users" <ocfs2-users at oss.oracle.com>
Sent: Wednesday, October 04, 2006 10:08 AM
Subject: Re: [Ocfs2-users] Re: FW: Use of OCFS2 file systems.


> Feel free to contribute patches.
>
> Sunil
>
> Alexei_Roudnev wrote:
>
> >Unfortunately, it MAKES THE CLUSTER LESS STABLE. It works while the network and SAN
> >systems are fine, but is not so good in failure situations.
> >
> >Even if we use OCFSv2 for idle file systems (which do nothing 90% of the
> >time), o2cb reboots nodes when it loses the heartbeat,
> >or (worse) the network, or (even worse) both... instead of trying to recover
> >without a reboot (as I said, the FS is in a consistent state,
> >with no activity at all).
> >
> >It is not just an OCFSv2 problem - Oracle CSS behaves similarly (but is much
> >more stable in reality), and so does the Linux HA cluster
> >(but it can use different heartbeat connections, so it can be configured
> >to be very reliable).
> >
> >You are right in saying that _cluster software always has a tendency to fence
> >or kill neighbours to keep
> >internal consistency_. But OCFSv2 is one of the worst examples of such
> >software.
> >
> >What can be done _relatively easily_:
> >
> >(1) As we have said many times - redundancy and better timeout control in the
> >heartbeat. (Of course, long timeouts mean _long recovery_, but that is OK for
> >90% of installations.) Typical network recovery takes 1 minute, not 10 seconds.
> >
> >(2) The system should not do bad things IF it is in a consistent state. In
> >many cases, if the system has no outstanding IO requests, it can recover
> >without a server reboot (or at least try to do so) even if it lost heartbeats
> >and suspects that other systems could take control away from it.
> >It is a serious theoretical challenge _how to do it safely_, but it is very
> >desirable for such systems.
> >
> >(3) In some configurations, the FS can be treated as _not so important_. That
> >means it is safer to switch to read-only and try to recover online
> >rather than panic. A good example: you have a production Oracle database that
> >uses ASM, and you use OCFSv2 for backup storage. It is safer to return an IO
> >failure on this storage than to reboot the system for no reason.
> >
> >PS. I had 2 network outages in the lab today because of a bad UPS - and in
> >all cases, ALL OCFSv2 servers (in 2 different clusters) rebooted. None
> >survived a short (30 second) loss of the Ethernet connection (including iSCSI).
> >In some cases, one server was rebooted by OCFS and another by a different part
> >of the cluster (HA or RAC) - but the result is exactly the same: _all_ OCFSv2
> >nodes panic on a short network/SAN outage, in all cases.
> >
> >----- Original Message ----- 
> >From: "Sunil Mushran" <Sunil.Mushran at oracle.com>
> >To: "ocfs2-users" <ocfs2-users at oss.oracle.com>
> >Sent: Tuesday, October 03, 2006 1:51 PM
> >Subject: [Ocfs2-users] Re: FW: Use of OCFS2 file systems.
> >
> >>I try to avoid responding to such emails because I am not sure how
> >>much credibility a partisan has in such debates. After all, I have been
> >>working on OCFS/OCFS2 for the last 4 or 5 years.
> >>
> >>Having said that, I have some issues with the statements. While it is true
> >>that we can improve on the disk/net heartbeat, it is wrong to say that it
> >>does not work or makes the cluster unstable.
> >>
> >>We have OCFS2 running on lots of clusters in Oracle that are testing each
> >>new revision of the database. While these machines are test boxes, they are
> >>all running loads designed to break Oracle. I am rarely pinged about them
> >>hitting an OCFS2 issue.
> >>
> >>We also have internal production databases, as well as Oracle customers who
> >>are using OCFS2 with much success.
> >>
> >>However, we do have room for improvement and we are working on it.
> >>
> >>For the list of ongoing projects, you can peruse the OCFS2 Development
> >>Wiki at http://oss.oracle.com/osswiki/OCFS2.
> >>
> >>If you wish to contribute code, as this is an open source project, feel free
> >>to ping me or the ocfs2-devel at oss.oracle.com mailing list.
> >>
> >>Thanks
> >>Sunil Mushran
> >>
> >>
> >>
> >>>Hi Sunil,
> >>>
> >>>What are your thoughts about this message on the mailing lists?
> >>>
> >>>Thanks!
> >>>Sanjeet
> >>>
> >>>
> >>>------------------------------------------------------------------------
> >>>
> >>>*From:* ocfs2-users-bounces at oss.oracle.com
> >>>[mailto:ocfs2-users-bounces at oss.oracle.com] *On Behalf Of *Alexei_Roudnev
> >>>*Sent:* Friday, September 29, 2006 11:50 PM
> >>>*To:* Bill Wells; Sunil Mushran
> >>>*Cc:* ocfs2-users at oss.oracle.com
> >>>*Subject:* Re: [Ocfs2-users] Use of OCFS2 file systems.
> >>>
> >>>
> >>>
> >>>If you can avoid OCFSv2 on a RAC server, better do it. Any cluster
> >>>(RAC or OCFS) has its own instability elements (OCFSv2 has a poor
> >>>heartbeat algorithm and so tends to self-fence without a real failure,
> >>>and in addition it is relatively new). It works well enough to be used
> >>>when you really need file sharing (such as database files, backups,
> >>>or even archive logs), but the less you use it, the better. Oracle
> >>>home files work fine without sharing.
> >>>
> >>>
> >>>
> >>>// I don't see problems with OCFSv2 on SLES9 SP3-updated, but I avoid
> >>>using it for mission-critical or heavy-duty file systems,
> >>>
> >>>// and I still have a failure scenario where the RAC cluster could keep
> >>>working but OCFS causes a full-cluster failure.
> >>>
> >>>// If you have a network problem, a SAN system restart, a disk IO error,
> >>>etc., you can end up with a system panic or reboot caused by OCFS -
> >>>
> >>>// so the less OCFS you have, the better your system stability.
> >>>
> >>>
> >>>
> >>_______________________________________________
> >>Ocfs2-users mailing list
> >>Ocfs2-users at oss.oracle.com
> >>http://oss.oracle.com/mailman/listinfo/ocfs2-users
> >>
> >>
> >>
> >
> >
> >
>
>