[Ocfs2-users] Re: FW: Use of OCFS2 file systems.

Sunil Mushran sunil.mushran at oracle.com
Wed Oct 4 10:08:13 PDT 2006


Feel free to contribute patches.

Sunil

Alexei_Roudnev wrote:

>Unfortunately, it MAKES THE CLUSTER LESS STABLE. It works as long as the
>network and SAN systems are fine, but is not so good in failure situations.
>
>Even if we use OCFSv2 for idle file systems (which do nothing 90% of the
>time), o2cb reboots nodes when they lose the heartbeat
>or (worse) the network or (even worse) both, instead of trying to recover
>without a reboot (as I said, the FS is in a consistent state,
>with no activity at all).
>
>It is not just an OCFSv2 problem: Oracle CSS behaves similarly (but is much
>more stable in reality), and so does the Linux HA cluster
>(but it can use different heartbeat connections, so it can be configured
>to be very reliable).
>
>You are right in saying that _cluster software always has a tendency to fence
>or kill neighbours to keep
>internal consistency_. But OCFSv2 is one of the worst examples of such
>software.
>
>What can be done _relatively easily_:
>
>(1) As we have said many times: redundancy and better timeout control in the
>heartbeat. (Of course, long timeouts mean _long recovery_, but that is OK for
>90% of installations.) Typical network recovery takes 1 minute, not 10 seconds.
>
>(2) The system should not do bad things IF it is in a consistent state. In many
>cases, if the system has no outstanding IO requests, it can recover
>without a server reboot (or at least try to) even if it lost heartbeats
>and suspects that other systems could have taken control from it.
>_How to do it safely_ is a serious theoretical challenge, but it is very
>desirable for such systems.
>
>(3) In some configurations, the FS can be treated as _not so important_. That
>means it is safer to switch to read-only and try to recover online
>rather than panic. A good example: you have a production Oracle which uses ASM,
>and you use OCFSv2 for backup storage. It is safer to fail I/O on this
>storage than to reboot the system without reason.
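
Points (1) to (3) above amount to a decision policy for a node that has lost its heartbeat. A minimal sketch in Python, purely illustrative: the names `Action`, `NodeState` and `on_heartbeat_loss` are hypothetical, the 60-second default is taken from the "1 minute" figure in point (1), and none of this reflects how o2cb actually works. It also does not address the safety question raised in point (2).

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT = "wait"                      # keep waiting for the heartbeat to return
    RECOVER_ONLINE = "recover_online"  # try to rejoin without a reboot
    READ_ONLY = "read_only"            # degrade the volume instead of rebooting
    FENCE = "fence"                    # self-fence (reboot the node)


@dataclass
class NodeState:
    seconds_without_heartbeat: float
    outstanding_io: int    # number of in-flight I/O requests
    critical_volume: bool  # False for e.g. backup storage


def on_heartbeat_loss(state: NodeState, timeout_s: float = 60.0) -> Action:
    """Hypothetical policy following points (1)-(3); not o2cb's logic."""
    # Point (1): a longer, configurable timeout before taking any action.
    if state.seconds_without_heartbeat < timeout_s:
        return Action.WAIT
    # Point (2): no outstanding I/O, so attempt recovery without a reboot.
    if state.outstanding_io == 0:
        return Action.RECOVER_ONLINE
    # Point (3): a non-critical volume goes read-only rather than panicking.
    if not state.critical_volume:
        return Action.READ_ONLY
    # Otherwise fall back to fencing.
    return Action.FENCE
```

For example, an idle node that has been without a heartbeat for 90 seconds would get `Action.RECOVER_ONLINE` under this sketch instead of a reboot.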
>
>PS. I had 2 network outages in the lab today, because of a bad UPS, and in
>all cases ALL OCFSv2 servers (in 2 different clusters) rebooted. Not one
>survived a short (30 seconds) loss of Ethernet connection (including iSCSI).
>In some cases, one server was rebooted by OCFS and the other by another part
>of the cluster (HA or RAC), but the result is exactly this: _all_ OCFSv2 nodes
>panic on a short network/SAN outage, in all cases.
>
>
>
>
>----- Original Message ----- 
>From: "Sunil Mushran" <Sunil.Mushran at oracle.com>
>To: "ocfs2-users" <ocfs2-users at oss.oracle.com>
>Sent: Tuesday, October 03, 2006 1:51 PM
>Subject: [Ocfs2-users] Re: FW: Use of OCFS2 file systems.
>
>
>  
>
>>I try to avoid responding to such emails because I am not sure how
>>much credibility a partisan has in such debates. After all I have been
>>working on OCFS/OCFS2 the last 4/5 years.
>>
>>Having said that, I have some issues with the statements. While it is true
>>that we can improve on the disk/net heartbeat, it is wrong to say that it
>>does not work or makes the cluster unstable.
>>
>>We have OCFS2 running on lots of clusters in Oracle that are testing each
>>new revision of the database. While these machines are test boxes, they are
>>all running loads designed to break Oracle. I am rarely pinged about them
>>hitting an OCFS2 issue.
>>
>>We also have internal production databases as well as Oracle customers who
>>are using OCFS2 with much success.
>>
>>However, we do have room for improvement and we are working on it.
>>
>>For the list of ongoing projects, you can peruse the OCFS2 Development
>>Wiki at http://oss.oracle.com/osswiki/OCFS2.
>>
>>If you wish to contribute code, as this is an open source project, feel free
>>to ping me or the ocfs2-devel at oss.oracle.com mailing list.
>>
>>Thanks
>>Sunil Mushran
>>
>>    
>>
>>>Hi Sunil,
>>>
>>>What are your thoughts about this message on the mailing lists?
>>>
>>>Thanks!
>>>Sanjeet
>>>
>>>
>>>------------------------------------------------------------------------
>>>
>>>*From:* ocfs2-users-bounces at oss.oracle.com
>>>[mailto:ocfs2-users-bounces at oss.oracle.com] *On Behalf Of*
>>>Alexei_Roudnev
>>>*Sent:* Friday, September 29, 2006 11:50 PM
>>>*To:* Bill Wells; Sunil Mushran
>>>*Cc:* ocfs2-users at oss.oracle.com
>>>*Subject:* Re: [Ocfs2-users] Use of OCFS2 file systems.
>>>
>>>
>>>
>>>If you can avoid OCFSv2 on a RAC server, better do it. Any cluster
>>>(RAC and OCFS) has its own instability elements (OCFSv2 has a poor
>>>heartbeat algorithm and so tends to self-fence without real failure,
>>>and, in addition, is relatively new). It works well enough to be used
>>>when you really need file sharing (such as database files or backups
>>>or even archive logs), but the less you use it, the better. Oracle
>>>home files do well without sharing.
>>>
>>>
>>>
>>>// I don't see problems with OCFSv2 on SLES9 SP3-updated, but I avoid
>>>using it for mission-critical or heavy-duty file systems,
>>>
>>>// and I still have a failure scenario where the RAC cluster could work but
>>>OCFS causes a full-cluster failure.
>>>
>>>// If you have a network problem, SAN system restart, disk I/O error,
>>>etc. - you can end up with a system panic or reboot caused by OCFS -
>>>
>>>// so the less OCFS you have, the better your system stability.
>>>
>>>      
>>>
>>_______________________________________________
>>Ocfs2-users mailing list
>>Ocfs2-users at oss.oracle.com
>>http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>>    
>>