[Ocfs2-users] Issues with iSCSI, Hosts Crashing

Nick Couchman Nick.Couchman at seakr.com
Thu Nov 8 06:53:55 PST 2007


I meant literally one or two seconds.  The iSCSI target daemon isn't down for very long, and the iSCSI initiator retries the connection every second, so it reconnects successfully after only one or two attempts. 
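
(For reference, my understanding is that the initiator-side retry and timeout behavior is governed by the open-iscsi settings below.  This is just a sketch of the iscsid.conf entries as I understand them; the file path and the default values may well differ on OpenSUSE 10.2.)

    # /etc/iscsi/iscsid.conf (or /etc/iscsid.conf on older open-iscsi) -- sketch only
    # How often the initiator sends a NOP-Out ping to detect a dead session,
    # and how long it waits for the reply, in seconds.
    node.conn[0].timeo.noop_out_interval = 5
    node.conn[0].timeo.noop_out_timeout = 5
    # How long (seconds) I/O is queued after a connection drops before it is
    # failed back up to the layer above (OCFS2 in this case).
    node.session.timeo.replacement_timeout = 120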

Is the disk heartbeat threshold the "O2CB_HEARTBEAT_THRESHOLD" setting in the /etc/sysconfig/o2cb file (SuSE Linux)?  If so, the default is 7, which I would think corresponds to more than one second, and I've bumped it up to 30.  Is there another number I should be changing somewhere? 
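
(If I'm reading the FAQ correctly, the threshold is a count of two-second heartbeat iterations, so the effective timeout works out to (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds: the default of 7 is roughly 12 seconds and my 30 is roughly 58.  A sketch of the relevant line, with 31 shown only as the commonly suggested 60-second value, not something I've verified for my setup:)

    # /etc/sysconfig/o2cb -- disk heartbeat threshold (sketch)
    # The node self-fences after (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds
    # without a successful heartbeat write: 7 => ~12s, 31 => ~60s.
    O2CB_HEARTBEAT_THRESHOLD=31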

Also, I don't mean to be a nuisance about it, but when you say you're working on extended attribute support, could you give a rough estimate as to how far out you expect that implementation to be? 

Thanks! 
--Nick

>>> On Wed, Nov 7, 2007 at  6:12 PM, Sunil Mushran <Sunil.Mushran at oracle.com> wrote:

Is it a second, as in, "just a sec"? :)

This could be the result of using the default cluster timeouts,
which were fairly low. Refer to the Cluster Timeouts section in the
FAQ for details. You should be able to bump up the disk heartbeat
threshold as recommended. For network timeouts, you will need
to upgrade to 2.6.20, if not 2.6.21.
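
For reference, on a kernel and ocfs2-tools combination that supports configurable network timeouts, the knobs sit alongside the heartbeat threshold in /etc/sysconfig/o2cb and can also be set via "/etc/init.d/o2cb configure". The names and values below are only an illustrative sketch; check the FAQ for the currently recommended settings:

    # /etc/sysconfig/o2cb -- network timeouts (sketch, values illustrative)
    # Time (ms) a network connection may be idle before the peer is declared dead.
    O2CB_IDLE_TIMEOUT_MS=30000
    # Interval (ms) between network keepalive packets.
    O2CB_KEEPALIVE_DELAY_MS=2000
    # Delay (ms) before attempting to reconnect to a peer.
    O2CB_RECONNECT_DELAY_MS=2000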

We are currently working on adding support for extended attributes.

Sunil

Nick Couchman wrote:
> Two questions (and then a bonus one), kind of interrelated, but,
> first, some basic info.  I'm using OCFS2 on OpenSUSE 10.2, kernel
> 2.6.18.8(-0.7).  There are three nodes in the OCFS2 cluster, backed by
> Openfiler iSCSI storage.
> 
> First, I'm using Openfiler and iSCSI volumes to back my OCFS2 file
> system.  The nodes that are part of the OCFS2 cluster use the file
> system as a shared storage area for VMware Virtual Machines.  I'm
> experiencing a problem, though, with the Openfiler setup.  When you
> create a new iSCSI volume in Openfiler, it restarts the iSCSI
> Enterprise Target daemon (ietd).  This causes the OCFS2 nodes to see a
> problem on the iSCSI storage.  The problem usually lasts around a
> second - only as long as it takes the daemon to restart and the nodes
> to attempt the next connection.  OCFS2 seems to have a big problem
> with this and causes the machines to crash.
> 
> My second question is related to the end of the first one.  Is there
> any way to keep the node from completely dying if there is a problem
> with the storage, FS, etc.?  I don't have any fancy remote power
> setups or remote management cards, so when I create new volumes in
> Openfiler I have to be physically near the servers so that I can hard
> reset them should OCFS2 crash and bring the entire kernel down with
> it.  I realize there are issues with maintaining the integrity of the
> data, but maybe someone can point me in the direction of some way to
> keep the nodes alive longer before they just completely kernel panic
> and die.
> 
> Finally, unrelated - could someone update me on the status of POSIX
> ACLs in OCFS2?  This is the only thing that is keeping me from using
> OCFS2 on a wide basis in my organization, but it is a major
> roadblock.  Also, it seems that it has been on the roadmap for quite
> some time, and I was wondering if any progress has been made toward
> implementing ACLs on OCFS2.
> 
> Thanks, in advance, for any help/insight anyone can provide!
> --Nick



