[Ocfs-users] Lock contention issue with ocfs

Jeremy Schneider jer1887 at asugroup.com
Thu Mar 11 10:12:08 CST 2004


Just noticed/thought of something new.  /bin/rm acquires and releases an
EXCLUSIVE_LOCK on the DirNode.  If the node already *has* the
EXCLUSIVE_LOCK, then of course ocfs just continues.  Then, when the
remove operation is complete, the EXCLUSIVE_LOCK is released.  ...  [5
minutes later] ...  In fact, I just confirmed that removing *any* file
in the directory releases the EXCLUSIVE_LOCK.  ;)

Maybe the lock just somehow isn't getting released by
ocfs_create_file() in Common/ocfsgencreate.c (don't ask me why
though)...  indeed, any other operation (e.g. removing a file) that
acquires and releases the lock successfully would "fix" the hung
server...  like I said, I'm no ocfs guru, but that's my hunch...

Oh, the wonders of open source...  (!)

Jeremy


>>> "Jeremy Schneider" <jer1887 at asugroup.com> 03/11/2004 9:50:58 AM
>>>
Ahh... you were exactly right about raw devices, Sunil.  debugocfs was
reading the DirNode from the buffered device.  I mapped a raw device
and
tried again with much clearer results - the reason running 'ls' made
the
lock show up is that when I did an 'ls', ocfs forced the buffered
device
to refresh it's  buffer - and that's when the EXCLUSIVE_LOCK obtained
by
the opposite node showed up to debugocfs.

It seems that somehow the EXCLUSIVE_LOCK is not being released after
ocfs creates a file.  It doesn't matter whether or not I access the
file
in o_direct mode.  If I use dd with the o_direct flag to create the
file
the EXCLUSIVE_LOCK on the DirNode is still not released until I remove
the file.  I bet you could reproduce this on a lab machine (it seems
like the kind of problem that should be reproducible) - if anyone's
interested, email me and I can give you specific steps to reproduce.

I have not tried with the 1.0.10 tools yet; I'll do that.  (It doesn't
sound quite right, but I'm not exactly an ocfs guru <g> and it's the
best suggestion I've received so far...)  If anyone on the list has
any
other suggestions, please let me know.

Also, if anyone over at Oracle would like to do me a favor and somehow
straighten out the mess with the support case I have open (let them
know
it's not a hardware issue and get it out of BDE) I'd appreciate that
too...  just email me for details...

Thanks again,
Jeremy


>>> Sunil Mushran <Sunil.Mushran at oracle.com> 03/10/2004 5:49:58 PM >>>
> I hope, that when you were reading the dirnode, etc. using
debugocfs,
> you were accessing the volume via the raw device. If you weren't, do
so.
> This is important because that's the only way to ensure directio.
Else,
> you will be reading potentially stale data from the buffer cache.
>
> Coming to the issue at hand. ls does not take an EXCLUSIVE_LOCK.
> And all EXCLUSIVE_LOCKS are released when the operation is over.
> So am not sure what is happening. Using debugocfs correctly should
help
> us understand the problem.
>
> Also, whenever you do your file operations (cat etc.) ensure those
ops are
> o_direct. Now I am not sure why this would cause a problem, but do
not
> do buffered operations. ocfs does not support shared mmap.
> 
> If you download the 1.0.10 tools, you will not need to manually map
the
> raw device. The tools do that automatically.
> 
> So, upgrade to 1.0.10 module and tools. See if you can reproduce the
> problem.
 
<<<<...>>>>


More information about the Ocfs-users mailing list