[Ocfs-users] Lock contention issue with ocfs

Sunil Mushran Sunil.Mushran at oracle.com
Wed Mar 10 14:49:58 CST 2004


I hope that when you were reading the dirnode, etc. using debugocfs,
you were accessing the volume via the raw device. If you weren't, do so.
This is important because that's the only way to ensure direct I/O.
Otherwise, you may be reading stale data from the buffer cache.

Coming to the issue at hand: ls does not take an EXCLUSIVE_LOCK,
and all EXCLUSIVE_LOCKs are released when the operation is over.
So I am not sure what is happening. Using debugocfs correctly should help
us understand the problem.

Also, whenever you do your file operations (cat etc.), ensure those ops
use O_DIRECT. I am not sure why buffered operations would cause this
problem, but do not do them. ocfs does not support shared mmap.
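
For illustration only, here is a minimal sketch of what an O_DIRECT read
looks like from user space. It is not part of the ocfs tools. It assumes
Linux and Python 3; the names aligned_size and read_direct and the 4096-byte
block size are hypothetical choices for this example. O_DIRECT generally
requires the buffer, offset and transfer size to be block-aligned, which is
why the read goes into a page-aligned anonymous mmap.

```python
import mmap
import os


def aligned_size(length, block_size=4096):
    """Round length up to the next multiple of block_size."""
    return -(-length // block_size) * block_size


def read_direct(path, length, block_size=4096):
    """Read up to `length` bytes from the start of `path`, bypassing the cache.

    block_size is an assumption; the real alignment requirement depends on
    the device and the filesystem.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    # An anonymous mmap is page-aligned, which satisfies the usual
    # O_DIRECT buffer alignment requirement.
    buf = mmap.mmap(-1, aligned_size(length, block_size))
    fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
    try:
        n = os.readv(fd, [buf])          # reads directly into the aligned buffer
        return bytes(buf[:min(n, length)])
    finally:
        os.close(fd)
        buf.close()
```

A call such as read_direct("/path/on/ocfs/file", 512) would then return the
first 512 bytes without going through the buffer cache; the path here is a
placeholder. Some filesystems reject O_DIRECT, in which case os.open raises
OSError.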

If you download the 1.0.10 tools, you will not need to manually map the
raw device. The tools do that automatically.

So, upgrade to the 1.0.10 module and tools, and see if you can reproduce
the problem.

Jeremy Schneider wrote:

>another note:
>
>after I delete the file I created that caused the
>OCFS_DLM_EXCLUSIVE_LOCK to be held, the lock doesn't seem to actually be
>released (according to debugocfs) until the other node attempts to read
>the DirNode.  (e.g. /bin/ls or something)
>
>Jeremy
>
>
>>>>"Jeremy Schneider" <jer1887 at asugroup.com> 03/10/2004 4:55:56 PM
>
>I am still having this weird problem with nodes hanging while I'm
>running OCFS.  I'm using OCFS 1.0.9-12 and RHAS 2.1
>
>I've been working on tracking it down and here's what I've got so far:
>1. I create a file from node 0.  This succeeds; I can /bin/cat the
>file, append, edit, or whatever.
>2. From node 1, I do an operation that accesses the DirNode (e.g.
>/bin/ls).
>3. Node 0 immediately acquires an OCFS_DLM_EXCLUSIVE_LOCK on the
>DirNode itself (although I still seem to be able to *read* the
>DirNode from node 1).
>4. I attempt to create a file from node 1...  node 1 hangs, waiting
>for the exclusive lock on the DirNode to be released.
>*** node 1 is now completely dysfunctional.  OCFS is hung.
>5. I delete the file I created in step 1 (from node 0).
>6. The OCFS_DLM_EXCLUSIVE_LOCK is released.
>7. node 1 resumes, and creates a file.
>
>8. I access the DirNode from node 0.
>9. Node 1 immediately acquires an OCFS_DLM_EXCLUSIVE_LOCK on the
>DirNode itself...  the whole process repeats, but with the nodes
>reversed.
>
>This looks a lot like a bug to me.  I've had a case open with Oracle
>Support for it since the end of Feb, but at the moment BDE is too busy
>investigating some message about the local hard drive controller to
>consider that it might be a bug (and honestly, it probably doesn't
>involve my local hard drive controller).
>
>Anyone have any suggestions?
>
>Jeremy
>Lansing, MI