[Ocfs2-devel] [PATCH] ocfs2: fix __ocfs2_cluster_lock() dead lock

Wengang Wang wen.gang.wang at oracle.com
Mon Jan 11 19:58:35 PST 2010


Hi Joel,

On 10-01-11 17:59, Joel Becker wrote:
> [Cc'd Mark and Dave]
> 
> On Sun, Jan 10, 2010 at 02:05:22AM +0800, Wengang Wang wrote:
> > > 2) It sure looks like you're right.
> > well, this doesn't sound like a sob :)
> 
> 	With good reason, too.  You've got a bug.  Dave Teigland (copied
> now) found it while testing with fs/dlm.
> 	The short description is that you didn't notice the handling of
> async -EAGAIN.  Not really your fault, there are no obvious comments.
> With o2dlm, you get -EAGAIN from the call to ocfs2_dlm_lock().  That's
> safe with your patch.  However, with fs/dlm, EAGAIN comes
> asynchronously.  ocfs2_dlm_lock() will return 0.  When the ast fires,
> the ast notices the -EAGAIN and clears the locking state for a retry.
> In your patch, however, you take the 'lock_done' + !BUSY and immediately
> increment the holders - even though the lock hasn't been acquired!

ah, yes, my fault.
I should have checked more deeply.
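To make sure I understand the async case, here is a minimal user-space sketch (hypothetical, simplified flags -- not the real ocfs2 structures) of why checking !BUSY alone is wrong when fs/dlm delivers -EAGAIN through the ast:

```c
#include <assert.h>

/* Hypothetical, simplified model of the lockres state -- not the real
 * ocfs2 structures.  Levels are ordered NL < PR < EX as in the DLM. */
enum { LVL_NL, LVL_PR, LVL_EX };

struct lockres {
	int busy;	/* OCFS2_LOCK_BUSY: a conversion is in flight */
	int level;	/* currently granted level */
};

/* fs/dlm-style ast delivering an asynchronous -EAGAIN: the request
 * failed, so the ast clears BUSY, but the granted level is unchanged
 * and the upconvert must be retried. */
static void ast_async_eagain(struct lockres *res)
{
	res->busy = 0;
	/* res->level is NOT raised */
}

/* The buggy test from my first patch: !BUSY alone is taken to mean
 * "the lock was acquired" -- wrong in the async-EAGAIN case. */
static int lock_acquired_buggy(const struct lockres *res, int wanted)
{
	(void)wanted;
	return !res->busy;
}

/* Your fix, paraphrased: only treat the upconvert as done when the
 * granted level already covers the requested level. */
static int lock_acquired_fixed(const struct lockres *res, int wanted)
{
	return !res->busy && res->level >= wanted;
}
```

With o2dlm the synchronous -EAGAIN from ocfs2_dlm_lock() never reaches this check, which is why the first patch looked safe in testing.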

> 	The fix is pretty simple, though - when lock_done is set, you
> know we've at least tried to upconvert.  If the locking level is good,
> we know the upconvert succeeded.  We can then go to inc_holders.  I've
> attached a new version of your patch that does this.
> 	I've also modified the patch to reflect that the original
> problem you found is actually a livelock, not a deadlock.  And I added a
> little comment about the async -EAGAIN.
> 	I've attached the full patch with my changes.  Dave, please test
> my version (the attached one) instead of Wengang's.  Mark, can you do a
> sanity check here?
> 	Here's the diff between your patch and mine:

snip...
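If I read your fix correctly, one pass of the retry loop now behaves like this condensed sketch (hypothetical plain flags standing in for the spinlocks, waitqueues and ocfs2_dlm_lock() call):

```c
#include <assert.h>

/* Hypothetical, condensed model of one pass of the
 * __ocfs2_cluster_lock() retry loop after the fix. */
enum { LVL_NL, LVL_PR, LVL_EX };

struct lockres {
	int busy;	/* OCFS2_LOCK_BUSY */
	int blocked;	/* OCFS2_LOCK_BLOCKED */
	int level;	/* granted level */
	int holders;
};

/* Returns 1 when the holders count was bumped, 0 when the caller
 * must wait and retry. */
static int cluster_lock_once(struct lockres *res, int wanted)
{
	if (res->busy)
		return 0;		/* wait for the in-flight convert */

	/* The fix: if the granted level already covers the request,
	 * the upconvert succeeded -- go straight to inc_holders and
	 * do NOT fall into the BLOCKED check (that was the livelock). */
	if (res->level >= wanted) {
		res->holders++;
		return 1;
	}

	if (res->blocked)
		return 0;		/* let the downconvert win first */

	res->busy = 1;			/* issue the (async) upconvert */
	return 0;
}
```

So a caller whose level is already sufficient takes the lock even while BLOCKED is set, which is exactly the case that used to loop forever.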

> Date: Wed, 6 Jan 2010 16:34:44 +0800
> Subject: [PATCH] ocfs2: fix __ocfs2_cluster_lock() livelock
> 
> There is livelock possibility in __ocfs2_cluster_lock().  Here's what
> happens:
> 
> 1) node A (the lock owner) is doing an up-convert (UC, PR->EX).  It
> gets the lock but doesn't check BUSY again because the level is right.
> 
> 2) node B requests a UC on the same lock resource.  Since node A holds
> an EX on the lockres, a bast is issued and the OCFS2_LOCK_BLOCKED flag
> is set on the lockres, meaning that a down-convert (DC) should be done.
> The DC asks node A to agree to release the EX.  The downconvert starts
> because BUSY is cleared by the ast.
> 
> 3) the UC on node A runs into the check of OCFS2_LOCK_BLOCKED:
> 
>         if (lockres->l_flags & OCFS2_LOCK_BLOCKED &&
>             !ocfs2_may_continue_on_blocked_lock(lockres, level)) {
> 
> analysis:
> the BLOCKED flag is set in 2), and the UC can't continue to get the lock
> (ocfs2_may_continue_on_blocked_lock() returns false), so it waits on the
> DC to finish.
> 
> 4) The DC finishes, and BLOCKED is cleared.  The UC on node A starts
>    over, now requesting NL->EX.  It asks node B for the lock.
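The four steps above can be modeled as one round of a flag trace (again a hypothetical simplified sketch): each round ends with node A back at NL, below the level it wants, so the upconvert never completes.

```c
#include <assert.h>

/* Hypothetical flag trace of the pre-fix livelock on node A,
 * following the four steps in the patch description. */
enum { LVL_NL, LVL_PR, LVL_EX };

struct lockres {
	int busy, blocked, level;
};

/* One round: A's upconvert is granted, B's bast sets BLOCKED, the
 * pre-fix code waits on the DC instead of taking the already-granted
 * EX, and the DC drops A back to NL. */
static int livelock_round(struct lockres *res)
{
	/* step 1: the ast grants the PR->EX upconvert */
	res->busy = 0;
	res->level = LVL_EX;

	/* step 2: node B's request delivers a bast */
	res->blocked = 1;

	/* step 3: pre-fix check -- BLOCKED wins, so A waits for the
	 * DC even though it already holds EX */
	int waited = res->blocked;

	/* step 4: the DC completes, dropping the level */
	if (waited) {
		res->level = LVL_NL;
		res->blocked = 0;
	}
	return res->level;	/* below the wanted level again: retry */
}
```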

I am wondering why the DC can finish.  Although BUSY is cleared, I
think check_downconvert() should return false, since we have not yet
done what we wanted to do, and the DC should then be requeued.

#don't hate me if I am wrong/stupid :)

regards,
wengang.


