[Ocfs2-devel] [PATCH] ocfs2: fix __ocfs2_cluster_lock() dead lock

Tue Jan 12 06:30:50 PST 2010

On 10-01-12 05:01, Joel Becker wrote:
> > so maybe you don't notice the lockres->l_ops->check_downconvert()?
> 
> 	I notice it.  I know what it checks.  If the lock is not
> currently taken it will return "go ahead and downconvert".  So we
> won't requeue, we'll downconvert.  Specifically, look at
> ocfs2_check_meta_downconvert().  It merely checks the caching info.  We
> can't have any cached data if we don't have the EX yet.

Is it a rule that if the lock is not currently taken, it will return "go
ahead and downconvert"?

Looking at ocfs2_check_meta_downconvert(), I think it ensures the cache is
checkpointed by JBD(2) before returning "go ahead and downconvert".
You may be right that we can't have cached data if we don't have the
EX, but I think its main purpose is to make sure we finish what we
should finish before we release the lock. Another piece of evidence is
that it doesn't check ex_holders.
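
To make the point concrete, here is a toy model in C of the distinction being discussed. It is only a sketch and not the real dlmglue code: every name (toy_lockres, toy_check_meta_downconvert, toy_unblock, the cache_checkpointed flag) is made up for illustration, and the ordering of checks is an assumption about how the caller behaves. It shows a check_downconvert-style callback that looks only at caching state, with the holder counts checked separately by the caller.

```c
#include <stdbool.h>

/* Toy lock levels; stand-ins for the DLM NL/PR/EX modes. */
enum toy_level { TOY_NL, TOY_PR, TOY_EX };

enum toy_action { TOY_REQUEUE, TOY_DOWNCONVERT };

/* Hypothetical, simplified lock resource; not struct ocfs2_lock_res. */
struct toy_lockres {
    enum toy_level blocking;  /* level another node is asking for */
    unsigned int ro_holders;  /* local PR holders */
    unsigned int ex_holders;  /* local EX holders */
    bool cache_checkpointed;  /* assumed flag: cached metadata already flushed */
};

/* Models a callback that checks only caching state, never the holders. */
static bool toy_check_meta_downconvert(const struct toy_lockres *res)
{
    return res->cache_checkpointed;
}

/* Models the caller: holder checks first, then the callback. */
static enum toy_action toy_unblock(const struct toy_lockres *res)
{
    /* A remote EX request conflicts with any local holder. */
    if (res->blocking == TOY_EX && (res->ex_holders || res->ro_holders))
        return TOY_REQUEUE;
    /* A remote PR request conflicts only with local EX holders. */
    if (res->blocking == TOY_PR && res->ex_holders)
        return TOY_REQUEUE;
    /* Only now ask whether outstanding work is finished. */
    if (!toy_check_meta_downconvert(res))
        return TOY_REQUEUE;
    return TOY_DOWNCONVERT;
}
```

In this model an idle lock with nothing left to checkpoint always gets "go ahead and downconvert", but that comes from the caching flag, not from the callback knowing whether the lock is taken. Whether the other check_downconvert() callbacks behave the same way is exactly the open question.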

Another question: you found that ocfs2_check_meta_downconvert()
checks (not explicitly) whether the lock is taken. Can you conclude
that every check_downconvert() does the same? Sorry, I didn't check them
before pushing this to you :P I will check them later too.

So the question remains: is it a rule that check_downconvert() should
return "go ahead and downconvert" if the lock is not currently taken?
Also, what does "currently taken" mean: at the ocfs2 layer (after
ocfs2_cluster_lock() returns) or at the dlm level? I guess you meant the
ocfs2 layer.

regards,
wengang.
> 	Sunil has observed the livelock behavior (pinging between UC and
> DC) in the wild via traces.  The UC process never leaves
> ocfs2_cluster_lock().  Your change fixes that.
>