[Ocfs2-devel] Race condition between OCFS2 downconvert thread and ocfs2 cluster lock.
Sunil Mushran
sunil.mushran at oracle.com
Tue Feb 21 09:48:59 PST 2012
> bast queued and flushed before the ast was queued
Unlikely with o2dlm. dlmthread always sends ASTs before BASTs.
Can you recreate the entire lockres? A full dump may yield more
information.
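Here is a minimal user-space model of the ordering argument. It is not o2dlm code. It assumes the dlm thread keeps separate pending-AST and pending-BAST lists and drains the AST list first on each flush pass. That is my reading of dlm_flush_asts() in fs/ocfs2/dlm/dlmthread.c, so please check it against your tree. Every name below (model_queue_ast(), model_queue_bast(), model_flush(), model_reset()) is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model, not o2dlm code: per-flush counts of pending events. */
#define MODEL_MAXQ 8

static int n_asts, n_basts;  /* events queued since the last flush */
static char delivered[64];   /* delivery log: 'A' = AST, 'B' = BAST */

/* Append one delivery to the log, refusing to overflow it. */
static void model_deliver(const char *tag)
{
	assert(strlen(delivered) + strlen(tag) < sizeof(delivered));
	strcat(delivered, tag);
}

static void model_queue_ast(void)
{
	assert(n_asts < MODEL_MAXQ);
	n_asts++;
}

static void model_queue_bast(void)
{
	assert(n_basts < MODEL_MAXQ);
	n_basts++;
}

/* One flush pass (assumed ordering): every queued AST, then every queued BAST. */
static void model_flush(void)
{
	for (; n_asts > 0; n_asts--)
		model_deliver("A");
	for (; n_basts > 0; n_basts--)
		model_deliver("B");
}

/* Clear the queues and the delivery log between scenarios. */
static void model_reset(void)
{
	n_asts = n_basts = 0;
	delivered[0] = '\0';
}
```

If a BAST and then an AST are queued before a single flush, the model delivers "AB". If a flush happens between the two, it delivers "BA". The guarantee modeled here therefore only covers events that are queued before the same flush pass.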
Sunil
On 02/20/2012 10:12 PM, xiaowei.hu at oracle.com wrote:
> I am trying to fix bug 13611997. CT's machine ran into a BUG in the ocfs2dc thread: BUG_ON(lockres->l_action != OCFS2_AST_CONVERT && lockres->l_action != OCFS2_AST_DOWNCONVERT). I analyzed the vmcore and found lockres->l_action = OCFS2_AST_ATTACH and l_flags = 326 (which means OCFS2_LOCK_BUSY|OCFS2_LOCK_BLOCKED|OCFS2_LOCK_INITIALIZED|OCFS2_LOCK_QUEUED). After comparing this with the code, I believe this state is only possible during ocfs2_cluster_lock(). Here is the race:
>
> NodeA                                           NodeB
> ocfs2_cluster_lock() on a new lockres M
>   spin_lock_irqsave(&lockres->l_lock, flags);
>   gen = lockres_set_pending(lockres);
>   lockres->l_action = OCFS2_AST_ATTACH;
>   lockres_or_flags(lockres, OCFS2_LOCK_BUSY);
>   spin_unlock_irqrestore(&lockres->l_lock, flags);
>
>   ocfs2_dlm_lock() finishes and returns, then
> **lockres_clear_pending(lockres, gen, osb);
>                                                 requests a lock on the same
>                                                 lockres M; it is blocked by
>                                                 NodeA, so a proxy BAST is
>                                                 sent to A
>
> bast queued and flushed before the ast was queued;
> then the ocfs2dc thread is scheduled, and
> there is a chance to execute this code path:
>   ocfs2_downconvert_thread()
>   ocfs2_downconvert_thread_do_work()
>   ocfs2_blocking_ast()
>   ocfs2_process_blocked_lock()
>   ocfs2_unblock_lock()
>     spin_lock_irqsave(&lockres->l_lock, flags);
>     if (lockres->l_flags & OCFS2_LOCK_BUSY)
>         ret = ocfs2_prepare_cancel_convert(osb, lockres);
>             BUG_ON(lockres->l_action != OCFS2_AST_CONVERT &&
>                    lockres->l_action != OCFS2_AST_DOWNCONVERT);
> and the BUG() triggers here.
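A small decoder for l_flags can help when reading the vmcore values. This is a hedged sketch. The bit values come from my reading of the OCFS2_LOCK_* definitions in fs/ocfs2/ocfs2.h of that era and should be checked against the tree the vmcore came from. decode_lock_flags() and append() are hypothetical helpers, not kernel code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* OCFS2_LOCK_* values as read from fs/ocfs2/ocfs2.h (assumption: verify
 * against the kernel that produced the vmcore). */
static const struct {
	unsigned long bit;
	const char *name;
} lock_flags[] = {
	{ 0x0001, "OCFS2_LOCK_ATTACHED" },
	{ 0x0002, "OCFS2_LOCK_BUSY" },
	{ 0x0004, "OCFS2_LOCK_BLOCKED" },
	{ 0x0008, "OCFS2_LOCK_LOCAL" },
	{ 0x0010, "OCFS2_LOCK_NEEDS_REFRESH" },
	{ 0x0020, "OCFS2_LOCK_REFRESHING" },
	{ 0x0040, "OCFS2_LOCK_INITIALIZED" },
	{ 0x0080, "OCFS2_LOCK_FREEING" },
	{ 0x0100, "OCFS2_LOCK_QUEUED" },
	{ 0x0200, "OCFS2_LOCK_NOCACHE" },
	{ 0x0400, "OCFS2_LOCK_PENDING" },
};

/* Hypothetical helper: append s to buf without overflowing len bytes. */
static void append(char *buf, size_t len, const char *s)
{
	strncat(buf, s, len - strlen(buf) - 1);
}

/* Write the names of the bits set in flags into buf, joined by '|'.
 * Bits missing from the table are appended as one hex value. */
static void decode_lock_flags(unsigned long flags, char *buf, size_t len)
{
	size_t i;

	assert(len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(lock_flags) / sizeof(lock_flags[0]); i++) {
		if (!(flags & lock_flags[i].bit))
			continue;
		if (buf[0] != '\0')
			append(buf, len, "|");
		append(buf, len, lock_flags[i].name);
		flags &= ~lock_flags[i].bit;
	}
	if (flags != 0) {
		char extra[32];

		snprintf(extra, sizeof(extra), "%s0x%lx",
			 buf[0] != '\0' ? "|" : "", flags);
		append(buf, len, extra);
	}
}
```

With these values, decode_lock_flags(326, buf, sizeof(buf)) yields "OCFS2_LOCK_BUSY|OCFS2_LOCK_BLOCKED|OCFS2_LOCK_INITIALIZED|OCFS2_LOCK_QUEUED", which matches the report. OCFS2_LOCK_PENDING (0x400) and OCFS2_LOCK_ATTACHED (0x1) are both clear. That fits the described ordering: lockres_clear_pending() has already run, and the attach AST has not.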
>
> Solution:
> One possible solution is to remove the lockres_clear_pending() call marked with two stars and leave the clearing to the AST function. This would make sure the BAST processing waits for the AST, which clears OCFS2_LOCK_BUSY and sets OCFS2_LOCK_ATTACHED before the downconvert process starts.
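A single-threaded model of the reported ordering may help when evaluating the proposal. It is a simplified sketch under stated assumptions, not OCFS2 code. The model_* functions and struct model_lockres are hypothetical, and only four flag bits are tracked. model_unblock_lock() also assumes that the downconvert path requeues while OCFS2_LOCK_PENDING is set, before it can reach the BUG_ON. I believe ocfs2_unblock_lock() has such a check, but please verify it in your tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Subset of OCFS2_LOCK_* values as read from fs/ocfs2/ocfs2.h (verify locally). */
#define LOCK_ATTACHED 0x001
#define LOCK_BUSY     0x002
#define LOCK_BLOCKED  0x004
#define LOCK_PENDING  0x400

enum model_action { AST_INVALID, AST_ATTACH, AST_CONVERT, AST_DOWNCONVERT };
enum unblock_result { UNBLOCK_REQUEUE, UNBLOCK_CANCEL, UNBLOCK_BUG, UNBLOCK_DOWNCONVERT };

/* Hypothetical, heavily reduced stand-in for struct ocfs2_lock_res. */
struct model_lockres {
	unsigned long flags;
	enum model_action action;
};

/* ocfs2_cluster_lock() on a new lockres, before calling into the DLM. */
static void model_cluster_lock_begin(struct model_lockres *res)
{
	res->flags |= LOCK_PENDING;      /* lockres_set_pending() */
	res->action = AST_ATTACH;
	res->flags |= LOCK_BUSY;
}

/* ocfs2_dlm_lock() has returned: the current code clears PENDING here
 * (the "**" line); the proposed change leaves it for the AST. */
static void model_dlm_lock_returned(struct model_lockres *res, bool clear_pending)
{
	if (clear_pending)
		res->flags &= ~LOCK_PENDING;
}

/* BAST: mark the lockres blocked so the downconvert thread handles it. */
static void model_bast(struct model_lockres *res)
{
	res->flags |= LOCK_BLOCKED;
}

/* Attach AST: the lock is granted; BUSY and PENDING are cleared. */
static void model_locking_ast(struct model_lockres *res)
{
	res->flags |= LOCK_ATTACHED;
	res->flags &= ~(LOCK_BUSY | LOCK_PENDING);
	res->action = AST_INVALID;
}

/* Downconvert thread, simplified: requeue while PENDING (assumed check),
 * otherwise a BUSY lockres must be mid-convert or the BUG_ON fires. */
static enum unblock_result model_unblock_lock(const struct model_lockres *res)
{
	if (res->flags & LOCK_BUSY) {
		if (res->flags & LOCK_PENDING)
			return UNBLOCK_REQUEUE;
		if (res->action != AST_CONVERT && res->action != AST_DOWNCONVERT)
			return UNBLOCK_BUG;
		return UNBLOCK_CANCEL;
	}
	return UNBLOCK_DOWNCONVERT;
}

/* Replay the reported ordering: the BAST is processed before the AST arrives. */
static enum unblock_result model_replay(bool clear_pending_after_dlm_lock)
{
	struct model_lockres res = { 0, AST_INVALID };

	model_cluster_lock_begin(&res);
	model_dlm_lock_returned(&res, clear_pending_after_dlm_lock);
	model_bast(&res);
	return model_unblock_lock(&res);
}
```

With the current behaviour, model_replay(true) reaches the BUG_ON condition, because l_action is still AST_ATTACH while BUSY is set and PENDING is clear. With the proposed change, model_replay(false) requeues instead. Once model_locking_ast() has run, the same lockres goes on to a downconvert.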
>
>