[Ocfs2-devel] ocfs2: A race between refmap setting and clearing

Junxiao Bi junxiao.bi at oracle.com
Mon Jan 18 19:03:19 PST 2016


Hi Jiufei & Joseph,

On 01/18/2016 03:07 PM, xuejiufei wrote:
> 
> 
> On 2016/1/18 12:28, Junxiao Bi wrote:
>> On 01/13/2016 04:24 PM, Joseph Qi wrote:
>>> Hi Junxiao,
>>>
>>> On 2016/1/13 15:00, Junxiao Bi wrote:
>>>> On 01/13/2016 02:21 PM, xuejiufei wrote:
>>>>> Hi Junxiao,
>>>>> I have not described the issue clearly.
>>>>>
>>>>> Node 1                               Node 2(master)
>>>>> dlmlock
>>>>> dlm_do_master_request
>>>>>                                 dlm_master_request_handler
>>>>>                                 -> dlm_lockres_set_refmap_bit
>>>>> dlmlock succeed
>>>>> dlmunlock succeed
>>>>>
>>>>> dlm_purge_lockres
>>>>>                                 dlm_deref_handler
>>>>>                                 -> find lock resource is in
>>>>>                                    DLM_LOCK_RES_SETREF_INPROG state,
>>>>>                                    so dispatch a deref work
>>>>> dlm_purge_lockres succeed.
>>>>>
>>>>> call dlmlock again
>>>>> dlm_do_master_request
>>>>>                                 dlm_master_request_handler
>>>>>                                 -> dlm_lockres_set_refmap_bit
>>>>>
>>>>>                                 deref work trigger, call
>>>>>                                 dlm_lockres_clear_refmap_bit
>>>>>                                 to clear Node 1 from refmap
>>>>>
>>>>>                                 dlm_purge_lockres succeed
>>>>>
>>>>> dlm_send_remote_lock_request
>>>>>                                 return DLM_IVLOCKID because
>>>>>                                 the lockres does not exist
>>>> More clear now. Thank you.
>>>> This is a very complicated race. I don't have a good solution for it
>>>> yet. Your fix looks workable, but I am afraid that if we keep fixing
>>>> these kinds of races case by case, we will make dlm harder to
>>>> understand and easier to introduce bugs; maybe we should think about
>>>> refactoring dlm.
>>>>
>>> Agree. IMO, the root cause is that the bit op cannot handle such a case.
>>> I wonder if we have to change it to refcount, which may require a much
>>> bigger refactoring.
>> One bit for each node seems reasonable, as the lockres refmap is per
>> node. I think the cause is the disordered set/clear; I am trying to see
>> whether they can be made to happen in order.
>>
> Agree. Solution 1) in my first mail adds a new message to keep the
> order of set and clear. Other nodes can purge the lock resource only
> after the refmap on the master is cleared.
I am taking another approach: delete deref_lockres_worker and make the
refmap deref happen directly in the deref handler. This requires finding
and fixing the out-of-order part. As far as I can see, the only possible
disorder is do_assert_master setting the refmap bit after
deref_lockres_handler has cleared it?

For this, we need to make sure MASTERY_REF is never returned to the
owner after purge. SETREF_INPROG is designed to do exactly this: it is
set by dlm_assert_master_handler(), and purge waits for it to be cleared
before derefing the lockres. There are two cases where SETREF_INPROG is
not set at purge time:

1. The assert master message has not arrived when purge() runs
This happens when the lockres owner already exists when a node asks for
a lockres. The owner is learned from the master request reply, and
dlm_get_lock_resource() returns. Soon dlmlock/dlmunlock completes, and
the purge is done. Then the assert master message arrives, but since the
MLE has already been deleted in dlm_get_lock_resource(), the handler
will not return MASTERY_REF to the owner to reset the refmap. So there
is no problem in this case.

2. The assert master handler sets SETREF_INPROG too late
SETREF_INPROG is currently set after the lockres owner has been updated
and dlm->spinlock has been released. That lets dlm_get_lock_resource()
wake up too early, so a purge may happen before SETREF_INPROG is set.
This is a bug and can be fixed by the following patch.

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 9477d6e1de37..83cd65b128d0 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -1965,6 +1965,7 @@ ok:
                if (res) {
                        int wake = 0;
                        spin_lock(&res->spinlock);
+                       res->state |= DLM_LOCK_RES_SETREF_INPROG;
                        if (mle->type == DLM_MLE_MIGRATION) {
                                mlog(0, "finishing off migration of lockres %.*s, "
                                        "from %u to %u\n",
@@ -2029,12 +2030,8 @@ ok:

 done:
        ret = 0;
-       if (res) {
-               spin_lock(&res->spinlock);
-               res->state |= DLM_LOCK_RES_SETREF_INPROG;
-               spin_unlock(&res->spinlock);
+       if (res)
                *ret_data = (void *)res;
-       }
        dlm_put(dlm);
        if (master_request) {
                mlog(0, "need to tell master to reassert\n");


Besides this patch, revert commit
f3f854648de64c4b6f13f6f13113bc9525c621e5 ("ocfs2_dlm: Ensure correct
ordering of set/clear refmap bit on lockres") as well. Can this fix the
issue?

Thanks,
Junxiao.



> 
> Thanks
> Jiufei
> 
>> Thanks,
>> Junxiao.
>>>
>>> Thanks,
>>> Joseph
>>>
>>>> Thanks,
>>>> Junxiao.
>>>>
>>>>> BUG if the lockres is $RECOVERY
>>>>>
>>>>> On 2016/1/13 10:46, Junxiao Bi wrote:
>>>>>> On 01/12/2016 03:16 PM, xuejiufei wrote:
>>>>>>> Hi, Junxiao
>>>>>>>
>>>>>>> On 2016/1/12 12:03, Junxiao Bi wrote:
>>>>>>>> Hi Jiufei,
>>>>>>>>
>>>>>>>> On 01/11/2016 10:46 AM, xuejiufei wrote:
>>>>>>>>> Hi all,
>>>>>>>>> We have found a race between refmap setting and clearing that
>>>>>>>>> can cause the lock resource on the master to be freed before
>>>>>>>>> other nodes purge it.
>>>>>>>>>
>>>>>>>>> Node 1                               Node 2(master)
>>>>>>>>> dlm_do_master_request
>>>>>>>>>                                 dlm_master_request_handler
>>>>>>>>>                                 -> dlm_lockres_set_refmap_bit
>>>>>>>>> call dlm_purge_lockres after unlock
>>>>>>>>>                                 dlm_deref_handler
>>>>>>>>>                                 -> find lock resource is in
>>>>>>>>>                                    DLM_LOCK_RES_SETREF_INPROG state,
>>>>>>>>>                                    so dispatch a deref work
>>>>>>>>> dlm_purge_lockres succeed.
>>>>>>>>>
>>>>>>>>> dlm_do_master_request
>>>>>>>>>                                 dlm_master_request_handler
>>>>>>>>>                                 -> dlm_lockres_set_refmap_bit
>>>>>>>>>
>>>>>>>>>                                 deref work trigger, call
>>>>>>>>>                                 dlm_lockres_clear_refmap_bit
>>>>>>>>>                                 to clear Node 1 from refmap
>>>>>>>>>
>>>>>>>>> Now Node 2 can purge the lock resource but the owner of lock resource
>>>>>>>>> on Node 1 is still Node 2 which may trigger BUG if the lock resource
>>>>>>>>> is $RECOVERY or other problems.
>>>>>>>>>
>>>>>>>>> We have discussed 2 solutions:
>>>>>>>>> 1) The master returns an error to Node 1 if DLM_LOCK_RES_SETREF_INPROG
>>>>>>>>> is set. Node 1 does not retry, and the master sends another message to
>>>>>>>>> Node 1 after clearing the refmap. Node 1 can then purge the lock
>>>>>>>>> resource after the refmap on the master is cleared.
>>>>>>>>> 2) The master returns an error to Node 1 if DLM_LOCK_RES_SETREF_INPROG
>>>>>>>>> is set, and Node 1 retries the deref of the lockres.
>>>>>>>>>
>>>>>>>>> Does anybody have better ideas?
>>>>>>>>>
>>>>>>>> dlm_purge_lockres() will wait to drop the ref until
>>>>>>>> DLM_LOCK_RES_SETREF_INPROG is cleared. So if we set this flag when
>>>>>>>> the master is found during the master request, and clear it when
>>>>>>>> the assert master message is received, can this fix the issue?
>>>>>>>>
>>>>>>> I don't think this can fix it. Before the master request is made, the
>>>>>>> lock resource has already been purged. The master should clear the
>>>>>>> refmap before the client purges it.
>>>>>> inflight_locks is increased in dlm_get_lock_resource(), which will
>>>>>> stop the lockres from being purged? If DLM_LOCK_RES_SETREF_INPROG is
>>>>>> set when the lockres owner is found during the master request, will
>>>>>> that stop the lockres from being purged after unlock?
>>>>>>
>>>>>> Thanks,
>>>>>> Junxiao.
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jiufei
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Junxiao.
>>>>>>>>> Thanks,
>>>>>>>>> --Jiufei
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
> 



