[Ocfs2-devel] [PATCH] ocfs2: wait for recovering done after direct unlock request
piaojun
piaojun at huawei.com
Tue Feb 19 00:35:54 PST 2019
Hi Changwei,
On 2019/2/19 10:27, Changwei Ge wrote:
> Hi Jun,
>
> On 2019/2/19 9:16, piaojun wrote:
>> Hi Changwei,
>>
>> Thanks for your patient explaination, and I get your point. But I have
>> some suggestion about your patch below.
>
> Thanks your advise to this patch.
>
>>
>> On 2019/2/18 17:46, Changwei Ge wrote:
>>> Hi Jun,
>>>
>>> Ping...
>>>
>>> Do you have any further question?
>>> If any question uncleared, please let me know.
>>> Otherwise, could you please give a feedback?
>>>
>>> Thanks,
>>> Changwei
>>>
>>> On 2019/2/15 16:48, Changwei Ge wrote:
>>>> Hi Jun,
>>>>
>>>> On 2019/2/15 16:36, piaojun wrote:
>>>>> Hi Changwei,
>>>>>
>>>>> Thanks for your explaination, and I still have a question below.
>>>>
>>>> Thank you for looking into this.
>>>>
>>>>>
>>>>> On 2019/2/15 14:29, Changwei Ge wrote:
>>>>>> Hi Jun,
>>>>>>
>>>>>> On 2019/2/15 14:20, piaojun wrote:
>>>>>>> Hi Changwei,
>>>>>>>
>>>>>>> The DLM process is a little bit complex, so I suggest pasting the code
>>>>>>> path. And I wonder if my code is right?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Jun
>>>>>>>
>>>>>>> On 2019/2/14 14:14, Changwei Ge wrote:
>>>>>>>> There is scenario causing ocfs2 umount hang when multiple hosts are
>>>>>>>> rebooting at the same time.
>>>>>>>>
>>>>>>>> NODE1 NODE2 NODE3
>>>>>>>> send unlock requset to NODE2
>>>>>>>> dies
>>>>>>>> become recovery master
>>>>>>>> recover NODE2
>>>>>>>> find NODE2 dead
>>>>>>>> mark resource RECOVERING
>>>>>>>> directly remove lock from grant list
>>>>>>> dlm_do_local_recovery_cleanup
>>>>>>> dlm_move_lockres_to_recovery_list
>>>>>>> res->state |= DLM_LOCK_RES_RECOVERING;
>>>>>>> list_add_tail(&res->recovering, &dlm->reco.resources);
>>>>>>>
>>>>>>>> calculate usage but RECOVERING marked
>>>>>>>> **miss the window of purging
>>>>>>> dlmunlock
>>>>>>> dlmunlock_remote
>>>>>>> dlmunlock_common // unlock successfully directly
>>>>>>>
>>>>>>> dlm_lockres_calc_usage
>>>>>>> __dlm_lockres_calc_usage
>>>>>>> __dlm_lockres_unused
>>>>>>> if (res->state & (DLM_LOCK_RES_RECOVERING| // won't purge lock as DLM_LOCK_RES_RECOVERING is set
>>>>>>
>>>>>> True.
>>>>>>
>>>>>>>
>>>>>>>> clear RECOVERING
>>>>>>> dlm_finish_local_lockres_recovery
>>>>>>> res->state &= ~DLM_LOCK_RES_RECOVERING;
>>>>>>>
>>>>>>> Could you help explaining where getting stuck?
>>>>>>
>>>>>> Sure,
>>>>>> As dlm missed the window to purge lock resource, it can't be unhashed.
>>>>>>
>>>>>> During umount:
>>>>>> dlm_unregister_domain
>>>>>> dlm_migrate_all_locks -> there is always a lock resource hashed, so can't return from dlm_migrate_all_locks() thus hang during umount.
>>>>>
>>>>> In dlm_migrate_all_locks, lockres will be move to purge_list and purged again:
>>>>> dlm_migrate_all_locks
>>>>> __dlm_lockres_calc_usage
>>>>> list_add_tail(&res->purge, &dlm->purge_list);
>>>>>
>>>>> Do you mean this process does not work?
>>>>
>>>> Yes, only for Migrating lock resources, it has a chance to call __dlm_lockres_calc_usage()
>>>> But in our situation, the problematic lock resource is obviously no master. So no chance for it to set stack variable *dropped* in dlm_migrate_all_locks()
>>>>
>>>> Thanks,
>>>> Changwei
>>>>
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Changwei
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> To reproduce this iusse, crash a host and then umount ocfs2
>>>>>>>> from another node.
>>>>>>>>
>>>>>>>> To sovle this, just let unlock progress wait for recovery done.
>>>>>>>>
>>>>>>>> Signed-off-by: Changwei Ge <ge.changwei at h3c.com>
>>>>>>>> ---
>>>>>>>> fs/ocfs2/dlm/dlmunlock.c | 23 +++++++++++++++++++----
>>>>>>>> 1 file changed, 19 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/ocfs2/dlm/dlmunlock.c b/fs/ocfs2/dlm/dlmunlock.c
>>>>>>>> index 63d701c..c8e9b70 100644
>>>>>>>> --- a/fs/ocfs2/dlm/dlmunlock.c
>>>>>>>> +++ b/fs/ocfs2/dlm/dlmunlock.c
>>>>>>>> @@ -105,7 +105,8 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
>>>>>>>> enum dlm_status status;
>>>>>>>> int actions = 0;
>>>>>>>> int in_use;
>>>>>>>> - u8 owner;
>>>>>>>> + u8 owner;
>>>>>>>> + int recovery_wait = 0;
>>>>>>>>
>>>>>>>> mlog(0, "master_node = %d, valblk = %d\n", master_node,
>>>>>>>> flags & LKM_VALBLK);
>>>>>>>> @@ -208,9 +209,12 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
>>>>>>>> }
>>>>>>>> if (flags & LKM_CANCEL)
>>>>>>>> lock->cancel_pending = 0;
>>>>>>>> - else
>>>>>>>> - lock->unlock_pending = 0;
>>>>>>>> -
>>>>>>>> + else {
>>>>>>>> + if (!lock->unlock_pending)
>>>>>>>> + recovery_wait = 1;
>>
>> Could we just check if res->state is in DLM_LOCK_RES_RECOVERING or
>> status is DLM_RECOVERING? And there is no need to define an extra
>> variable.
>
> As the lock resource master had died, DLM_RECOVERING can't be returned.
>
> I prefer adding a stack variable like *recovery_wait* to tell if involved lock resource has a chance to be recovered.
> In this way, we won't make code subtle comparing with normal code path. As you know, the case we discuss here is not
> that possible to happen when each node in cluster works well. And we can also save some CPU cycles this way.
I mean we could check if the res->state is in DLM_LOCK_RES_RECOVERING.
As when lock->unlock_pending is cleared in
dlm_move_lockres_to_recovery_list, res->state must be set
DLM_LOCK_RES_RECOVERING.
And I think of another solution which seems a little easier and save
some CPU work. We could call __dlm_lockres_calc_usage to put unused
lockres into purge list in dlm_finish_local_lockres_recovery. And this
will not stuck the unlocking process.
>
> Thanks,
> Changwei
>
>>
>>>>>>>> + else
>>>>>>>> + lock->unlock_pending = 0;
>>>>>>>> + }
>>>>>>>> }
>>>>>>>>
>>>>>>>> /* get an extra ref on lock. if we are just switching
>>>>>>>> @@ -244,6 +248,17 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
>>>>>>>> spin_unlock(&res->spinlock);
>>>>>>>> wake_up(&res->wq);
>>>>>>>>
>>>>>>>> + if (recovery_wait) {
>>>>>>>> + spin_lock(&res->spinlock);
>>>>>>>> + /* Unlock request will directly succeed after owner dies,
>>>>>>>> + * and the lock is already removed from grant list. We have to
>>>>>>>> + * wait for RECOVERING done or we miss the chance to purge it
>>>>>>>> + * since the removement is much faster than RECOVERING proc.
>>>>>>>> + */
>>>>>>>> + __dlm_wait_on_lockres_flags(res, DLM_LOCK_RES_RECOVERING);
>>>>>>>> + spin_unlock(&res->spinlock);
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> /* let the caller's final dlm_lock_put handle the actual kfree */
>>>>>>>> if (actions & DLM_UNLOCK_FREE_LOCK) {
>>>>>>>> /* this should always be coupled with list removal */
>>>>>>>>
>>>>>>>
>>>>>> .
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Ocfs2-devel mailing list
>>>> Ocfs2-devel at oss.oracle.com
>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>>>
>>> .
>>>
>>
> .
>
More information about the Ocfs2-devel
mailing list