[Ocfs2-devel] [PATCH] ocfs2/dlm: don't handle migrate lockres if already in shutdown

Changwei Ge ge.changwei at h3c.com
Thu Mar 1 17:49:12 PST 2018


Hi Jun,
I still have some doubts about your problematic situation description .
Please check my reply inline your sequence diagram.

On 2018/3/1 20:38, piaojun wrote:
> Hi Changwei,
> 
> Thanks for your quick reply, please see my comments below.
> 
> On 2018/3/1 17:39, Changwei Ge wrote:
>> Hi Jun,
>>
>> On 2018/3/1 17:27, piaojun wrote:
>>> We should not handle migrate lockres if we are already in
>>> 'DLM_CTXT_IN_SHUTDOWN', as that will cause lockres remains after
>>> leaving dlm domain. At last other nodes will get stuck into infinite
>>> loop when requsting lock from us.
>>>
>>>       N1                             N2 (owner)
>>>                                      touch file
>>>
>>> access the file,
>>> and get pr lock
>>>
>>> umount
>>>
>>
>> Before migrating all lock resources, N1 should have already sent
>> DLM_BEGIN_EXIT_DOMAIN_MSG in dlm_begin_exit_domain().
>> N2 will set ->exit_domain_map later.
>> So N2 can't take N1 as migration target.
> Before receiveing N1's DLM_BEGIN_EXIT_DOMAIN_MSG, N2 has picked up N1 as
> the migrate target. So N2 will continue sending lockres to N1 even though
> N1 has left domain. Sorry for making you misunderstanding, I will give a
> more detailed description.
> 
>      N1                             N2 (owner)
>                                     touch file
> 
> access the file,
> and get pr lock
> 
>                                     begin leave domain and
>                                     pick up N1 as new owner
> 
> begin leave domain and
Here will clear N1 from N2's dlm->domain_map

> migrate all lockres do >
>                                     begin migrate lockres to N1
As N1 has been cleared, migrating for this lock resource can't continue.

Besides,
I wonder what kind of problem you encountered?
A bug crash? A hang situation?
It's better to share some issue appearance.So perhaps I can help to analyze.:)

Thanks,
Changwei

> 
> end leave domain, but
> the lockres left
> unexpectedly, because
> migrate task has passed
> 
> thanks,
> Jun
>>
>> How can the scenario your changelog describing happen?
>> Or if miss something?
>>
>> Thanks,
>> Changwei
>>
>>> migrate all lockres
>>>
>>>                                      umount and migrate lockres to N1
>>>
>>> leave dlm domain, but
>>> the lockres left
>>> unexpectedly, because
>>> migrate task has passed
>>>
>>> Signed-off-by: Jun Piao <piaojun at huawei.com>
>>> Reviewed-by: Yiwen Jiang <jiangyiwen at huawei.com>
>>> ---
>>>    fs/ocfs2/dlm/dlmdomain.c   | 14 ++++++++++++++
>>>    fs/ocfs2/dlm/dlmdomain.h   |  1 +
>>>    fs/ocfs2/dlm/dlmrecovery.c |  9 +++++++++
>>>    3 files changed, 24 insertions(+)
>>>
>>> diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c
>>> index e1fea14..3b7ec51 100644
>>> --- a/fs/ocfs2/dlm/dlmdomain.c
>>> +++ b/fs/ocfs2/dlm/dlmdomain.c
>>> @@ -675,6 +675,20 @@ static void dlm_leave_domain(struct dlm_ctxt *dlm)
>>>    	spin_unlock(&dlm->spinlock);
>>>    }
>>>
>>> +int dlm_joined(struct dlm_ctxt *dlm)
>>> +{
>>> +	int ret = 0;
>>> +
>>> +	spin_lock(&dlm_domain_lock);
>>> +
>>> +	if (dlm->dlm_state == DLM_CTXT_JOINED)
>>> +		ret = 1;
>>> +
>>> +	spin_unlock(&dlm_domain_lock);
>>> +
>>> +	return ret;
>>> +}
>>> +
>>>    int dlm_shutting_down(struct dlm_ctxt *dlm)
>>>    {
>>>    	int ret = 0;
>>> diff --git a/fs/ocfs2/dlm/dlmdomain.h b/fs/ocfs2/dlm/dlmdomain.h
>>> index fd6122a..2f7f60b 100644
>>> --- a/fs/ocfs2/dlm/dlmdomain.h
>>> +++ b/fs/ocfs2/dlm/dlmdomain.h
>>> @@ -28,6 +28,7 @@
>>>    extern spinlock_t dlm_domain_lock;
>>>    extern struct list_head dlm_domains;
>>>
>>> +int dlm_joined(struct dlm_ctxt *dlm);
>>>    int dlm_shutting_down(struct dlm_ctxt *dlm);
>>>    void dlm_fire_domain_eviction_callbacks(struct dlm_ctxt *dlm,
>>>    					int node_num);
>>> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
>>> index ec8f758..9b3bc66 100644
>>> --- a/fs/ocfs2/dlm/dlmrecovery.c
>>> +++ b/fs/ocfs2/dlm/dlmrecovery.c
>>> @@ -1378,6 +1378,15 @@ int dlm_mig_lockres_handler(struct o2net_msg *msg, u32 len, void *data,
>>>    	if (!dlm_grab(dlm))
>>>    		return -EINVAL;
>>>
>>> +	if (!dlm_joined(dlm)) {
>>> +		mlog(ML_ERROR, "Domain %s not joined! "
>>> +				"lockres %.*s, master %u\n",
>>> +				dlm->name, mres->lockname_len,
>>> +				mres->lockname, mres->master);
>>> +		dlm_put(dlm);
>>> +		return -EINVAL;
>>> +	}
>>> +
>>>    	BUG_ON(!(mres->flags & (DLM_MRES_RECOVERY|DLM_MRES_MIGRATION)));
>>>
>>>    	real_master = mres->master;
>>>
>> .
>>
> 



More information about the Ocfs2-devel mailing list