[Ocfs2-devel] [PATCH] ocfs2/dlm: fix race between convert and recovery
Junxiao Bi
junxiao.bi at oracle.com
Fri Sep 18 00:47:57 PDT 2015
On 09/18/2015 03:25 PM, Joseph Qi wrote:
> On 2015/9/18 10:41, Junxiao Bi wrote:
>> Hi Joseph,
>>
>> On 09/17/2015 09:17 PM, Joseph Qi wrote:
>>>> There is a race window between dlmconvert_remote and
>>>> dlm_move_lockres_to_recovery_list, which can leave a lock with
>>>> OCFS2_LOCK_BUSY set on the grant list, so the system hangs.
>>>>
>>>> dlmconvert_remote
>>>> {
>>>>         spin_lock(&res->spinlock);
>>>>         list_move_tail(&lock->list, &res->converting);
>>>>         lock->convert_pending = 1;
>>>>         spin_unlock(&res->spinlock);
>>>>
>>>>         status = dlm_send_remote_convert_request();
>>>>         >>>>>> race window: the master has queued the ast and returned
>>>>                DLM_NORMAL, and then goes down before sending the ast.
>>>>                This node detects that the master is down and calls
>>>>                dlm_move_lockres_to_recovery_list, which reverts the
>>>>                lock to the grant list.
>>>>                OCFS2_LOCK_BUSY is then never cleared, because the new
>>>>                master will not send an ast: it thinks the convert has
>>>>                already been granted.
>>>>
>>>>         spin_lock(&res->spinlock);
>>>>         lock->convert_pending = 0;
>>>>         if (status != DLM_NORMAL)
>>>>                 dlm_revert_pending_convert(res, lock);
>>>>         spin_unlock(&res->spinlock);
>>>> }
>>>>
>>>> In this case, just leave the lock on the convert list and the new master
>>>> will take care of it after recovery. If the convert request returns
>>>> anything other than DLM_NORMAL, the convert thread does the revert itself.
>>>> So remove the revert logic in dlm_move_lockres_to_recovery_list.
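
To make the interleaving above easier to follow, here is a minimal sketch
in plain C. It is a toy model under loud assumptions, not the kernel code:
every toy_* name is hypothetical, there is no locking or networking, and
the "new master" is reduced to one function that grants whatever it finds
on the converting list and sends an ast. The revert_pending flag selects
between the old behavior (recovery reverts a pending convert) and the
behavior proposed in the patch (recovery leaves the lock where it is).

```c
#include <stdbool.h>

/* Toy model only: these types and helpers are hypothetical and are not
 * the kernel's dlm structures. */
enum toy_queue { TOY_GRANTED, TOY_CONVERTING, TOY_BLOCKED };

struct toy_lock {
    enum toy_queue queue;   /* which lockres list the lock sits on */
    bool convert_pending;   /* models lock->convert_pending */
    bool busy;              /* models OCFS2_LOCK_BUSY on the caller side */
};

/* First half of the convert: queue the lock, mark the request in flight. */
static void toy_convert_begin(struct toy_lock *lock)
{
    lock->queue = TOY_CONVERTING;
    lock->convert_pending = true;
    lock->busy = true;
}

/* Models moving the lockres to the recovery list after the master dies.
 * revert_pending = true is the old behavior, false is the proposed one. */
static void toy_move_to_recovery(struct toy_lock *lock, bool revert_pending)
{
    if (revert_pending && lock->convert_pending)
        lock->queue = TOY_GRANTED;  /* old: back onto the granted list */
}

/* Second half of the convert. The request returned "normal" in this
 * scenario, so the convert thread does not revert anything. */
static void toy_convert_end(struct toy_lock *lock)
{
    lock->convert_pending = false;
}

/* New master after recovery: grants a lock found on the converting list
 * and sends the ast, which is what clears the busy flag. */
static void toy_new_master_shuffle(struct toy_lock *lock)
{
    if (lock->queue == TOY_CONVERTING) {
        lock->queue = TOY_GRANTED;
        lock->busy = false;
    }
}

/* Runs the whole interleaving; returns true if the lock is left busy. */
static bool toy_run(bool revert_pending)
{
    struct toy_lock lock = { TOY_GRANTED, false, false };

    toy_convert_begin(&lock);
    toy_move_to_recovery(&lock, revert_pending); /* master dies in window */
    toy_convert_end(&lock);
    toy_new_master_shuffle(&lock);
    return lock.busy;
}
```

In this model toy_run(true) returns true (the lock ends up on the granted
list with busy still set, which is the hang), while toy_run(false) returns
false (the lock stays on the converting list until the new master grants
it and the ast clears busy).
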
>> Yes, looks good. The lock is already on the convert list, so the recovery
>> process will shuffle the list and send the ast again. Why not clean up
>> convert_pending as well? It is useless now.
> You are right. convert_pending is now useless. I will send a new version
> later.
> One more concern: does this have any relation to the LVB?
I can't see how this affects the LVB. The LVB takes effect only after the
convert is done, but the convert is still ongoing here.
Thanks,
Junxiao.
>
>> The same thing happens for lock_pending: the lock is already on the
>> blocked list. I think it can also be removed.
> I'll investigate it.
>
>>
>> Thanks,
>> Junxiao.
>>
>
>