[Ocfs2-devel] [PATCH] ocfs2/dlm: fix race between convert and recovery
Joseph Qi
joseph.qi at huawei.com
Fri Sep 18 00:25:15 PDT 2015
On 2015/9/18 10:41, Junxiao Bi wrote:
> Hi Joseph,
>
> On 09/17/2015 09:17 PM, Joseph Qi wrote:
>> > There is a race window between dlmconvert_remote and
>> > dlm_move_lockres_to_recovery_list that can leave a lock with
>> > OCFS2_LOCK_BUSY set on the grant list, so the system hangs.
>> >
>> > dlmconvert_remote()
>> > {
>> > 	spin_lock(&res->spinlock);
>> > 	list_move_tail(&lock->list, &res->converting);
>> > 	lock->convert_pending = 1;
>> > 	spin_unlock(&res->spinlock);
>> >
>> > 	status = dlm_send_remote_convert_request();
>> > 	/*
>> > 	 * Race window: the master has queued the AST and returned
>> > 	 * DLM_NORMAL, but goes down before sending the AST. This node
>> > 	 * detects that the master is down and calls
>> > 	 * dlm_move_lockres_to_recovery_list(), which reverts the lock
>> > 	 * to the grant list. OCFS2_LOCK_BUSY is then never cleared,
>> > 	 * because the new master considers the lock already granted
>> > 	 * and will not send the AST again.
>> > 	 */
>> >
>> > 	spin_lock(&res->spinlock);
>> > 	lock->convert_pending = 0;
>> > 	if (status != DLM_NORMAL)
>> > 		dlm_revert_pending_convert(res, lock);
>> > 	spin_unlock(&res->spinlock);
>> > }
>> >
>> > In this case, just leave the lock on the convert list; the new master
>> > will take care of it after recovery. If the convert request returns
>> > anything other than DLM_NORMAL, the converting thread does the revert
>> > itself. So remove the revert logic from dlm_move_lockres_to_recovery_list.
> Yes, looks good. The lock is already on the convert list; the recovery
> process will shuffle the list and send the AST again. So why not also
> clean up convert_pending? It is useless now.
You are right. convert_pending is now useless. I will send a new version
later.
One more concern: is this related to the LVB?
> The same thing happens for lock_pending: the lock is already on the
> blocked list. I think it can also be removed.
I'll investigate it.
>
> Thanks,
> Junxiao.
>
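A minimal userspace model of the race discussed in this thread, for illustration only. It assumes a single lock with three pieces of state (which lockres queue it sits on, convert_pending, and a busy flag standing in for OCFS2_LOCK_BUSY on the requesting node), and it assumes the new master sends an AST only for a lock it finds on the converting queue. struct model_lock, run_race() and the other helpers are hypothetical names, not OCFS2 code, and only the DLM_NORMAL path from the race description is modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: which lockres queue the lock is on. */
enum model_queue { Q_GRANTED, Q_CONVERTING };

/* Hypothetical model of one lock; not the kernel's struct dlm_lock. */
struct model_lock {
	enum model_queue q;
	bool convert_pending;
	bool busy; /* stands in for OCFS2_LOCK_BUSY on the requesting node */
};

/* First critical section of dlmconvert_remote(): queue the convert. */
static void start_convert(struct model_lock *l)
{
	assert(l->q == Q_GRANTED);
	l->q = Q_CONVERTING;
	l->convert_pending = true;
	l->busy = true;
}

/* Pre-patch recovery: a pending convert is reverted to the grant list. */
static void recovery_old(struct model_lock *l)
{
	if (l->convert_pending)
		l->q = Q_GRANTED;
}

/* Patched recovery: the lock is left on the convert list. */
static void recovery_new(struct model_lock *l)
{
	(void)l;
}

/* Assumed new-master behaviour: only a lock found on the converting
 * queue is granted and sent an AST, which clears the busy flag. */
static void new_master_finish(struct model_lock *l)
{
	if (l->q == Q_CONVERTING) {
		l->q = Q_GRANTED;
		l->busy = false;
	}
}

/* Replays the race; returns true if the requester is left hung. */
static bool run_race(bool patched)
{
	struct model_lock l = { Q_GRANTED, false, false };

	start_convert(&l);

	/* The remote convert returned DLM_NORMAL, but the old master died
	 * before sending the AST; recovery runs in this window. */
	if (patched)
		recovery_new(&l);
	else
		recovery_old(&l);

	/* Second critical section of dlmconvert_remote(): status is
	 * DLM_NORMAL, so the requester does not revert anything. */
	l.convert_pending = false;

	new_master_finish(&l);
	return l.busy;
}
```

In this model, run_race(false) follows the old recovery path and leaves the requester with busy still set, while run_race(true) follows the patched path, where the lock stays on the converting queue and the new master's AST clears it. Note that the patched path writes convert_pending but never reads it, which matches the observation that the flag becomes useless once the revert logic is gone.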