[Ocfs2-devel] [PATCH] ocfs2/dlm: Clear DLM_LKSB_GET_LVB and DLM_LKSB_PUT_LVB when cancel_pending is set

Changwei Ge ge.changwei at h3c.com
Sat Dec 8 00:14:18 PST 2018


On 2018/12/8 15:40, wangjian wrote:
> Hi Changwei,
> 
> By "the LVB bit be set", do you mean the memcpy at line 428?
> If so, you should realize that under the scenario I described and with my fix applied,
> the if condition at line 426 evaluates to false during the last AST processing.
> This is indeed a rather complicated anomaly scenario,
> and you may not have fully understood the scenario I described above.
> Below I will describe the key points of this scenario again.
> 
> The master of the lock resource has died. Because Node3 is in the cancel_convert process,
> Node3 moves the lock to the granted queue in the dlm_move_lockres_to_recovery_list
> function (DLM_LKSB_GET_LVB and DLM_LKSB_PUT_LVB are not cleared).
> The lock information is sent to the new master (Node4) during the recovery process.
> Then the cancel_convert process clears DLM_LKSB_GET_LVB and DLM_LKSB_PUT_LVB.
> The lock_3 information on the new master (Node4) is now inconsistent with
> the local lock_3 information on Node3, causing a BUG when Node3 receives the AST.

Aha... now I get your point and see what the problem looks like.

So as the old master died, Node3 thinks that the conversion cancellation succeeded and thus clears _DLM_LKSB_PUT_LVB_ and _DLM_LKSB_GET_LVB_ in dlmunlock_common().
But before that, the dlm recovery procedure had already sent the lock's lksb to the *new* master, right? Um, so we have a very fast dlm recovery here. :-)
I admit that's possible.
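
To make sure we are looking at the same window, here is a tiny user-space sketch of how the two views diverge. It's illustrative only, not kernel code; the flag values and structures are made up:
'''
/* Illustrative values only; the real DLM_LKSB_* definitions live in
 * fs/ocfs2/dlm/dlmapi.h. */
#include <stdio.h>

#define DLM_LKSB_GET_LVB 0x04
#define DLM_LKSB_PUT_LVB 0x08

struct lksb { unsigned flags; };

/* recovery path: Node3 snapshots the lksb flags for the new master */
static unsigned send_to_new_master(const struct lksb *l)
{
	return l->flags;
}

/* cancel path: what dlmunlock_common() later does on Node3 */
static void clear_lvb_flags(struct lksb *l)
{
	l->flags &= ~(DLM_LKSB_GET_LVB | DLM_LKSB_PUT_LVB);
}

int main(void)
{
	struct lksb lock3 = { .flags = DLM_LKSB_GET_LVB };

	unsigned master_view = send_to_new_master(&lock3); /* recovery wins the race */
	clear_lvb_flags(&lock3);                           /* cancel finishes later */

	/* Node4 will AST with LKM_GET_LVB set, but Node3's lksb no longer
	 * carries DLM_LKSB_GET_LVB, so the BUG_ON() at dlmast.c:427 fires. */
	printf("new master sees GET_LVB=%d, Node3 local GET_LVB=%d\n",
	       !!(master_view & DLM_LKSB_GET_LVB),
	       !!(lock3.flags & DLM_LKSB_GET_LVB));
	return 0;
}
'''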

So actually, we can judge whether the conversion cancellation succeeded or not, as in your other patch, right?
If we already know we had a failed conversion cancellation, why do we still clear _DLM_LKSB_PUT_LVB_ and _DLM_LKSB_GET_LVB_?

And IMO, in your case, sending _DLM_LKSB_PUT_LVB_ and _DLM_LKSB_GET_LVB_ to the *new* master makes sense, as we can still ask the new master to transfer the LVB back to boost performance.

So I prefer to merge your two patches, as they address problems that share the same root cause.
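
Roughly the shape I have in mind, again as a user-space sketch; the cancel_took_effect parameter is hypothetical and stands in for whatever check the merged patch ends up using:
'''
#include <stdbool.h>

#define DLM_LKSB_GET_LVB 0x04	/* illustrative values, as above */
#define DLM_LKSB_PUT_LVB 0x08

struct lksb { unsigned flags; };

/* Clear the LVB flags only when the cancellation genuinely took effect.
 * If the "success" was manufactured by the master dying, recovery has
 * already migrated the flags to the new master, so keep them locally
 * to stay consistent with the new master's view. */
static void maybe_clear_lvb_flags(struct lksb *lksb, bool cancel_took_effect)
{
	if (cancel_took_effect)
		lksb->flags &= ~(DLM_LKSB_GET_LVB | DLM_LKSB_PUT_LVB);
}
'''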

Thanks,
Changwei


> 
> Thanks,
> Jian
> 
> On 12/7/2018 11:40 AM, Changwei Ge wrote:
>> Hi Jian,
>>
>> Um...
>> I am a little puzzled after delving into this patch.
>>
>> Do you mean the BUG check below?
>> '''
>> 425                 /* if we requested the lvb, fetch it into our lksb now */
>> 426                 if (flags & LKM_GET_LVB) {
>> 427                         BUG_ON(!(lock->lksb->flags & DLM_LKSB_GET_LVB));
>> 428                         memcpy(lock->lksb->lvb, past->lvb, DLM_LVB_LEN);
>> 429                 }
>>
>> '''
>>
>> If so, you clear DLM_LKSB_GET_LVB in dlm_commit_pending_cancel(), so how could the LVB bit still be set in dlm_proxy_ast_handler()?
>>
>> Thanks,
>> Changwei
>>
>>
>> On 2018/12/6 19:54, wangjian wrote:
>>> Hi Changwei,
>>>
>>> The backtrace for the BUG in the dlm_proxy_ast_handler function is as follows.
>>>
>>> [  699.795843] kernel BUG at /home/Euler_compile_env/usr/src/linux-4.18/fs/ocfs2/dlm/dlmast.c:427!
>>> [  699.797525] invalid opcode: 0000 [#1] SMP NOPTI
>>> [  699.798383] CPU: 8 PID: 510 Comm: kworker/u24:1 Kdump: loaded Tainted: G           OE     4.18.0 #1
>>> [  699.800002] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
>>> [  699.801963] Workqueue: o2net o2net_rx_until_empty [ocfs2_nodemanager]
>>> [  699.803275] RIP: 0010:dlm_proxy_ast_handler+0x738/0x740 [ocfs2_dlm]
>>> [  699.804710] Code: 00 10 48 8d 7c 24 48 48 89 44 24 48 48 c7 c1 f1 35 92 c0 ba 30 01 00 00 48 c7 c6 30 a9 91 c0 31 c0 e8
>>> ac 88 fb ff 0f 0b 0f 0b <0f> 0b 66 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 57 45 89 c7
>>> [  699.808506] RSP: 0018:ffffba64c6f2fd38 EFLAGS: 00010246
>>> [  699.809456] RAX: ffff9f34a9b39148 RBX: ffff9f30b7af4000 RCX: ffff9f34a9b39148
>>> [  699.810698] RDX: 000000000000019e RSI: ffffffffc091a930 RDI: ffffba64c6f2fd80
>>> [  699.811927] RBP: ffff9f2cb7aa3000 R08: ffff9f2cb7b99400 R09: 000000000000001f
>>> [  699.813457] R10: ffff9f34a9249200 R11: ffff9f34af23aa00 R12: 0000000040000000
>>> [  699.814719] R13: ffff9f34a9249210 R14: 0000000000000002 R15: ffff9f34af23aa28
>>> [  699.815984] FS:  0000000000000000(0000) GS:ffff9f32b7c00000(0000) knlGS:0000000000000000
>>> [  699.817417] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [  699.818825] CR2: 00007fd772f5a140 CR3: 000000005b00a001 CR4: 00000000001606e0
>>> [  699.820123] Call Trace:
>>> [  699.820658]  o2net_rx_until_empty+0x94b/0xcc0 [ocfs2_nodemanager]
>>> [  699.821848]  process_one_work+0x171/0x370
>>> [  699.822595]  worker_thread+0x49/0x3f0
>>> [  699.823301]  kthread+0xf8/0x130
>>> [  699.823972]  ? max_active_store+0x80/0x80
>>> [  699.824881]  ? kthread_bind+0x10/0x10
>>> [  699.825589]  ret_from_fork+0x35/0x40
>>>
>>> The reasons for clearing the LVB flags are as follows. First, the owner of the lock resource
>>> may have died and the lock has been moved back to the granted queue, so the purpose of the lock
>>> cancellation has already been achieved and the LVB flags should be cleared. Second, it solves this panic problem.
>>>
>>> Thanks,
>>> Jian
>>>
>>> On 12/5/2018 10:01 AM, Changwei Ge wrote:
>>>> Hi Jian,
>>>>
>>>> Could you please also share the panic backtrace in dlm_proxy_ast_handler()?
>>>> Then I can help review this patch and analyze what's wrong in the DLM.
>>>>
>>>> It would be better for you to explain your intention, i.e. why the LVB flags should be cleared,
>>>> rather than just giving a longggg time sequence diagram.
>>>>
>>>> Moreover, your patches are hard to apply to my tree :(
>>>>
>>>> Thanks,
>>>> Changwei
>>>>
>>>>
>>>> On 2018/12/3 20:08, wangjian wrote:
>>>>> Function dlm_move_lockres_to_recovery_list should clear
>>>>> DLM_LKSB_GET_LVB and DLM_LKSB_PUT_LVB when cancel_pending
>>>>> is set. Otherwise a node may panic in dlm_proxy_ast_handler.
>>>>>
>>>>> Here is the situation: At the beginning, Node1 is the master
>>>>> of the lock resource and has NL lock, Node2 has PR lock,
>>>>> Node3 has PR lock, Node4 has NL lock.
>>>>>
>>>>> Node1        Node2            Node3            Node4
>>>>>                 convert lock_2 from
>>>>>                 PR to EX.
>>>>>
>>>>> the mode of lock_3 is
>>>>> PR, which blocks the
>>>>> conversion request of
>>>>> Node2. move lock_2 to
>>>>> conversion list.
>>>>>
>>>>>                              convert lock_3 from
>>>>>                              PR to EX.
>>>>>
>>>>> move lock_3 to conversion
>>>>> list. send BAST to Node3.
>>>>>
>>>>>                              receive BAST from Node1.
>>>>>                              downconvert thread execute
>>>>>                              canceling convert operation.
>>>>>
>>>>> Node1 dies because
>>>>> the host is powered down.
>>>>>
>>>>>                              in the dlmunlock_common function,
>>>>>                              the downconvert thread sets
>>>>>                              cancel_pending. at the same
>>>>>                              time, Node3 realizes that
>>>>>                              Node1 is dead, so it moves lock_3
>>>>>                              back to the granted list in the
>>>>>                              dlm_move_lockres_to_recovery_list
>>>>>                              function and removes Node1 from
>>>>>                              the domain_map in the
>>>>>                              __dlm_hb_node_down function.
>>>>>                              the downconvert thread then fails
>>>>>                              to send the lock cancellation
>>>>>                              request to Node1 and returns
>>>>>                              DLM_NORMAL from the
>>>>>                              dlm_send_remote_unlock_request
>>>>>                              function.
>>>>>
>>>>>                                                    become recovery master.
>>>>>
>>>>>                 during the recovery
>>>>>                 process, send
>>>>>                 lock_2, which is
>>>>>                 converting from
>>>>>                 PR to EX, to Node4.
>>>>>
>>>>>                              during the recovery process,
>>>>>                              send lock_3, which is on the granted
>>>>>                              list and still carries the
>>>>>                              DLM_LKSB_GET_LVB flag, to Node4.
>>>>>                              then the downconvert thread clears
>>>>>                              the DLM_LKSB_GET_LVB flag in the
>>>>>                              dlmunlock_common function.
>>>>>
>>>>>                                                    Node4 finish recovery.
>>>>>                                                    the mode of lock_3 is
>>>>>                                                    PR, which blocks the
>>>>>                                                    conversion request of
>>>>>                                                    Node2, so send BAST
>>>>>                                                    to Node3.
>>>>>
>>>>>                              receive BAST from Node4.
>>>>>                              convert lock_3 from PR to NL.
>>>>>
>>>>>                                                    change the mode of lock_3
>>>>>                                                    from PR to NL and send
>>>>>                                                    message to Node3.
>>>>>
>>>>>                              receive message from
>>>>>                              Node4. the message contains
>>>>>                              the LKM_GET_LVB flag, but
>>>>>                              lock->lksb->flags does not
>>>>>                              contain DLM_LKSB_GET_LVB,
>>>>>                              so the BUG_ON in the
>>>>>                              dlm_proxy_ast_handler
>>>>>                              function fires.
>>>>>
>>>>> Signed-off-by: Jian Wang<wangjian161 at huawei.com>
>>>>> Reviewed-by: Yiwen Jiang<jiangyiwen at huawei.com>
>>>>> ---
>>>>>     fs/ocfs2/dlm/dlmunlock.c | 1 +
>>>>>     1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/fs/ocfs2/dlm/dlmunlock.c b/fs/ocfs2/dlm/dlmunlock.c
>>>>> index 63d701c..6e04fc7 100644
>>>>> --- a/fs/ocfs2/dlm/dlmunlock.c
>>>>> +++ b/fs/ocfs2/dlm/dlmunlock.c
>>>>> @@ -277,6 +277,7 @@ void dlm_commit_pending_cancel(struct dlm_lock_resource *res,
>>>>>     {
>>>>>     	list_move_tail(&lock->list, &res->granted);
>>>>>     	lock->ml.convert_type = LKM_IVMODE;
>>>>> +	lock->lksb->flags &= ~(DLM_LKSB_GET_LVB|DLM_LKSB_PUT_LVB);
>>>>>     }
>>>>>     
>>>>>     
>>>>> -- 
>>>>> 1.8.3.1
>>>>>
>>>>
>>


