In the dlm_move_lockres_to_recovery_list function, if a lock
is on the granted queue and cancel_pending is set, we hit a
BUG_ON. I think this BUG_ON is meaningless, so remove it.
A scenario that triggers this BUG is described below.

At the beginning, Node 1 is the master and holds an NL lock,
Node 2 holds a PR lock, and Node 3 holds a PR lock too.

Node 1          Node 2          Node 3
            wants to get EX lock.

                            wants to get EX lock.

Node 3 blocks
Node 2 from
getting EX lock,
so send Node 3
a BAST.

                            receives BAST from
                            Node 1. The downconvert
                            thread begins to cancel
                            the PR to EX conversion.
                            In dlmunlock_common, the
                            downconvert thread has set
                            lock->cancel_pending but
                            has not yet entered
                            dlm_send_remote_unlock_request.

            Node 2 dies because
            the host is powered down.

In the recovery
process, clean up
the locks related
to Node 2, then
finish Node 3's
PR to EX request
and send Node 3
an AST.

                            receives AST from Node 1,
                            changes lock level to EX,
                            moves lock to granted list.

Node 1 dies because
the host is powered down.

                            In dlm_move_lockres_to_recovery_list,
                            the lock is on the granted queue
                            and cancel_pending is set, so
                            the BUG_ON fires.
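
The removed BUG_ON encodes the assumption that a lock with
cancel_pending can only be found on the converting list, and the race
above is exactly what breaks it. Below is a minimal user-space sketch
of that broken assumption; the DLM_*_LIST and cancel_pending names
only loosely mirror the dlm structures, and this is an illustration,
not the real recovery code.

#include <assert.h>

enum queue { DLM_GRANTED_LIST, DLM_CONVERTING_LIST, DLM_BLOCKED_LIST };

struct model_lock {
        enum queue queue;       /* list the lock currently sits on */
        int cancel_pending;     /* a cancel was started but not finished */
};

int main(void)
{
        /* Node 3: the downconvert thread starts cancelling PR to EX. */
        struct model_lock lock = { DLM_CONVERTING_LIST, 1 };

        /* Node 1's recovery finishes the conversion and sends an AST,
         * so Node 3 moves the lock to the granted list while
         * cancel_pending is still set. */
        lock.queue = DLM_GRANTED_LIST;

        /* Node 1 dies. Recovery walks the queues; with the old
         * BUG_ON(i != DLM_CONVERTING_LIST) this assertion fires. */
        if (lock.cancel_pending)
                assert(lock.queue == DLM_CONVERTING_LIST);

        return 0;
}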

But after removing this BUG_ON, the process will hit a second
BUG in the ocfs2_unlock_ast function. A scenario that causes
the second BUG in ocfs2_unlock_ast follows:

At the beginning, Node 1 is the master and holds an NL lock,
Node 2 holds a PR lock, and Node 3 holds a PR lock too.

Node 1          Node 2          Node 3
            wants to get EX lock.

                            wants to get EX lock.

Node 3 blocks
Node 2 from
getting EX lock,
so send Node 3
a BAST.

                            receives BAST from
                            Node 1. The downconvert
                            thread begins to cancel
                            the PR to EX conversion.
                            In dlmunlock_common, the
                            downconvert thread has released
                            lock->spinlock and res->spinlock
                            but has not yet entered
                            dlm_send_remote_unlock_request.

            Node 2 dies because
            the host is powered down.

In the recovery
process, clean up
the locks related
to Node 2, then
finish Node 3's
PR to EX request
and send Node 3
an AST.

                            receives AST from Node 1,
                            changes lock level to EX,
                            moves lock to granted list,
                            and sets lockres->l_unlock_action
                            to OCFS2_UNLOCK_INVALID
                            in ocfs2_locking_ast.

Node 1 dies because
the host is powered down.

                            Node 3 realizes that Node 1
                            is dead and removes Node 1
                            from the domain_map. The
                            downconvert thread gets
                            DLM_NORMAL from
                            dlm_send_remote_unlock_request
                            and sets *call_ast to 1.
                            Then the downconvert thread
                            hits the BUG in ocfs2_unlock_ast.
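
The second BUG is hit because, by the time the unlock AST runs, the
granted AST has already reset lockres->l_unlock_action to
OCFS2_UNLOCK_INVALID, a value the unlock AST does not know how to
finish. Below is a rough user-space sketch of that trap under the
scenario above; the OCFS2_UNLOCK_* names only loosely mirror
dlmglue.c and this is an illustration, not the real ocfs2_unlock_ast.

#include <assert.h>

enum unlock_action {
        OCFS2_UNLOCK_INVALID,
        OCFS2_UNLOCK_CANCEL_CONVERT,
        OCFS2_UNLOCK_DROP_LOCK,
};

/* The unlock AST only knows how to finish a cancel or a drop;
 * anything else is treated as fatal, like the kernel's BUG(). */
static void model_unlock_ast(enum unlock_action action)
{
        switch (action) {
        case OCFS2_UNLOCK_CANCEL_CONVERT:       /* cancel completed */
        case OCFS2_UNLOCK_DROP_LOCK:            /* unlock completed */
                break;
        default:
                assert(0);
        }
}

int main(void)
{
        /* Node 3 starts cancelling the PR to EX conversion ... */
        enum unlock_action action = OCFS2_UNLOCK_CANCEL_CONVERT;

        /* ... but the granted AST from Node 1 arrives first and
         * resets the pending unlock action. */
        action = OCFS2_UNLOCK_INVALID;

        /* dlm_send_remote_unlock_request then returns DLM_NORMAL, so
         * the unlock AST is still called and hits the fatal branch. */
        model_unlock_ast(action);
        return 0;
}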

To avoid hitting the second BUG, dlmunlock_common should return
DLM_CANCELGRANT if the lock is on the granted list and the
operation is canceled.

Signed-off-by: Jian Wang <wangjian161@huawei.com>
Reviewed-by: Yiwen Jiang <jiangyiwen@huawei.com>
---
 fs/ocfs2/dlm/dlmrecovery.c | 1 -
 fs/ocfs2/dlm/dlmunlock.c   | 5 +++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
index 802636d..7489652 100644
--- a/fs/ocfs2/dlm/dlmrecovery.c
+++ b/fs/ocfs2/dlm/dlmrecovery.c
@@ -2134,7 +2134,6 @@ void dlm_move_lockres_to_recovery_list(struct dlm_ctxt *dlm,
                                  * if this had completed successfully
                                  * before sending this lock state to the
                                  * new master */
-                                BUG_ON(i != DLM_CONVERTING_LIST);
                                 mlog(0, "node died with cancel pending "
                                      "on %.*s. move back to granted list.\n",
                                      res->lockname.len, res->lockname.name);
diff --git a/fs/ocfs2/dlm/dlmunlock.c b/fs/ocfs2/dlm/dlmunlock.c
index 63d701c..505bb6c 100644
--- a/fs/ocfs2/dlm/dlmunlock.c
+++ b/fs/ocfs2/dlm/dlmunlock.c
@@ -183,6 +183,11 @@ static enum dlm_status dlmunlock_common(struct dlm_ctxt *dlm,
                                                         flags, owner);
                 spin_lock(&res->spinlock);
                 spin_lock(&lock->spinlock);
+
+                if ((flags & LKM_CANCEL) &&
+                                dlm_lock_on_list(&res->granted, lock))
+                        status = DLM_CANCELGRANT;
+
                 /* if the master told us the lock was already granted,
                  * let the ast handle all of these actions */
                 if (status == DLM_CANCELGRANT) {
-- 
1.8.3.1