<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Junxiao, thanks for looking into this issue. Please see my comments
below.<br>
<br>
On 02/24/2014 01:07 AM, Junxiao Bi wrote:<br>
<blockquote cite="mid:530B0BB5.90600@oracle.com" type="cite">
<div class="moz-cite-prefix">Hi,<br>
<br>
On 07/19/2012 09:59 AM, Sunil Mushran wrote:<br>
</div>
<blockquote
cite="mid:CAEeiSHV+TVsnwqnsi0u4r=ucBoddo8wD8DcqbsCn1UoA3xjtdg@mail.gmail.com"
type="cite">Different issues.<br>
<br>
<div class="gmail_quote">On Wed, Jul 18, 2012 at 6:34 PM,
Junxiao Bi <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:junxiao.bi@oracle.com" target="_blank">junxiao.bi@oracle.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>On 07/19/2012 12:36 AM, Sunil Mushran wrote:<br>
</div>
<blockquote type="cite">
<div>This bug was detected during code audit. Never seen
a crash. If it does hit,</div>
<div>then we have bigger problems. So no point posting
to stable.</div>
</blockquote>
</div>
</blockquote>
</div>
</blockquote>
I have read a lot of the dlm recovery code recently, and I found that
this bug can happen in the following scenario.<br>
<br>
node 1: migrate target
node x:<br>
dlm_unregister_domain()<br>
dlm_migrate_all_locks()<br>
dlm_empty_lockres()<br>
select node x as migrate target node<br>
since there is a node x lock on the granted list.<br>
dlm_migrate_lockres()<br>
dlm_mark_lockres_migrating() {<br>
wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));<br>
<<< node x unlock may happen here, res->granted list
can be empty.<br>
</blockquote>
If the unlock request was sent at this point and was *processed*, the
lock must already have been removed from the granted list. If
the request was *not yet processed*, then the DLM_LOCK_RES_MIGRATING
flag set in dlm_lockres_release_ast would make the dlm_unlock handler
return DLM_MIGRATING to the caller (in this case node x). So I don't
see how the granted list could hold a stale lock. Am I missing something?<br>
<br>
I do think the race you point out below exists, but I am
not sure it is caused by the scenario described above.<br>
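<br>
To make the two cases above concrete, here is a minimal user-space sketch, not the actual ocfs2 code. The names model_lockres, model_unlock_handler, MODEL_NORMAL and MODEL_MIGRATING are hypothetical stand-ins. The sketch only assumes that the unlock handler checks the migrating flag before it touches the granted list.<br>
<br>
```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the real ocfs2 DLM types. */
enum model_status { MODEL_NORMAL, MODEL_MIGRATING };

struct model_lockres {
    bool migrating;     /* models DLM_LOCK_RES_MIGRATING being set */
    int granted_count;  /* models the number of locks on the granted list */
};

/* Models the master-side unlock handler: refuse once migration has started. */
static enum model_status model_unlock_handler(struct model_lockres *res)
{
    if (res->migrating)
        return MODEL_MIGRATING;  /* request rejected, list left untouched */
    if (res->granted_count > 0)
        res->granted_count--;    /* unlock processed: lock leaves the list */
    return MODEL_NORMAL;
}
```
<br>
In this model, an unlock handled before the flag is set removes the lock and returns MODEL_NORMAL, while one handled after the flag is set returns MODEL_MIGRATING and leaves the list as it was.<br>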
<br>
<blockquote cite="mid:530B0BB5.90600@oracle.com" type="cite">
dlm_lockres_release_ast(dlm, res);<br>
} <br>
dlm_send_one_lockres()<br>
dlm_process_recovery_data() {<br>
tmpq is
res->granted list and is empty.<br>
list_for_each_entry(lock, tmpq, list) {<br>
if
(lock->ml.cookie != ml->cookie)<br>
lock = NULL;<br>
else <br>
break;<br>
} <br>
lock will be
invalid here.<br>
if
(lock->ml.node != ml->node)<br>
BUG()
--> crash here.<br>
}<br>
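<br>
For illustration, a minimal user-space sketch of why the cursor cannot be trusted after the loop: list_for_each_entry() never sets its cursor to NULL on its own. After walking an empty list, or a list with no match, the cursor is computed from the list head rather than from a real entry. The sketch below iterates over the list nodes instead, in the style of list_for_each(), and returns NULL explicitly when nothing matches. All names (list_node, model_lock, model_find_lock and so on) are hypothetical stand-ins, not the kernel's definitions.<br>
<br>
```c
#include <stddef.h>

/* Minimal user-space stand-ins for the kernel's list helpers (assumed shapes). */
struct list_node { struct list_node *next, *prev; };

#define model_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct model_lock {
    unsigned long long cookie;
    struct list_node list;
};

/* An empty circular list has a head that points at itself. */
static void model_list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

static void model_list_add_tail(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Returns the lock with the given cookie, or NULL when none is queued. */
static struct model_lock *model_find_lock(struct list_node *head,
                                          unsigned long long cookie)
{
    struct list_node *pos;

    /* The head itself is never converted to an entry. */
    for (pos = head->next; pos != head; pos = pos->next) {
        struct model_lock *lock =
            model_container_of(pos, struct model_lock, list);
        if (lock->cookie == cookie)
            return lock;
    }
    return NULL;
}
```
<br>
Because the result is either a real entry or NULL, the caller can test it directly before dereferencing it, which covers the empty-list case as well.<br>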
<br>
Thanks,<br>
Junxiao.<br>
<blockquote
cite="mid:CAEeiSHV+TVsnwqnsi0u4r=ucBoddo8wD8DcqbsCn1UoA3xjtdg@mail.gmail.com"
type="cite">
<div class="gmail_quote">
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
Our customer can reproduce it. Also I saw you were
assigned a similar bug before, see <a
moz-do-not-send="true"
href="https://oss.oracle.com/bugzilla/show_bug.cgi?id=1220"
target="_blank">https://oss.oracle.com/bugzilla/show_bug.cgi?id=1220</a>,
is it the same BUG?<br>
<blockquote type="cite"><br>
<div class="gmail_quote">On Tue, Jul 17, 2012 at 6:36
PM, Junxiao Bi <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:junxiao.bi@oracle.com"
target="_blank">junxiao.bi@oracle.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000">
<div>Hi Sunil,<br>
<br>
On 07/18/2012 03:49 AM, Sunil Mushran wrote:<br>
</div>
<blockquote type="cite">
<div>
<div class="gmail_quote">On Tue, Jul 17, 2012
at 12:10 AM, Junxiao Bi <span dir="ltr"><<a
moz-do-not-send="true"
href="mailto:junxiao.bi@oracle.com"
target="_blank">junxiao.bi@oracle.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote"
style="margin:0 0 0 .8ex;border-left:1px
#ccc solid;padding-left:1ex"> In the
target node of the dlm lock migration, the
logic to find<br>
the local dlm lock is wrong: it shouldn't
change the loop variable<br>
"lock" in the list_for_each_entry loop.
Doing so causes a NULL pointer<br>
dereference crash.<br>
<br>
Signed-off-by: Junxiao Bi <<a
moz-do-not-send="true"
href="mailto:junxiao.bi@oracle.com"
target="_blank">junxiao.bi@oracle.com</a>><br>
Cc: <a moz-do-not-send="true"
href="mailto:stable@vger.kernel.org"
target="_blank">stable@vger.kernel.org</a><br>
---<br>
fs/ocfs2/dlm/dlmrecovery.c | 12
+++++++-----<br>
1 file changed, 7 insertions(+), 5
deletions(-)<br>
<br>
diff --git a/fs/ocfs2/dlm/dlmrecovery.c
b/fs/ocfs2/dlm/dlmrecovery.c<br>
index 01ebfd0..0b9cc88 100644<br>
--- a/fs/ocfs2/dlm/dlmrecovery.c<br>
+++ b/fs/ocfs2/dlm/dlmrecovery.c<br>
@@ -1762,6 +1762,7 @@ static int
dlm_process_recovery_data(struct dlm_ctxt
*dlm,<br>
u8 from = O2NM_MAX_NODES;<br>
unsigned int added = 0;<br>
__be64 c;<br>
+ int found;<br>
<br>
mlog(0, "running %d locks for this
lockres\n", mres->num_locks);<br>
for (i=0; i<mres->num_locks;
i++) {<br>
@@ -1793,22 +1794,23 @@ static int
dlm_process_recovery_data(struct dlm_ctxt
*dlm,<br>
/* MIGRATION ONLY!
*/<br>
BUG_ON(!(mres->flags &
DLM_MRES_MIGRATION));<br>
<br>
+ found = 0;<br>
spin_lock(&res->spinlock);<br>
for (j =
DLM_GRANTED_LIST; j <=
DLM_BLOCKED_LIST; j++) {<br>
tmpq =
dlm_list_idx_to_ptr(res, j);<br>
list_for_each_entry(lock, tmpq, list) {<br>
- if
(lock->ml.cookie != ml->cookie)<br>
-
lock = NULL;<br>
-
else<br>
+ if
(lock->ml.cookie == ml->cookie) {<br>
+
found = 1;<br>
break;<br>
+ }<br>
}<br>
- if (lock)<br>
+ if (found)<br>
break;<br>
}<br>
<br>
/* lock is always
created locally first, and<br>
* destroyed
locally last. it must be on the list */<br>
- if (!lock) {<br>
+ if (!found) {<br>
c =
ml->cookie;<br>
mlog(ML_ERROR, "Could not find local lock
"<br>
"with cookie %u:%llu, node %u, "<br>
</blockquote>
<div><br>
</div>
<div><br>
</div>
<a moz-do-not-send="true"
href="https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=blobdiff;f=fs/ocfs2/dlm/dlmrecovery.c;h=c881be6043a8c27c26ee44d217fb8ecf1eb37e02;hp=01ebfd0bdad72264b99345378f0c6febe246503d;hb=13279667cc8bbaf901591dee96f762d4aab8b307;hpb=a5ae0116eb56ec7c128e84fe15646a5cb9a8cb47"
target="_blank">https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=blobdiff;f=fs/ocfs2/dlm/dlmrecovery.c;h=c881be6043a8c27c26ee44d217fb8ecf1eb37e02;hp=01ebfd0bdad72264b99345378f0c6febe246503d;hb=13279667cc8bbaf901591dee96f762d4aab8b307;hpb=a5ae0116eb56ec7c128e84fe15646a5cb9a8cb47</a>
<div> <br>
</div>
<div>We had decided to go back to
list_for_each().</div>
</div>
</div>
</blockquote>
<br>
OK, thank you. It's fine to revert it to fix a
newly introduced bug, but I think you should Cc the stable
branch.<br>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
<br>
</div>
</blockquote>
</div>
<br>
</blockquote>
<br>
<br>
</blockquote>
<br>
</body>
</html>