On 05/30/2012 06:09 AM, Sunil Mushran wrote:
> On Thu, May 24, 2012 at 10:53 PM, <xiaowei.hu@oracle.com> wrote:
>>
>> diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c
>> index 01ebfd0..62659e8 100644
>> --- a/fs/ocfs2/dlm/dlmrecovery.c
>> +++ b/fs/ocfs2/dlm/dlmrecovery.c
>> @@ -555,6 +555,7 @@ static int dlm_remaster_locks(struct dlm_ctxt *dlm, u8 dead_node)
>>  	int all_nodes_done;
>>  	int destroy = 0;
>>  	int pass = 0;
>> +	int dying = 0;
>>
>>  	do {
>>  		/* we have become recovery master. there is no escaping
>> @@ -659,6 +660,7 @@ static int dlm_remaster_locks(struct dlm_ctxt *dlm, u8 dead_node)
>>  		list_for_each_entry(ndata, &dlm->reco.node_data, list) {
>>  			mlog(0, "checking recovery state of node %u\n",
>>  			     ndata->node_num);
>> +			dying = 0;
>>  			switch (ndata->state) {
>>  			case DLM_RECO_NODE_DATA_INIT:
>>  			case DLM_RECO_NODE_DATA_REQUESTING:
>> @@ -679,6 +681,13 @@ static int dlm_remaster_locks(struct dlm_ctxt *dlm, u8 dead_node)
>>  				     dlm->name, ndata->node_num,
>>  				     ndata->state==DLM_RECO_NODE_DATA_RECEIVING ?
>>  				     "receiving" : "requested");
>> +				spin_lock(&dlm->spinlock);
>> +				dying = !test_bit(ndata->node_num, dlm->live_nodes_map);
>> +				spin_unlock(&dlm->spinlock);
>> +				if (dying) {
>> +					ndata->state = DLM_RECO_NODE_DATA_DEAD;
>> +					break;
>> +				}
>
> I would suggest exploring adding this in the dlm hb down event. Checking the
> live map all over the place is hacky. We do it more than we should right now.
> Let's not add to the mess.

Hi Sunil,

Do you mean we should clear the bit in the domain map directly in the dlm hb
down event when the node goes down, and then check with dlm_is_node_dead()
here?
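
If so, the check in the hunk above would collapse to something like this
(an untested sketch; I am assuming dlm_is_node_dead() in dlmrecovery.c,
which tests the node's bit under dlm->spinlock, is the helper you mean):

	/* untested sketch: reuse the existing helper rather than
	 * open-coding the map test as in the patch above */
	if (dlm_is_node_dead(dlm, ndata->node_num)) {
		ndata->state = DLM_RECO_NODE_DATA_DEAD;
		break;
	}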

Or how else could we ensure the node stays alive during the whole migrate
process? A node could die even after it sends out one locks packet and before
the next, if there are too many locks on that lockres.
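
If you instead mean moving the state change into the hb down path itself, I
picture something like the sketch below (dlm_mark_reco_node_dead() is a
made-up name, not an existing function; dlm_reco_state_lock is the spinlock
that already guards dlm->reco.node_data in dlmrecovery.c):

	/* hypothetical helper, called from the dlm hb node-down path:
	 * mark any in-flight recovery handshake with @idx as dead so
	 * dlm_remaster_locks() stops waiting on that node */
	static void dlm_mark_reco_node_dead(struct dlm_ctxt *dlm, u8 idx)
	{
		struct dlm_reco_node_data *ndata;

		spin_lock(&dlm_reco_state_lock);
		list_for_each_entry(ndata, &dlm->reco.node_data, list) {
			if (ndata->node_num == idx)
				ndata->state = DLM_RECO_NODE_DATA_DEAD;
		}
		spin_unlock(&dlm_reco_state_lock);
	}

But even that would still need dlm_remaster_locks() to notice the state
change between two lock packets, wouldn't it?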

Thanks,
Xiaowei

>>  				all_nodes_done = 0;
>>  				break;
>>  			case DLM_RECO_NODE_DATA_DONE:
>> --
>> 1.7.7.6