<div dir="ltr">True. The function could do with a little bit of cleanup. Feel free to send a patch.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, May 19, 2013 at 7:49 PM, Joseph Qi <span dir="ltr">&lt;<a href="mailto:joseph.qi@huawei.com" target="_blank">joseph.qi@huawei.com</a>&gt;</span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 2013/5/19 10:25, Joseph Qi wrote:<br>

&gt; On 2013/5/18 21:26, Sunil Mushran wrote:<br>

&gt;&gt; The first node that gets the lock will do the actual recovery. The others will get the lock and see a clean journal and skip the recovery. A thread should never error out if it fails to get the lock. It should try and try again.<br>


&gt;&gt;<br>

&gt;&gt; On May 17, 2013, at 11:27 PM, Joseph Qi &lt;<a href="mailto:joseph.qi@huawei.com">joseph.qi@huawei.com</a>&gt; wrote:<br>

&gt;&gt;<br>

&gt;&gt;&gt; Hi,<br>

&gt;&gt;&gt; Once a node goes down in the cluster, ocfs2_recovery_thread is<br>

&gt;&gt;&gt; triggered on each node. These threads then recover the down node by<br>

&gt;&gt;&gt; taking the super lock.<br>

&gt;&gt;&gt; I have several questions on this:<br>

&gt;&gt;&gt; 1) Why does each node have to run such a thread? We know that, in the end, one node can<br>

&gt;&gt;&gt; get the super lock and do the actual recovery.<br>

&gt;&gt;&gt; 2) If this thread is running and an error occurs, for example<br>

&gt;&gt;&gt; ocfs2_super_lock fails, the thread exits without<br>

&gt;&gt;&gt; clearing the recovery map. Will that leave other threads waiting for<br>

&gt;&gt;&gt; recovery in ocfs2_wait_for_recovery?<br>

&gt;&gt;&gt;<br>

&gt;&gt;<br>

&gt;&gt;<br>

&gt; But when an error occurs, the code goes to bail and the restart logic does not<br>

&gt; run. The code looks like this:<br>

&gt; ...<br>

&gt;       status = ocfs2_wait_on_mount(osb);<br>

&gt;       if (status &lt; 0) {<br>

&gt;               goto bail;<br>

&gt;       }<br>

&gt;<br>

&gt;       rm_quota = kzalloc(osb-&gt;max_slots * sizeof(int), GFP_NOFS);<br>

&gt;       if (!rm_quota) {<br>

&gt;               status = -ENOMEM;<br>

&gt;               goto bail;<br>

&gt;       }<br>

&gt; restart:<br>

&gt;       status = ocfs2_super_lock(osb, 1);<br>

&gt;       if (status &lt; 0) {<br>

&gt;               mlog_errno(status);<br>

&gt;               goto bail;<br>

&gt;       }<br>

&gt; ...<br>

&gt;       if (!status &amp;&amp; !ocfs2_recovery_completed(osb)) {<br>

&gt;               mutex_unlock(&amp;osb-&gt;recovery_lock);<br>

&gt;               goto restart;<br>

&gt;       }<br>

&gt;<br>

&gt;<br>

</div></div>&gt; _______________________________________________<br>

&gt; Ocfs2-devel mailing list<br>

&gt; <a href="mailto:Ocfs2-devel@oss.oracle.com">Ocfs2-devel@oss.oracle.com</a><br>

&gt; <a href="https://oss.oracle.com/mailman/listinfo/ocfs2-devel" target="_blank">https://oss.oracle.com/mailman/listinfo/ocfs2-devel</a><br>

&gt;<br>

&gt;<br>

One more question: do we make sure dlm_recovery_thread always runs before<br>

ocfs2_recovery_thread?<br>

<br>

</blockquote></div><br></div>
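To make the behaviour discussed in this thread concrete, here is a minimal user-space sketch of "retry on lock failure, then recover only if the journal is still dirty". It is not the kernel code and not a proposed patch: `sim_cluster`, `sim_node`, `sim_super_lock`, `sim_super_unlock` and `sim_recover_dead_node` are hypothetical stand-ins invented for illustration, and the lock failure is simulated with a simple counter. The unbounded retry loop is also an assumption; real code would presumably need a backoff and a way to stop on unmount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared cluster state; NOT the real struct ocfs2_super. */
struct sim_cluster {
    int  lock_failures_left; /* how many more lock attempts will fail */
    bool journal_dirty;      /* dead node's journal still needs replay */
    int  replays;            /* how many times the journal was replayed */
};

/* Hypothetical per-node state: is the dead node still in our recovery map? */
struct sim_node {
    bool dead_node_in_map;
};

/* Simulated super lock: fails while lock_failures_left > 0. */
static int sim_super_lock(struct sim_cluster *c)
{
    if (c->lock_failures_left > 0) {
        c->lock_failures_left--;
        return -1;
    }
    return 0;
}

/* Simulated unlock: nothing to do in this single-threaded sketch. */
static void sim_super_unlock(struct sim_cluster *c)
{
    (void)c;
}

/*
 * One node's recovery pass for the dead node.
 * Returns the number of lock attempts that were needed.
 */
static int sim_recover_dead_node(struct sim_cluster *c, struct sim_node *n)
{
    int attempts = 0;

    assert(c != NULL && n != NULL);

    /* Retry instead of bailing out when the lock cannot be taken. */
    for (;;) {
        attempts++;
        if (sim_super_lock(c) == 0)
            break;
        /* Real code would log the error and back off here. */
    }

    /* Only the first node to get the lock finds a dirty journal. */
    if (c->journal_dirty) {
        c->journal_dirty = false;
        c->replays++;
    }

    /* Always clear the map entry so local waiters are not left blocked. */
    n->dead_node_in_map = false;

    sim_super_unlock(c);
    return attempts;
}
```

Calling `sim_recover_dead_node` for two nodes against the same `sim_cluster` shows the pattern: the first caller retries until the lock succeeds and replays the journal, the second gets the lock, sees a clean journal and skips the replay, and both clear their own recovery map entry.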