<div dir="ltr">True. The function could do with a little bit of cleanup. Feel free to send a patch.<br></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, May 19, 2013 at 7:49 PM, Joseph Qi <span dir="ltr"><<a href="mailto:joseph.qi@huawei.com" target="_blank">joseph.qi@huawei.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5">On 2013/5/19 10:25, Joseph Qi wrote:<br>
> On 2013/5/18 21:26, Sunil Mushran wrote:<br>
>> The first node that gets the lock will do the actual recovery. The others will get the lock and see a clean journal and skip the recovery. A thread should never error out if it fails to get the lock. It should try and try again.<br>
>><br>
>> On May 17, 2013, at 11:27 PM, Joseph Qi <<a href="mailto:joseph.qi@huawei.com">joseph.qi@huawei.com</a>> wrote:<br>
>><br>
>>> Hi,<br>
>>> Once a node goes down in the cluster, ocfs2_recovery_thread is<br>
>>> triggered on each node. These threads then recover the down node by<br>
>>> taking the super lock.<br>
>>> I have several questions on this:<br>
>>> 1) Why does each node have to run such a thread? We know that in the end one node will<br>
>>> get the super lock and do the actual recovery.<br>
>>> 2) If this thread is running and an error occurs, for example taking<br>
>>> ocfs2_super_lock fails, the thread will exit without<br>
>>> clearing the recovery map. Will that leave other threads still waiting for<br>
>>> recovery in ocfs2_wait_for_recovery?<br>
>>><br>
>><br>
>><br>
> But when an error occurs, the code goes to bail and the restart logic does not<br>
> run. The code looks like this:<br>
> ...<br>
> status = ocfs2_wait_on_mount(osb);<br>
> if (status < 0) {<br>
> goto bail;<br>
> }<br>
><br>
> rm_quota = kzalloc(osb->max_slots * sizeof(int), GFP_NOFS);<br>
> if (!rm_quota) {<br>
> status = -ENOMEM;<br>
> goto bail;<br>
> }<br>
> restart:<br>
> status = ocfs2_super_lock(osb, 1);<br>
> if (status < 0) {<br>
> mlog_errno(status);<br>
> goto bail;<br>
> }<br>
> ...<br>
> if (!status && !ocfs2_recovery_completed(osb)) {<br>
> mutex_unlock(&osb->recovery_lock);<br>
> goto restart;<br>
> }<br>
><br>
><br>
</div></div>
One more question: do we make sure dlm_recovery_thread always runs prior to<br>
ocfs2_recovery_thread?<br>
<br>
</blockquote></div><br></div>
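<div dir="ltr">For readers following the thread, the behaviour described above (every node runs the recovery thread, the first one to take the super lock replays the journal, the rest find it clean and skip, and a failed lock attempt is retried rather than treated as fatal) can be sketched roughly as below. This is only a userspace simulation under stated assumptions: every <code>sim_</code> name is hypothetical and none of it is the real OCFS2 code or API.<br></div>

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for shared cluster state; not an OCFS2 structure. */
struct sim_cluster {
    bool journal_dirty;      /* dead node's journal still needs replay */
    int  lock_failures_left; /* lock attempts that fail before one succeeds */
    int  replays;            /* how many times the journal was replayed */
};

/* Simulated super lock: fails transiently a fixed number of times. */
static int sim_super_lock(struct sim_cluster *c)
{
    if (c->lock_failures_left > 0) {
        c->lock_failures_left--;
        return -EAGAIN;
    }
    return 0;
}

/* Simulated unlock: nothing to release in this sketch. */
static void sim_super_unlock(struct sim_cluster *c)
{
    (void)c;
}

/*
 * One node's recovery pass. A failed lock attempt is retried instead of
 * bailing out, so the caller is never left waiting on a recovery that
 * silently gave up. Returns the number of lock attempts that were needed.
 */
static int sim_recover_dead_node(struct sim_cluster *c)
{
    int attempts = 0;

    assert(c != NULL);

    /* Try and try again until the lock is granted. */
    do {
        attempts++;
    } while (sim_super_lock(c) < 0);

    /* Only the first lock holder finds a dirty journal and replays it. */
    if (c->journal_dirty) {
        c->journal_dirty = false;
        c->replays++;
    }
    /* Later lock holders see a clean journal and skip the replay. */

    sim_super_unlock(c);
    return attempts;
}
```

<div dir="ltr">Calling <code>sim_recover_dead_node()</code> once per node against the same <code>struct sim_cluster</code> models the cluster: the first call replays the journal and later calls are no-ops. A real implementation would presumably need a back-off and a way to stop retrying on unmount, which this sketch leaves out.<br></div>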