[Ocfs2-devel] ocfs2: Question for ocfs2_recovery_thread

Joseph Qi joseph.qi at huawei.com
Sun May 19 19:49:11 PDT 2013


On 2013/5/19 10:25, Joseph Qi wrote:
> On 2013/5/18 21:26, Sunil Mushran wrote:
>> The first node that gets the lock will do the actual recovery. The others will get the lock and see a clean journal and skip the recovery. A thread should never error out if it fails to get the lock. It should try and try again.
>>
>> On May 17, 2013, at 11:27 PM, Joseph Qi <joseph.qi at huawei.com> wrote:
>>
>>> Hi,
>>> Once a node goes down in the cluster, ocfs2_recovery_thread is
>>> triggered on each surviving node. These threads then recover the dead
>>> node by taking the super lock.
>>> I have several questions on this:
>>> 1) Why does each node have to run such a thread? We know that in the
>>> end only one node gets the super lock and does the actual recovery.
>>> 2) If an error occurs while this thread is running, say taking
>>> ocfs2_super_lock fails, the thread will exit without clearing the
>>> recovery map. Will that leave other threads stuck waiting for
>>> recovery in ocfs2_wait_for_recovery?
>>>
>>
>>
> But when an error occurs and the thread goes to bail, the restart logic
> will not run. The code looks like this:
> ...
> 	status = ocfs2_wait_on_mount(osb);
> 	if (status < 0) {
> 		goto bail;
> 	}
> 
> 	rm_quota = kzalloc(osb->max_slots * sizeof(int), GFP_NOFS);
> 	if (!rm_quota) {
> 		status = -ENOMEM;
> 		goto bail;
> 	}
> restart:
> 	status = ocfs2_super_lock(osb, 1);
> 	if (status < 0) {
> 		mlog_errno(status);
> 		goto bail;
> 	}
> ...
> 	if (!status && !ocfs2_recovery_completed(osb)) {
> 		mutex_unlock(&osb->recovery_lock);
> 		goto restart;
> 	}
> 
> 
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 
> 
One more question: do we make sure dlm_recovery_thread always runs prior
to ocfs2_recovery_thread?



