[Ocfs2-devel] [RFC] Doubt about dlm_worker

Junxiao Bi junxiao.bi at oracle.com
Thu Sep 10 19:19:51 PDT 2015


On 09/10/2015 07:49 PM, Joseph Qi wrote:
> Hi Junxiao & Sunil,
> Your comments would be appreciated.
> 
> Thanks,
> Joseph
> 
> On 2015/9/6 21:11, Joseph Qi wrote:
>> The comment on dlm_dispatch_work reads:
>> /* Worker function used during recovery. */
>>
>> But actually dlm_worker is used by 4 types of dlm message workers:
>> 	dlm_assert_master_worker
>> 	dlm_deref_lockres_worker
>> 	dlm_request_all_locks_worker
>> 	dlm_mig_lockres_worker
>>
>> The first two are not related to dlm recovery. Moreover,
>> dlm_assert_master_worker sends DLM_ASSERT_MASTER_MSG to all other nodes,
>> and it may perform a lot of assert masters during recovery. In our
>> scenario, there are tens of thousands.
>> This delays recovery, because dlm_worker is a single-threaded
>> workqueue and the cluster hangs during dlm recovery.
>> So I wonder whether we can move assert master to a new workqueue, or just
>> use a system workqueue.
>> Any suggestions?
I took a look at the code and didn't see an obvious need for these four
workers to run in order, and they use locks for protection. So I think
it's OK to split them out. But it would be better to test this well, in
case the change exposes some hidden bug.
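
To make the delay concrete, here is a minimal userspace sketch in C. It is
not kernel code and is not taken from fs/ocfs2/dlm; it only models a
single-threaded workqueue as a FIFO array and counts how many items sit
ahead of one recovery item. The names work_kind, items_before_first_recovery
and recovery_backlog are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical work item kinds for this model (not kernel identifiers). */
enum work_kind { WORK_ASSERT_MASTER, WORK_RECOVERY };

/*
 * A single-threaded worker drains its queue strictly in FIFO order.
 * Return how many items it must process before reaching the first
 * recovery item, or len if the queue holds no recovery item.
 */
static size_t items_before_first_recovery(const enum work_kind *q, size_t len)
{
	size_t i;

	assert(q != NULL);
	for (i = 0; i < len; i++)
		if (q[i] == WORK_RECOVERY)
			return i;
	return len;
}

/*
 * Model: n_assert assert-master items are queued first, then one
 * recovery item. If split is nonzero, the assert-master items are
 * assumed to go to a separate queue, so the recovery queue holds only
 * the recovery item. Returns the backlog ahead of the recovery item,
 * or (size_t)-1 if memory allocation fails.
 */
static size_t recovery_backlog(size_t n_assert, int split)
{
	size_t len = split ? 1 : n_assert + 1;
	enum work_kind *q = malloc(len * sizeof(*q));
	size_t i, ahead;

	if (!q)
		return (size_t)-1;

	/* Everything except the last slot is an assert-master item. */
	for (i = 0; i + 1 < len; i++)
		q[i] = WORK_ASSERT_MASTER;
	q[len - 1] = WORK_RECOVERY;

	ahead = items_before_first_recovery(q, len);
	free(q);
	return ahead;
}
```

With 10000 assert-master items queued first, recovery_backlog(10000, 0)
returns 10000 for the shared queue and recovery_backlog(10000, 1) returns 0
for the split case. The model ignores locking and any ordering dependency
between workers, which is exactly what would need testing in the real code.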

Thanks,
Junxiao.