[Ocfs2-devel] [PATCH] add assertions to stuffs with purge/unused lockres

Sunil Mushran sunil.mushran at oracle.com
Wed Sep 2 22:09:42 PDT 2009


It's too intrusive for a debug patch. And big too. Maybe we need this  
eventually. But for right now, it will be easier if we can narrow down  
the scope of the issue as quickly as possible.

On Sep 2, 2009, at 8:21 PM, Wengang Wang <wen.gang.wang at oracle.com>  
wrote:

> Hi Sunil,
>
> Sunil Mushran wrote:
>> How about we narrow down the issue by dumping the lockres?
>> Look in 1.4. We dump the lockres in dlm_purge_lockres().
>> __dlm_print_one_lock_resource(res);
>> In this case, it appears the user has encountered it more than
>> once. Work with Srini to give them a package with the above.
>> The idea here is to figure out as to "how" the lockres is in use.
>> Is t because of a refmap. Or a new lock. etc.
>
> Yes, by dumping the lockres, we can know how the lockres is in use.
> however, we don't know how it becomes in use.
> I think with my patch, we can know how.
>
> so, what are the bad points with my patch?
>
> regards,
> wengang.
>
>> Wengang Wang wrote:
>>> this is more a discusion than a patch.
>>>
>>> I had an ocfs2 bug(8801298). the problem is that with dlm- 
>>> >spinlock hold,
>>> dlm_run_purge_list() think the status of "unuse" of a lockres  
>>> shouldn't change
>>> from unused to in-use. it checks the status twice(by calling  
>>> __dlm_lockres_unused())
>>> and found the status changed from unused to in-use then calls a BUG 
>>> () to panic.
>>> the only avaible info is just the BUG() info. however there are  
>>> serveral possibility
>>> casing the status change. so I stuck there -- I am not able to go  
>>> any further..
>>>
>>> If we can detect the problem in each possibility, it will be  
>>> better. so I wrote
>>> the patch. the patch does:
>>> 1) removes the lockres from purge list(if it's in the list) in  
>>> __dlm_lookup_lockres_full().
>>>   --if found in __dlm_lookup_lockres_full(), the lockres is going  
>>> to be in-use
>>>     very soon, so remove it from purge list.
>>>
>>> 2) encapsulates operations that adds/removes/moves dlm_lock to/ 
>>> from granted/
>>>   converting/blocked lists of a lockres into functions. in the  
>>> functions, there
>>>   are assertions that check mainly if the lockres is not in purge  
>>> list.
>>>   --it can detect the 8801298 BUG ealier.
>>>
>>> 3) encapsulates operations that clear/set refmap_bit into  
>>> functions which does
>>>   same assertion as in 2).
>>>   --it can detect the 8801298 BUG ealier.
>>>
>>> 4) encapsulates operations that adds/removes/re-adds lockres to/ 
>>> from dlm->purge_list
>>>   into functions that does assertions as in 2)
>>>   --it can detect the 8801298 BUG ealier.
>>>
>>> 5) encapsulates operations that adds/removes lockres to/from dlm- 
>>> >dirty_list.
>>>   into functions that does assertions as in 2)
>>>   --it can detect the 8801298 BUG ealier.
>>>
>>> 6) what I think they could be bugs
>>>   6.1) adds spinlock protection on the operation that remove  
>>> lockres from purgelist.
>>>   6.2) moves the two operation
>>>               a) removes lockres from dirty list;
>>>               b) remove DLM_LOCK_RES_DIRTY flag from the lockres
>>>        into one atomic operation(in protection of res->spinlock).
>>>        --I think both checking list_emtry(&res->dirty) and  
>>> DLM_LOCK_RES_DIRTY
>>>          is ugly. if the above is reasonable, maybe we can remove  
>>> the DLM_LOCK_RES_DIRTY
>>>          flag later..
>>>
>>> for 2), 4) and 5), I don't know if it's a good idea --developers  
>>> maybe still
>>> using the original list_add_tail()/list_del_init ..
>>> for 6), maybe I should make separate patches..
>>>
>>> Signed-off-by: Wengang Wang <wen.gang.wang at oracle.com>



More information about the Ocfs2-devel mailing list