[Ocfs2-devel] [PATCH] add assertions to stuffs with purge/unused lockres
Wengang Wang
wen.gang.wang at oracle.com
Wed Sep 2 20:21:11 PDT 2009
Hi Sunil,
Sunil Mushran wrote:
> How about we narrow down the issue by dumping the lockres?
>
> Look in 1.4. We dump the lockres in dlm_purge_lockres().
> __dlm_print_one_lock_resource(res);
>
> In this case, it appears the user has encountered it more than
> once. Work with Srini to give them a package with the above.
>
> The idea here is to figure out as to "how" the lockres is in use.
> Is it because of a refmap? Or a new lock? etc.
Yes, by dumping the lockres we can see how the lockres is in use.
However, that does not tell us how it became in use.
I think my patch can tell us that.
So, what are the bad points of my patch?
regards,
wengang.
> Wengang Wang wrote:
>> This is more a discussion than a patch.
>>
>> I had an ocfs2 bug (8801298). The problem is that, with dlm->spinlock held,
>> dlm_run_purge_list() assumes the "unused" status of a lockres cannot change
>> from unused to in-use. It checks the status twice (by calling
>> __dlm_lockres_unused()), finds that the status changed from unused to
>> in-use, and then calls BUG() to panic.
>> The only available info is the BUG() output. However, there are several
>> possible causes of the status change, so I am stuck there -- I am not able
>> to go any further.
>>
>> If we can detect the problem in each possible path, it will be better,
>> so I wrote the patch. The patch does the following:
>> 1) removes the lockres from the purge list (if it is on the list) in
>> __dlm_lookup_lockres_full().
>> --if found in __dlm_lookup_lockres_full(), the lockres is going to be
>> in-use very soon, so remove it from the purge list.
>>
>> 2) encapsulates the operations that add/remove/move a dlm_lock to/from the
>> granted/converting/blocked lists of a lockres into functions. In those
>> functions there are assertions that mainly check that the lockres is not
>> on the purge list.
>> --it can detect the 8801298 BUG earlier.
>>
>> 3) encapsulates the operations that clear/set refmap_bit into functions
>> that do the same assertions as in 2).
>> --it can detect the 8801298 BUG earlier.
>>
>> 4) encapsulates the operations that add/remove/re-add a lockres to/from
>> dlm->purge_list into functions that do assertions as in 2).
>> --it can detect the 8801298 BUG earlier.
>>
>> 5) encapsulates the operations that add/remove a lockres to/from
>> dlm->dirty_list into functions that do assertions as in 2).
>> --it can detect the 8801298 BUG earlier.
>>
>> 6) what I think could be bugs
>> 6.1) adds spinlock protection to the operation that removes a lockres
>> from the purge list.
>> 6.2) combines the two operations
>> a) remove the lockres from the dirty list;
>> b) clear the DLM_LOCK_RES_DIRTY flag on the lockres
>> into one atomic operation (under the protection of res->spinlock).
>> --I think checking both list_empty(&res->dirty) and DLM_LOCK_RES_DIRTY
>> is ugly. If the above is reasonable, maybe we can remove the
>> DLM_LOCK_RES_DIRTY flag later.
>>
>> For 2), 4) and 5), I don't know if it's a good idea -- developers may
>> still use the original list_add_tail()/list_del_init() directly.
>> For 6), maybe I should make separate patches.
>>
>> Signed-off-by: Wengang Wang <wen.gang.wang at oracle.com>
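
The wrapper-with-assertion idea in items 1), 2) and 4) of the quoted proposal can be sketched in userspace C. This is only a minimal model under stated assumptions, not the patch itself: the list helpers are simplified stand-ins for the kernel's <linux/list.h>, locking is omitted, and struct model_lockres, struct model_lock and every model_* function are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

static void list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    INIT_LIST_HEAD(n);
}

/* Hypothetical, heavily reduced model of a lock resource. */
struct model_lockres {
    struct list_head granted; /* locks granted on this resource */
    struct list_head purge;   /* purge-list linkage; empty when not queued */
};

/* Hypothetical model of a single lock. */
struct model_lock {
    struct list_head list;
};

static void model_lockres_init(struct model_lockres *res)
{
    INIT_LIST_HEAD(&res->granted);
    INIT_LIST_HEAD(&res->purge);
}

/* True while the resource is queued for purging. */
static bool model_on_purge_list(const struct model_lockres *res)
{
    return !list_empty(&res->purge);
}

/* In this model "unused" only means that no lock is granted. */
static bool model_lockres_unused(const struct model_lockres *res)
{
    return list_empty(&res->granted);
}

/* Item 1: a successful lookup takes the resource off the purge list. */
static void model_lookup_hit(struct model_lockres *res)
{
    if (model_on_purge_list(res))
        list_del_init(&res->purge);
}

/* Item 2: adding a lock asserts that the resource is not being purged. */
static void model_add_granted(struct model_lockres *res,
                              struct model_lock *lock)
{
    assert(!model_on_purge_list(res));
    list_add_tail(&lock->list, &res->granted);
}

/* Item 4: queueing for purge asserts unused and not already queued. */
static void model_add_purge(struct model_lockres *res,
                            struct list_head *purge_list)
{
    assert(model_lockres_unused(res));
    assert(!model_on_purge_list(res));
    list_add_tail(&res->purge, purge_list);
}
```

In this model, a lookup that finds a lockres queued for purging takes it off the purge list before any lock is attached. A caller that adds a lock while the lockres is still on the purge list trips the assertion at the point where the state changes, rather than later in the purge path.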