[Ocfs2-devel] [PATCH] ocfs2: fix dlm lock migration crash
Junxiao Bi
junxiao.bi at oracle.com
Mon Feb 24 01:07:01 PST 2014
Hi,
On 07/19/2012 09:59 AM, Sunil Mushran wrote:
> Different issues.
>
> On Wed, Jul 18, 2012 at 6:34 PM, Junxiao Bi <junxiao.bi at oracle.com
> <mailto:junxiao.bi at oracle.com>> wrote:
>
> On 07/19/2012 12:36 AM, Sunil Mushran wrote:
>> This bug was detected during code audit. Never seen a crash. If
>> it does hit,
>> then we have bigger problems. So no point posting to stable.
>
I read a lot of dlm recovery code recently, I found this bug could
happen at the following scenario.
node 1: migrate target node x:
dlm_unregister_domain()
dlm_migrate_all_locks()
dlm_empty_lockres()
select node x as migrate target node
since there is a node x lock on the granted list.
dlm_migrate_lockres()
dlm_mark_lockres_migrating() {
wait_event(dlm->ast_wq, !dlm_lockres_is_dirty(dlm, res));
<<< node x unlock may happen here, res->granted list can be empty.
dlm_lockres_release_ast(dlm, res);
}
dlm_send_one_lockres()
dlm_process_recovery_data() {
tmpq is res->granted
list and is empty.
list_for_each_entry(lock, tmpq, list) {
if (lock->ml.cookie
!= ml->cookie)
lock = NULL;
else
break;
}
lock will be invalid
here.
if (lock->ml.node !=
ml->node)
BUG() --> crash
here.
}
Thanks,
Junxiao.
>
> Our customer can reproduce it. Also I saw you were assigned a
> similar bug before, see
> https://oss.oracle.com/bugzilla/show_bug.cgi?id=1220, is it the
> same BUG?
>>
>> On Tue, Jul 17, 2012 at 6:36 PM, Junxiao Bi
>> <junxiao.bi at oracle.com <mailto:junxiao.bi at oracle.com>> wrote:
>>
>> Hi Sunil,
>>
>> On 07/18/2012 03:49 AM, Sunil Mushran wrote:
>>> On Tue, Jul 17, 2012 at 12:10 AM, Junxiao Bi
>>> <junxiao.bi at oracle.com <mailto:junxiao.bi at oracle.com>> wrote:
>>>
>>> In the target node of the dlm lock migration, the logic
>>> to find
>>> the local dlm lock is wrong, it shouldn't change the
>>> loop variable
>>> "lock" in the list_for_each_entry loop. This will cause
>>> a NULL-pointer
>>> accessing crash.
>>>
>>> Signed-off-by: Junxiao Bi <junxiao.bi at oracle.com
>>> <mailto:junxiao.bi at oracle.com>>
>>> Cc: stable at vger.kernel.org <mailto:stable at vger.kernel.org>
>>> ---
>>> fs/ocfs2/dlm/dlmrecovery.c | 12 +++++++-----
>>> 1 file changed, 7 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/dlm/dlmrecovery.c
>>> b/fs/ocfs2/dlm/dlmrecovery.c
>>> index 01ebfd0..0b9cc88 100644
>>> --- a/fs/ocfs2/dlm/dlmrecovery.c
>>> +++ b/fs/ocfs2/dlm/dlmrecovery.c
>>> @@ -1762,6 +1762,7 @@ static int
>>> dlm_process_recovery_data(struct dlm_ctxt *dlm,
>>> u8 from = O2NM_MAX_NODES;
>>> unsigned int added = 0;
>>> __be64 c;
>>> + int found;
>>>
>>> mlog(0, "running %d locks for this lockres\n",
>>> mres->num_locks);
>>> for (i=0; i<mres->num_locks; i++) {
>>> @@ -1793,22 +1794,23 @@ static int
>>> dlm_process_recovery_data(struct dlm_ctxt *dlm,
>>> /* MIGRATION ONLY! */
>>> BUG_ON(!(mres->flags &
>>> DLM_MRES_MIGRATION));
>>>
>>> + found = 0;
>>> spin_lock(&res->spinlock);
>>> for (j = DLM_GRANTED_LIST; j <=
>>> DLM_BLOCKED_LIST; j++) {
>>> tmpq =
>>> dlm_list_idx_to_ptr(res, j);
>>>
>>> list_for_each_entry(lock, tmpq, list) {
>>> - if
>>> (lock->ml.cookie != ml->cookie)
>>> - lock = NULL;
>>> - else
>>> + if
>>> (lock->ml.cookie == ml->cookie) {
>>> + found = 1;
>>> break;
>>> + }
>>> }
>>> - if (lock)
>>> + if (found)
>>> break;
>>> }
>>>
>>> /* lock is always created
>>> locally first, and
>>> * destroyed locally last. it
>>> must be on the list */
>>> - if (!lock) {
>>> + if (!found) {
>>> c = ml->cookie;
>>> mlog(ML_ERROR, "Could
>>> not find local lock "
>>> "with
>>> cookie %u:%llu, node %u, "
>>>
>>>
>>>
>>> https://oss.oracle.com/git/?p=smushran/linux-2.6.git;a=blobdiff;f=fs/ocfs2/dlm/dlmrecovery.c;h=c881be6043a8c27c26ee44d217fb8ecf1eb37e02;hp=01ebfd0bdad72264b99345378f0c6febe246503d;hb=13279667cc8bbaf901591dee96f762d4aab8b307;hpb=a5ae0116eb56ec7c128e84fe15646a5cb9a8cb47
>>>
>>>
>>> We had decided to go back to list_for_each().
>>
>> OK, thank you. It's OK to revert it back for a introduced
>> bug. But I think you'd better cc stable branch.
>>
>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://oss.oracle.com/pipermail/ocfs2-devel/attachments/20140224/f8596f59/attachment.html
More information about the Ocfs2-devel
mailing list