[Ocfs2-devel] discuss about jbd2 assertion in defragment path

Joseph Qi joseph.qi at linux.alibaba.com
Wed Feb 15 12:04:48 UTC 2023



On 2/15/23 6:02 PM, Heming Zhao wrote:
> On 2/15/23 2:42 PM, Joseph Qi wrote:
>>
>>
>> On 2/15/23 2:20 PM, Heming Zhao wrote:
>>> On 2/15/23 10:06 AM, Joseph Qi wrote:
>>>>
>>>>
>>>> On 2/14/23 7:48 PM, Heming Zhao wrote:
>>>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 2/14/23 12:33 PM, Heming Zhao wrote:
>>>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote:
>>>>>>>> Hi, Sorry about the late reply.
>>>>>>>> This thread is indeed a long time ago:(
>>>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different
>>>>>>>> buffer head. Anyway, I have to recall the discussion before and get back
>>>>>>>> to you.
>>>>>>>>
>>>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch
>>>>>>> makes sense from theory.
>>>>>>>
>>>>>>>   From code logic, if the defrag path (ocfs2_split_extent) doesn't call
>>>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits
>>>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash,
>>>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my
>>>>>>> view, the correct fix should delete the access/dirty pair and make sure the
>>>>>>> ocfs2_split_extent() is journal self-service.
>>>>>>>
>>>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(),
>>>>>>> make sure all journal handle branches are correct. I already checked
>>>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner.
>>>>>>
>>>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree,
>>>>>> that's why we have to do a transaction here.
>>>>>
>>>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent()
>>>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling
>>>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read
>>>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1])
>>>>>
>>>>> __ocfs2_move_extent
>>>>>    + ocfs2_journal_access_di //[a]   <------+
>>>>>    + ocfs2_split_extent      //[b]          | [a, b.1) area is read operations
>>>>>    |  + if                   //[b.1] <------+
>>>>>    |  |  A
>>>>>    |  |  +---- from this line, this func starts r/w IOs & needs jbd2 operations
>>>>>    |  |
>>>>>    |  |   ocfs2_replace_extent_rec()/ocfs2_split_and_insert()
>>>>>    |  + else
>>>>>    |      ocfs2_try_to_merge_extent ()
>>>>>    |
>>>>>    + ocfs2_journal_dirty     //[c]
>>>>>
>>>>>>
>>>>>> The whole code flow:
>>>>>> ocfs2_ioctl_move_extents
>>>>>>     ocfs2_move_extents
>>>>>>       __ocfs2_move_extents_range
>>>>>>         ocfs2_defrag_extent  [1]
>>>>>>           ocfs2_start_trans  [a]
>>>>>>           __ocfs2_move_extent
>>>>>>             ocfs2_journal_access_di
>>>>>>             ocfs2_split_extent
>>>>>>             ocfs2_journal_dirty
>>>>>>           ocfs2_commit_trans
>>>>>>         ocfs2_move_extent  [2]
>>>>>>           ocfs2_start_trans
>>>>>>           __ocfs2_move_extent
>>>>>>             ocfs2_journal_access_di
>>>>>>             ocfs2_split_extent
>>>>>>             ocfs2_journal_dirty
>>>>>>           ocfs2_commit_trans
>>>>>>       ocfs2_start_trans  [b]
>>>>>>       ocfs2_journal_access_di
>>>>>>       ocfs2_journal_dirty
>>>>>>       ocfs2_commit_trans
>>>>>>
>>>>>> In above, [a] and [b] are different transaction start/commit pair, and
>>>>>> each allocates different journal credits, even they are operate the same
>>>>>> bh, they are still different operations.
>>>>>
>>>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent()
>>>>> is journal self-service.
>>>>
>>>> Don't understand what does 'journal self-service' mean.
>>>
>>> Sorry for my English skill, I invent this word 'journal self-service'.
>>> The meaning is: this function could handle journal operations totally by itself.
>>> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2
>>> journal start/stop pair.
>>>
>>> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function,
>>> we even could add these two func in ocfs2_split_extent(). Then any caller won't
>>> take care of any journal operations when calling ocfs2_split_extent().
>>>
>>>>
>>>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents
>>>>> changes), wanted [b] to protect inode ctime changes. the
>>>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the
>>>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk.
>>>>>
>>>> If so, what is transaction [a] protect? (no dirty buffer)
>>>
>>> Transaction [a] protects the dirty buffers in ocfs2_split_extent().
>>> ocfs2_split_extent() has already correctly used access/dirty pair to
>>> mark/handle dirty buffers.
>>>
>> Okay, I think I've gotten your idea now. You change looks reasonable.
>> Could you please also check if move extents without auto defrag also has
>> the same issue?
>>
> 
> I have finished code checking. ocfs2_move_extent() path has the same issue.
> My patch also works fine from this code branch.
> In theory, ocfs2_move_extent() has less opportunity to trigger crash, because
> userspace defrag.ocfs2 very likely set less than file size on
> 'struct ocfs2_move_extents : me_len', if defrag.ocfs2 would support this
> code branch.
> 

If we write a simple program to do move extents ioctl on the same ocfs2
(without auto defrag), can we trigger this crash?

> If you think my patch is acceptable, I will file v2 patch & plan only to change
> commit log in v2.
> 
Sure, with above confirmed.



More information about the Ocfs2-devel mailing list