[Ocfs2-devel] discuss about jbd2 assertion in defragment path

Heming Zhao heming.zhao at suse.com
Wed Feb 15 10:02:44 UTC 2023


On 2/15/23 2:42 PM, Joseph Qi wrote:
> 
> 
> On 2/15/23 2:20 PM, Heming Zhao wrote:
>> On 2/15/23 10:06 AM, Joseph Qi wrote:
>>>
>>>
>>> On 2/14/23 7:48 PM, Heming Zhao wrote:
>>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote:
>>>>> Hi,
>>>>>
>>>>> On 2/14/23 12:33 PM, Heming Zhao wrote:
>>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote:
>>>>>>> Hi, Sorry about the late reply.
>>>>>>> This thread is indeed a long time ago:(
>>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different
>>>>>>> buffer head. Anyway, I have to recall the discussion before and get back
>>>>>>> to you.
>>>>>>>
>>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch
>>>>>> makes sense from theory.
>>>>>>
>>>>>>   From code logic, if the defrag path (ocfs2_split_extent) doesn't call
>>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits
>>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash,
>>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my
>>>>>> view, the correct fix should delete the access/dirty pair and make sure the
>>>>>> ocfs2_split_extent() is journal self-service.
>>>>>>
>>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(),
>>>>>> make sure all journal handle branches are correct. I already checked
>>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner.
>>>>>
>>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree,
>>>>> that's why we have to do a transaction here.
>>>>
>>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent()
>>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling
>>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read
>>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1])
>>>>
>>>> __ocfs2_move_extent
>>>>    + ocfs2_journal_access_di //[a]   <------+
>>>>    + ocfs2_split_extent      //[b]          | [a, b.1) area is read operations
>>>>    |  + if                   //[b.1] <------+
>>>>    |  |  A
>>>>    |  |  +---- from this line, this func starts r/w IOs & needs jbd2 operations
>>>>    |  |
>>>>    |  |   ocfs2_replace_extent_rec()/ocfs2_split_and_insert()
>>>>    |  + else
>>>>    |      ocfs2_try_to_merge_extent ()
>>>>    |
>>>>    + ocfs2_journal_dirty     //[c]
>>>>
>>>>>
>>>>> The whole code flow:
>>>>> ocfs2_ioctl_move_extents
>>>>>     ocfs2_move_extents
>>>>>       __ocfs2_move_extents_range
>>>>>         ocfs2_defrag_extent  [1]
>>>>>           ocfs2_start_trans  [a]
>>>>>           __ocfs2_move_extent
>>>>>             ocfs2_journal_access_di
>>>>>             ocfs2_split_extent
>>>>>             ocfs2_journal_dirty
>>>>>           ocfs2_commit_trans
>>>>>         ocfs2_move_extent  [2]
>>>>>           ocfs2_start_trans
>>>>>           __ocfs2_move_extent
>>>>>             ocfs2_journal_access_di
>>>>>             ocfs2_split_extent
>>>>>             ocfs2_journal_dirty
>>>>>           ocfs2_commit_trans
>>>>>       ocfs2_start_trans  [b]
>>>>>       ocfs2_journal_access_di
>>>>>       ocfs2_journal_dirty
>>>>>       ocfs2_commit_trans
>>>>>
>>>>> In above, [a] and [b] are different transaction start/commit pair, and
>>>>> each allocates different journal credits, even they are operate the same
>>>>> bh, they are still different operations.
>>>>
>>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent()
>>>> is journal self-service.
>>>
>>> Don't understand what does 'journal self-service' mean.
>>
>> Sorry for my English skill, I invent this word 'journal self-service'.
>> The meaning is: this function could handle journal operations totally by itself.
>> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2
>> journal start/stop pair.
>>
>> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function,
>> we even could add these two func in ocfs2_split_extent(). Then any caller won't
>> take care of any journal operations when calling ocfs2_split_extent().
>>
>>>
>>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents
>>>> changes), wanted [b] to protect inode ctime changes. the
>>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the
>>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk.
>>>>
>>> If so, what is transaction [a] protect? (no dirty buffer)
>>
>> Transaction [a] protects the dirty buffers in ocfs2_split_extent().
>> ocfs2_split_extent() has already correctly used access/dirty pair to
>> mark/handle dirty buffers.
>>
> Okay, I think I've gotten your idea now. You change looks reasonable.
> Could you please also check if move extents without auto defrag also has
> the same issue?
> 

I have finished code checking. ocfs2_move_extent() path has the same issue.
My patch also works fine from this code branch.
In theory, ocfs2_move_extent() has less opportunity to trigger crash, because
userspace defrag.ocfs2 very likely set less than file size on
'struct ocfs2_move_extents : me_len', if defrag.ocfs2 would support this
code branch.

If you think my patch is acceptable, I will file v2 patch & plan only to change
commit log in v2.

Thanks,
Heming



More information about the Ocfs2-devel mailing list