[Ocfs2-devel] discuss about jbd2 assertion in defragment path
Joseph Qi
joseph.qi at linux.alibaba.com
Wed Feb 15 12:04:48 UTC 2023
On 2/15/23 6:02 PM, Heming Zhao wrote:
> On 2/15/23 2:42 PM, Joseph Qi wrote:
>>
>>
>> On 2/15/23 2:20 PM, Heming Zhao wrote:
>>> On 2/15/23 10:06 AM, Joseph Qi wrote:
>>>>
>>>>
>>>> On 2/14/23 7:48 PM, Heming Zhao wrote:
>>>>> On Tue, Feb 14, 2023 at 07:08:16PM +0800, Joseph Qi wrote:
>>>>>> Hi,
>>>>>>
>>>>>> On 2/14/23 12:33 PM, Heming Zhao wrote:
>>>>>>> On Tue, Feb 14, 2023 at 10:52:30AM +0800, Joseph Qi wrote:
>>>>>>>> Hi, Sorry about the late reply.
>>>>>>>> This thread is indeed a long time ago:(
>>>>>>>> It seems that I said the two ocfs2_journal_access_di() are for different
>>>>>>>> buffer head. Anyway, I have to recall the discussion before and get back
>>>>>>>> to you.
>>>>>>>>
>>>>>>> If you belive from [a] to [b.1] doesn't need any journal protection, my patch
>>>>>>> makes sense from theory.
>>>>>>>
>>>>>>> From code logic, if the defrag path (ocfs2_split_extent) doesn't call
>>>>>>> jbd2_journal_restart(), current code is absolute correct. Increasing credits
>>>>>>> can't help avoid jbd2 assert crash, but make more easily to trigger crash,
>>>>>>> because it makes jbd2_journal_extend() more easily to return "status > 0". In my
>>>>>>> view, the correct fix should delete the access/dirty pair and make sure the
>>>>>>> ocfs2_split_extent() is journal self-service.
>>>>>>>
>>>>>>> Different buffer head issue could be handled by rechecking ocfs2_split_extent(),
>>>>>>> make sure all journal handle branches are correct. I already checked
>>>>>>> ocfs2_split_extent, but I can't assure my eyes didn't miss any corner.
>>>>>>
>>>>>> ocfs2_split_extent are not just 'read' operations, it may grow the tree,
>>>>>> that's why we have to do a transaction here.
>>>>>
>>>>> You misunderstand my meaning, my mean: from the caller __ocfs2_move_extent()
>>>>> calling ocfs2_journal_access_di() (below [a]) to (before) the handling
>>>>> ctxt.c_contig_type (below [b.1]) in ocfs2_split_extent, this area is purely read
>>>>> operations. It's the range [ [a], [b.1]) (include [a], not include [b.1])
>>>>>
>>>>> __ocfs2_move_extent
>>>>> + ocfs2_journal_access_di //[a] <------+
>>>>> + ocfs2_split_extent //[b] | [a, b.1) area is read operations
>>>>> | + if //[b.1] <------+
>>>>> | | A
>>>>> | | +---- from this line, this func starts r/w IOs & needs jbd2 operations
>>>>> | |
>>>>> | | ocfs2_replace_extent_rec()/ocfs2_split_and_insert()
>>>>> | + else
>>>>> | ocfs2_try_to_merge_extent ()
>>>>> |
>>>>> + ocfs2_journal_dirty //[c]
>>>>>
>>>>>>
>>>>>> The whole code flow:
>>>>>> ocfs2_ioctl_move_extents
>>>>>> ocfs2_move_extents
>>>>>> __ocfs2_move_extents_range
>>>>>> ocfs2_defrag_extent [1]
>>>>>> ocfs2_start_trans [a]
>>>>>> __ocfs2_move_extent
>>>>>> ocfs2_journal_access_di
>>>>>> ocfs2_split_extent
>>>>>> ocfs2_journal_dirty
>>>>>> ocfs2_commit_trans
>>>>>> ocfs2_move_extent [2]
>>>>>> ocfs2_start_trans
>>>>>> __ocfs2_move_extent
>>>>>> ocfs2_journal_access_di
>>>>>> ocfs2_split_extent
>>>>>> ocfs2_journal_dirty
>>>>>> ocfs2_commit_trans
>>>>>> ocfs2_start_trans [b]
>>>>>> ocfs2_journal_access_di
>>>>>> ocfs2_journal_dirty
>>>>>> ocfs2_commit_trans
>>>>>>
>>>>>> In above, [a] and [b] are different transaction start/commit pair, and
>>>>>> each allocates different journal credits, even they are operate the same
>>>>>> bh, they are still different operations.
>>>>>
>>>>> I agree above writing, but it can't change my conclusion: ocfs2_split_extent()
>>>>> is journal self-service.
>>>>
>>>> Don't understand what does 'journal self-service' mean.
>>>
>>> Sorry for my English skill, I invent this word 'journal self-service'.
>>> The meaning is: this function could handle journal operations totally by itself.
>>> Caller doesn't need to call access/dirty pair. Caller only needs to call jbd2
>>> journal start/stop pair.
>>>
>>> Because ocfs2_start_trans() & ocfs2_commit_trans are reenterable function,
>>> we even could add these two func in ocfs2_split_extent(). Then any caller won't
>>> take care of any journal operations when calling ocfs2_split_extent().
>>>
>>>>
>>>>> It seems the author meaning: he wanted [a] to protect defragging actions (extents
>>>>> changes), wanted [b] to protect inode ctime changes. the
>>>>> ocfs2_start_trans/ocfs2_commit_trans in [a] area is necessary. but the
>>>>> access/dirty pair in __ocfs2_move_extent() is potential crash risk.
>>>>>
>>>> If so, what is transaction [a] protect? (no dirty buffer)
>>>
>>> Transaction [a] protects the dirty buffers in ocfs2_split_extent().
>>> ocfs2_split_extent() has already correctly used access/dirty pair to
>>> mark/handle dirty buffers.
>>>
>> Okay, I think I've gotten your idea now. You change looks reasonable.
>> Could you please also check if move extents without auto defrag also has
>> the same issue?
>>
>
> I have finished code checking. ocfs2_move_extent() path has the same issue.
> My patch also works fine from this code branch.
> In theory, ocfs2_move_extent() has less opportunity to trigger crash, because
> userspace defrag.ocfs2 very likely set less than file size on
> 'struct ocfs2_move_extents : me_len', if defrag.ocfs2 would support this
> code branch.
>
If we write a simple program to do move extents ioctl on the same ocfs2
(without auto defrag), can we trigger this crash?
> If you think my patch is acceptable, I will file v2 patch & plan only to change
> commit log in v2.
>
Sure, with above confirmed.
More information about the Ocfs2-devel
mailing list