[Ocfs2-devel] [PATCH] ocfs2: checkpoint appending truncate log transaction before flushing
Changwei Ge
ge.changwei at h3c.com
Fri Feb 15 00:27:27 PST 2019
Hi Jun,
Do you have any other questions, advice, or concerns?
I am expecting explicit feedback (ack/nack) if you now understand the problem and my approach to fixing it.
Thanks,
Changwei
On 2019/2/14 18:25, Changwei Ge wrote:
> On 2019/2/14 18:06, piaojun wrote:
>> Hi Changwei,
>>
>> On 2019/2/14 16:53, Changwei Ge wrote:
>>> Hi Jun,
>>>
>>> Thanks for looking into this :-)
>>>
>>> On 2019/2/14 16:24, piaojun wrote:
>>>> Hi Changwei,
>>>>
>>>> On 2019/2/14 12:03, Changwei Ge wrote:
>>>>> Appending to the truncate log (TA) and flushing the truncate log (TF) are
>>>>> two separate transactions. Both can be committed but not yet
>>>>> checkpointed. If a crash occurs at that point, both transactions will be
>>>>> replayed, even though some of the clusters have already been released to the global bitmap.
>>>>
>>>> Do you mean that both transactions will release clusters to the
>>>> global bitmap? I think TA won't give back clusters to the global
>>>> bitmap.
>>>>
>>>
>>> No, I don't mean that both TA and TF release clusters to the global bitmap.
>>>
>>> But consider how clusters are reclaimed: they are first recorded in the truncate
>>> log and then returned to the global bitmap, which involves the TA and TF jbd2 transactions.
>>>
>>> TA's job is to append cluster records to the truncate log, which avoids a potential space leak.
>>> TF's job is to return clusters to the global bitmap.
>>>
>>> It's possible that TA and TF are both committed to jbd2 but, sadly, neither of them is checkpointed.
>>> So journal recovery needs to replay both TA and TF during the next mount.
>>> The truncate log then still contains a record for a cluster
>>> that has already been returned to the global bitmap by replaying TF.
>>>
>>> Now the double free shows up.
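The end state described above can be illustrated with a toy userspace model. Everything in it is hypothetical: `toy_fs`, `toy_ta_append`, `toy_tf_flush_one`, and `toy_recover_truncate_log` are invented names, the global bitmap is a plain bool array, and jbd2 replay itself is not simulated. The model simply starts from the post-replay state the thread describes (TF's bitmap update applied, TA's record still in the truncate log) and shows that recovering the truncate log then frees the same clusters a second time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model only, hypothetical names; not OCFS2 code. */
#define TOY_CLUSTERS 64
#define TOY_TL_COUNT 8

/* A truncate log record: a run of clusters waiting to be freed. */
struct toy_rec {
	unsigned int start;
	unsigned int len;
};

/* A global bitmap plus one truncate log. */
struct toy_fs {
	bool used[TOY_CLUSTERS];          /* true = cluster allocated */
	struct toy_rec tl[TOY_TL_COUNT];  /* truncate log records */
	unsigned int tl_used;             /* number of valid records */
};

/* TA: remember the clusters in the truncate log; the bitmap is untouched. */
static int toy_ta_append(struct toy_fs *fs, unsigned int start, unsigned int len)
{
	if (fs->tl_used == TOY_TL_COUNT)
		return -1; /* log full: the caller must flush first */
	fs->tl[fs->tl_used].start = start;
	fs->tl[fs->tl_used].len = len;
	fs->tl_used++;
	return 0;
}

/* Clear bits in the bitmap, counting bits that were already clear. */
static unsigned int toy_clear_bits(struct toy_fs *fs, unsigned int start,
				   unsigned int len)
{
	unsigned int already_free = 0;

	assert(start + len <= TOY_CLUSTERS);
	for (unsigned int i = start; i < start + len; i++) {
		if (!fs->used[i])
			already_free++;
		fs->used[i] = false;
	}
	if (already_free)
		printf("toy: clearing %u bits at offset %u, %u already clear\n",
		       len, start, already_free);
	return already_free;
}

/* TF: return the newest record's clusters to the bitmap, drop the record. */
static unsigned int toy_tf_flush_one(struct toy_fs *fs)
{
	struct toy_rec *r;
	unsigned int dup;

	assert(fs->tl_used > 0);
	r = &fs->tl[fs->tl_used - 1];
	dup = toy_clear_bits(fs, r->start, r->len);
	fs->tl_used--;
	return dup;
}

/* Mount-time recovery: free every record still left in the truncate log. */
static unsigned int toy_recover_truncate_log(struct toy_fs *fs)
{
	unsigned int dup = 0;

	while (fs->tl_used > 0)
		dup += toy_tf_flush_one(fs);
	return dup;
}
```

In this model a normal TA followed by TF frees four clusters exactly once; recovering a truncate log that still holds the record after the bitmap was already cleared reports all four clusters as already clear, which is the double free under discussion.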
>>
>> Do you mean that when mounting again, truncate log recovery will find a
>> record in the truncate log for clusters that were already released? But after the
>> TF transaction is replayed during mount, the truncate log won't be recovered,
>> as tl->tl_used is less than tl->tl_count.
>
> Um, it's not just truncate log replaying; it also involves a jbd2 transaction recording the last append operation.
> That operation may meet the flush condition (ocfs2_truncate_log_needs_flush).
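A hedged sketch of the flush condition mentioned above. To our understanding, ocfs2_truncate_log_needs_flush() reports that a flush is due when the log is full (tl_used equals tl_count); the toy model below assumes exactly that and uses hypothetical names (`toy_tl`, `toy_tl_needs_flush`, `toy_tl_append`) instead of the kernel's on-disk structures. It shows that the append which fills the log is what makes a flush (a TF transaction) run before the next append can proceed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only, hypothetical names; not the OCFS2 on-disk format. */
struct toy_tl {
	unsigned int tl_used;   /* records currently in the log */
	unsigned int tl_count;  /* capacity of the log */
};

/* Assumed flush condition: the log is full. */
static bool toy_tl_needs_flush(const struct toy_tl *tl)
{
	assert(tl->tl_used <= tl->tl_count);
	return tl->tl_used == tl->tl_count;
}

/* Toy TF: return every logged record, emptying the log. */
static void toy_tl_flush(struct toy_tl *tl)
{
	tl->tl_used = 0;
}

/* Toy TA: returns 1 if a flush had to run first, 0 otherwise. */
static int toy_tl_append(struct toy_tl *tl)
{
	int flushed = 0;

	if (toy_tl_needs_flush(tl)) {
		toy_tl_flush(tl);
		flushed = 1;
	}
	tl->tl_used++;
	return flushed;
}
```

With tl_count = 2, the second append fills the log, and the third append has to flush before it can record anything.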
>
> Thanks,
> Changwei
>
>>
>> Thanks,
>> Jun
>>
>>>
>>>
>>>>> Then the truncate log will be replayed, resulting in a cluster double free.
>>>>
>>>> Does this problem only cause some error messages, like the one below?
>>>>
>>>> ocfs2_replay_truncate_records
>>>> ocfs2_free_clusters
>>>> _ocfs2_free_clusters
>>>> _ocfs2_free_suballoc_bits
>>>> ocfs2_block_group_clear_bits
>>>> "Trying to clear %u bits at offset %u in group descriptor"
>>>>
>>>
>>> Exactly. When the issue occurs, the message above is printed.
>>>
>>> Thanks,
>>> Changwei
>>>
>>>> Thanks,
>>>> Jun
>>>>
>>>>>
>>>>> To reproduce this issue, just crash the host while punching holes in files.
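The reproduction step needs a workload that keeps punching holes while the host is crashed from outside (a hard reset, for instance). A minimal sketch of such a workload follows; `punch_every_other_block` is a hypothetical helper, the file is assumed to sit on the OCFS2 volume under test, and the block size and count are arbitrary choices, not values from the thread. It relies only on the Linux fallocate(2) call with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical workload helper (not from the thread): write `nblocks`
 * blocks of `blksz` bytes to `path`, then punch a hole in every other
 * block with fallocate(2). Returns the number of holes punched, or -1.
 */
static int punch_every_other_block(const char *path, int nblocks, size_t blksz)
{
	static char buf[65536];
	int fd, holes = 0;

	assert(blksz > 0 && blksz <= sizeof(buf));
	memset(buf, 0xab, blksz);

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		return -1;
	}
	for (int i = 0; i < nblocks; i++) {
		if (write(fd, buf, blksz) != (ssize_t)blksz) {
			perror("write");
			close(fd);
			return -1;
		}
	}
	/* Get the clusters allocated on disk before releasing some of them. */
	if (fsync(fd) < 0) {
		perror("fsync");
		close(fd);
		return -1;
	}
	for (int i = 0; i < nblocks; i += 2) {
		/* PUNCH_HOLE must be ORed with KEEP_SIZE. */
		if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      (off_t)i * (off_t)blksz, (off_t)blksz) < 0) {
			perror("fallocate");
			close(fd);
			return -1;
		}
		holes++;
	}
	close(fd);
	return holes;
}
```

On its own this only exercises the hole-punching path; looping it on the OCFS2 mount while crashing the node externally approximates the reproduction step described above.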
>>>>>
>>>>> Signed-off-by: Changwei Ge <ge.changwei at h3c.com>
>>>>> ---
>>>>> fs/ocfs2/alloc.c | 15 +++++++++++++++
>>>>> 1 file changed, 15 insertions(+)
>>>>>
>>>>> diff --git a/fs/ocfs2/alloc.c b/fs/ocfs2/alloc.c
>>>>> index d1cbb27..29bc777 100644
>>>>> --- a/fs/ocfs2/alloc.c
>>>>> +++ b/fs/ocfs2/alloc.c
>>>>> @@ -6007,6 +6007,7 @@ int __ocfs2_flush_truncate_log(struct ocfs2_super *osb)
>>>>>  	struct buffer_head *data_alloc_bh = NULL;
>>>>>  	struct ocfs2_dinode *di;
>>>>>  	struct ocfs2_truncate_log *tl;
>>>>> +	struct ocfs2_journal *journal = osb->journal;
>>>>>
>>>>>  	BUG_ON(inode_trylock(tl_inode));
>>>>>
>>>>> @@ -6027,6 +6028,20 @@ int __ocfs2_flush_truncate_log(struct ocfs2_super *osb)
>>>>>  		goto out;
>>>>>  	}
>>>>>
>>>>> +	/* Appending to the truncate log (TA) and flushing it (TF) are two
>>>>> +	 * separate transactions. Both can be committed but not checkpointed.
>>>>> +	 * If a crash occurs then, both are replayed, even though some of the
>>>>> +	 * clusters have already been released to the global bitmap. The
>>>>> +	 * truncate log is then replayed too, resulting in a double free.
>>>>> +	 */
>>>>> +	jbd2_journal_lock_updates(journal->j_journal);
>>>>> +	status = jbd2_journal_flush(journal->j_journal);
>>>>> +	jbd2_journal_unlock_updates(journal->j_journal);
>>>>> +	if (status < 0) {
>>>>> +		mlog_errno(status);
>>>>> +		goto out;
>>>>> +	}
>>>>> +
>>>>>  	data_alloc_inode = ocfs2_get_system_file_inode(osb,
>>>>>  						       GLOBAL_BITMAP_SYSTEM_INODE,
>>>>>  						       OCFS2_INVALID_SLOT);
>>>>>
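The ordering the patch enforces can be pictured with a toy write-ahead journal. This is not jbd2: `toy_journal`, `toy_commit`, and `toy_checkpoint` are hypothetical and model only one idea, namely that jbd2_journal_flush(), called between jbd2_journal_lock_updates() and jbd2_journal_unlock_updates(), writes every committed transaction back to its home location and empties the log before TF begins. TA is then no longer in the journal waiting to be replayed alongside TF.

```c
#include <assert.h>

/* Toy model only, hypothetical names; not jbd2. */
#define TOY_MAX_TX 16
#define TOY_BLOCKS 4

/* One committed transaction: a new image for one home block. */
struct toy_tx {
	int block;  /* home block this transaction rewrites */
	int value;  /* new contents of that block */
};

/* Journal holding transactions that are committed but not checkpointed. */
struct toy_journal {
	struct toy_tx log[TOY_MAX_TX];
	int nr;                 /* committed, not yet checkpointed */
	int home[TOY_BLOCKS];   /* blocks at their on-disk home locations */
};

/* Commit: the new image is safe in the log but not yet written home. */
static void toy_commit(struct toy_journal *j, int block, int value)
{
	assert(j->nr < TOY_MAX_TX);
	assert(block >= 0 && block < TOY_BLOCKS);
	j->log[j->nr].block = block;
	j->log[j->nr].value = value;
	j->nr++;
}

/*
 * Checkpoint: write every committed image home, then empty the log, so
 * nothing committed before this point is replayed after a crash. This
 * stands in for the role the patch gives jbd2_journal_flush().
 */
static void toy_checkpoint(struct toy_journal *j)
{
	for (int i = 0; i < j->nr; i++)
		j->home[j->log[i].block] = j->log[i].value;
	j->nr = 0;
}
```

Commit TA (block 0), checkpoint, then commit TF (block 1): after the checkpoint only TF is left in the log. Without the toy_checkpoint() call both images would remain there, which is the committed-but-not-checkpointed state the commit message describes.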
>>>>
>>>
>>
>
> _______________________________________________
> Ocfs2-devel mailing list
> Ocfs2-devel at oss.oracle.com
> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
>