[Ocfs2-devel] [PATCH v1 1/5] ocfs2: partly revert da5e7c87827e8 for mounting crash issue

Joseph Qi joseph.qi at linux.alibaba.com
Sun Apr 10 12:00:31 UTC 2022



On 4/10/22 12:28 AM, heming.zhao at suse.com wrote:
> On 4/9/22 21:11, Joseph Qi wrote:
>> Suggest rename the subject, something like
>> ocfs2: fix mount crash if journal is not alloced
> 
> OK.
> 
>>
>> On 4/8/22 6:30 PM, Heming Zhao wrote:
>>> After commit da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown"),
>>> journal init later than before, it makes NULL pointer access in free
>>> routine.
>>>
>>> Crash flow:
>>>
>>> ocfs2_fill_super
>>>   + ocfs2_mount_volume
>>>   |  + ocfs2_dlm_init //fail & return, osb->journal is NULL.
>>>   |  + ...
>>>   |  + ocfs2_check_volume //no chance to init osb->journal
>>>   |
>>>   + ...
>>>   + ocfs2_dismount_volume
>>>      ocfs2_release_system_inodes
>>>        ...
>>>         evict
>>>          ...
>>>           ocfs2_clear_inode
>>>            ocfs2_checkpoint_inode
>>>             ocfs2_ci_fully_checkpointed
>>>              time_after(journal->j_trans_id, ci->ci_last_trans)
>>>               + journal is empty, crash!
>>>
>>> For fixing, there are three solutions:
>>>
>>> 1> Revert commit da5e7c87827e8
>>>
>>> For avoiding kernel crash, this make sense for us. We only concerned
>>> whether there has any non-system inode access before dlm init. The
>>> answer is NO. And all journal replay/recovery handling happen after
>>> dlm & journal init done. So this method is not graceful but workable.
>>>
>>> 2> Add osb->journal check in free inode routine (eg ocfs2_clear_inode)
>>>
>>> The fix code is special for mounting phase, but it will continue
>>> working after mounting stage. In another word, this method adds useless
>>> code in normal inode free flow.
>>>
>>> 3> Do directly free inode in mounting phase
>>>
>>> This method is brutal/complex and may introduce unsafe code, currently
>>> maintainer didn't like.
>>>
>>> At last, we chose method <1> and partly reverted commit da5e7c87827e8.
>>> We reverted journal init code, and kept cleanup code flow.
>>>
>>> Fixes: da5e7c87827e8 ("ocfs2: cleanup journal init and shutdown")
>>> Signed-off-by: Heming Zhao <heming.zhao at suse.com>
>>> ---
>>>   fs/ocfs2/inode.c   |  4 ++--
>>>   fs/ocfs2/journal.c | 21 +--------------------
>>>   fs/ocfs2/super.c   | 33 +++++++++++++++++++++++++++++++++
>>>   3 files changed, 36 insertions(+), 22 deletions(-)
>>>
>>> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
>>> index 5739dc301569..bb116c39b581 100644
>>> --- a/fs/ocfs2/inode.c
>>> +++ b/fs/ocfs2/inode.c
>>> @@ -125,6 +125,7 @@ struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags,
>>>       struct inode *inode = NULL;
>>>       struct super_block *sb = osb->sb;
>>>       struct ocfs2_find_inode_args args;
>>> +    journal_t *journal = osb->journal->j_journal;
>>>         trace_ocfs2_iget_begin((unsigned long long)blkno, flags,
>>>                      sysfile_type);
>>> @@ -171,11 +172,10 @@ struct inode *ocfs2_iget(struct ocfs2_super *osb, u64 blkno, unsigned flags,
>>>        * part of the transaction - the inode could have been reclaimed and
>>>        * now it is reread from disk.
>>>        */
>>> -    if (osb->journal) {
>>> +    if (journal) {
>>>           transaction_t *transaction;
>>>           tid_t tid;
>>>           struct ocfs2_inode_info *oi = OCFS2_I(inode);
>>> -        journal_t *journal = osb->journal->j_journal;
>>>             read_lock(&journal->j_state_lock);
>>>           if (journal->j_running_transaction)
>>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
>>> index 1887a2708709..afb85de3bb60 100644
>>> --- a/fs/ocfs2/journal.c
>>> +++ b/fs/ocfs2/journal.c
>>> @@ -815,30 +815,11 @@ int ocfs2_journal_init(struct ocfs2_super *osb, int *dirty)
>>>       int status = -1;
>>>       struct inode *inode = NULL; /* the journal inode */
>>>       journal_t *j_journal = NULL;
>>> -    struct ocfs2_journal *journal = NULL;
>>> +    struct ocfs2_journal *journal = osb->journal;
>>>       struct ocfs2_dinode *di = NULL;
>>>       struct buffer_head *bh = NULL;
>>>       int inode_lock = 0;
>>>   -    /* initialize our journal structure */
>>> -    journal = kzalloc(sizeof(struct ocfs2_journal), GFP_KERNEL);
>>> -    if (!journal) {
>>> -        mlog(ML_ERROR, "unable to alloc journal\n");
>>> -        status = -ENOMEM;
>>> -        goto done;
>>> -    }
>>> -    osb->journal = journal;
>>> -    journal->j_osb = osb;
>>> -
>>> -    atomic_set(&journal->j_num_trans, 0);
>>> -    init_rwsem(&journal->j_trans_barrier);
>>> -    init_waitqueue_head(&journal->j_checkpointed);
>>> -    spin_lock_init(&journal->j_lock);
>>> -    journal->j_trans_id = 1UL;
>>> -    INIT_LIST_HEAD(&journal->j_la_cleanups);
>>> -    INIT_WORK(&journal->j_recovery_work, ocfs2_complete_recovery);
>>> -    journal->j_state = OCFS2_JOURNAL_FREE;
>>> -
>>>       /* already have the inode for our journal */
>>>       inode = ocfs2_get_system_file_inode(osb, JOURNAL_SYSTEM_INODE,
>>>                           osb->slot_num);
>>> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
>>> index 477cdf94122e..5f63a2333e52 100644
>>> --- a/fs/ocfs2/super.c
>>> +++ b/fs/ocfs2/super.c
>>> @@ -2015,6 +2015,7 @@ static int ocfs2_initialize_super(struct super_block *sb,
>>>       int i, cbits, bbits;
>>>       struct ocfs2_dinode *di = (struct ocfs2_dinode *)bh->b_data;
>>>       struct inode *inode = NULL;
>>> +    struct ocfs2_journal *journal;
>>>       struct ocfs2_super *osb;
>>>       u64 total_blocks;
>>>   @@ -2195,6 +2196,32 @@ static int ocfs2_initialize_super(struct super_block *sb,
>>>         get_random_bytes(&osb->s_next_generation, sizeof(u32));
>>>   +    /* FIXME
>>> +     * This should be done in ocfs2_journal_init(), but unknown
>>> +     * ordering issues will cause the filesystem to crash.
>>> +     * If anyone wants to figure out what part of the code
>>> +     * refers to osb->journal before ocfs2_journal_init() is run,
>>> +     * be my guest.
>>> +     */
>>> +    /* initialize our journal structure */
>>> +    journal = kzalloc(sizeof(struct ocfs2_journal), GFP_KERNEL);
>>> +    if (!journal) {
>>> +        mlog(ML_ERROR, "unable to alloc journal\n");
>>> +        status = -ENOMEM;
>>> +        goto bail;
>>> +    }
>>> +    osb->journal = journal;
>>> +    journal->j_osb = osb;
>>> +
>>> +    atomic_set(&journal->j_num_trans, 0);
>>> +    init_rwsem(&journal->j_trans_barrier);
>>> +    init_waitqueue_head(&journal->j_checkpointed);
>>> +    spin_lock_init(&journal->j_lock);
>>> +    journal->j_trans_id = 1UL;
>>> +    INIT_LIST_HEAD(&journal->j_la_cleanups);
>>> +    INIT_WORK(&journal->j_recovery_work, ocfs2_complete_recovery);
>>> +    journal->j_state = OCFS2_JOURNAL_FREE;
>>> +
>> We may fold these into a new function, such as ocfs2_journal_alloc().
>>
> 
> OK, will create a new function in journal.c
> For this area FIXME comment, the legacy comment humor & vague. In my view, we
> already got the order issue clearly, why not we change it as below?
>     /*
>      * FIXME
>      * This should be done in ocfs2_journal_init(), but any inode
>      * writes back operation will cause the filesystem to crash.
>      */

Sounds good.

Thanks,
Joseph



More information about the Ocfs2-devel mailing list