[Ocfs2-devel] [PATCH 1/2] ocfs2: fix missing reset j_num_trans for sync

Heming Zhao heming.zhao at suse.com
Mon May 1 02:48:49 UTC 2023


On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
> Hi,
> 
> What's the journal status in this case?
> I wonder why commit thread is not working, which should flush journal
> and reset j_num_trans during commit cache.

This rootcasue of NO. 266 is that ocfs2_sync_fs bypasses commit thread and
directly calls jbd2 APIs to commit journal. This code design makes 'sync'
doesn't reset ->j_num_trans. So my patch add the kick commit thread action
in sync path.

Thanks,
Heming

> 
> 
> On 4/30/23 11:13 AM, Heming Zhao wrote:
> > fstest generic cases 266 272 281 trigger hanging issue when umount.
> > 
> > I use 266 to describe the root cause.
> > 
> > ```
> >  49 _dmerror_unmount
> >  50 _dmerror_mount
> >  51
> >  52 echo "Compare files"
> >  53 md5sum $testdir/file1 | _filter_scratch
> >  54 md5sum $testdir/file2 | _filter_scratch
> >  55
> >  56 echo "CoW and unmount"
> >  57 sync
> >  58 _dmerror_load_error_table
> >  59 urk=$($XFS_IO_PROG -f -c "pwrite -S 0x63 -b $bufsize 0 $filesize" \
> >  60     -c "fdatasync" $testdir/file2 2>&1)
> >  61 echo $urk >> $seqres.full
> >  62 echo "$urk" | grep -q "error" || _fail "pwrite did not fail"
> >  63
> >  64 echo "Clean up the mess"
> >  65 _dmerror_unmount
> > ```
> > 
> > After line 49 50 umount & mount ocfs2 dev, this case run md5sum to
> > verify target file. Line 57 run 'sync' before line 58 changes the dm
> > target from dm-linear to dm-error. This case is hanging at line 65.
> > 
> > The md5sum calls jbd2 trans pair: ocfs2_[start|commit]_trans to
> > do journal job. But there is only ->j_num_trans+1 in ocfs2_start_trans,
> > the ocfs2_commit_trans doesn't do reduction operation, 'sync' neither.
> > finally no function reset ->j_num_trans until umount is triggered.
> > 
> > call flow:
> > ```
> > [md5sum] //line 53 54
> >  vfs_read
> >   ocfs2_file_read_iter
> >    ocfs2_inode_lock_atime
> >     ocfs2_update_inode_atime
> >      + ocfs2_start_trans //atomic_inc j_num_trans
> >      + ...
> >      + ocfs2_commit_trans//no modify j_num_trans
> > 
> > sync //line 57. no modify j_num_trans
> > 
> > _dmerror_load_error_table //all write will return error after this line
> > 
> > _dmerror_unmount //found j_num_trans is not zero, run commit thread
> >                //but the underlying dev is dm-error, journaling IO
> >                //failed all the time and keep going to retry.
> > ```
> > 
> > *** How to fix ***
> > 
> > kick commit thread in sync path, which can reset j_num_trans to 0.
> > 
> > Signed-off-by: Heming Zhao <heming.zhao at suse.com>
> > ---
> >  fs/ocfs2/super.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> > index 0b0e6a132101..bb3fa21e9b47 100644
> > --- a/fs/ocfs2/super.c
> > +++ b/fs/ocfs2/super.c
> > @@ -412,6 +412,9 @@ static int ocfs2_sync_fs(struct super_block *sb, int wait)
> >  			jbd2_log_wait_commit(osb->journal->j_journal,
> >  					     target);
> >  	}
> > +	/* kick commit thread to reset journal->j_num_trans */
> > +	if (atomic_read(&(osb->journal->j_num_trans)))
> > +		wake_up(&osb->checkpoint_event);
> >  	return 0;
> >  }
> >  



More information about the Ocfs2-devel mailing list