[Ocfs2-devel] [PATCH 1/2] ocfs2: fix missing reset j_num_trans for sync
Heming Zhao
heming.zhao at suse.com
Mon May 1 02:48:49 UTC 2023
On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
> Hi,
>
> What's the journal status in this case?
> I wonder why commit thread is not working, which should flush journal
> and reset j_num_trans during commit cache.
This rootcasue of NO. 266 is that ocfs2_sync_fs bypasses commit thread and
directly calls jbd2 APIs to commit journal. This code design makes 'sync'
doesn't reset ->j_num_trans. So my patch add the kick commit thread action
in sync path.
Thanks,
Heming
>
>
> On 4/30/23 11:13 AM, Heming Zhao wrote:
> > fstest generic cases 266 272 281 trigger hanging issue when umount.
> >
> > I use 266 to describe the root cause.
> >
> > ```
> > 49 _dmerror_unmount
> > 50 _dmerror_mount
> > 51
> > 52 echo "Compare files"
> > 53 md5sum $testdir/file1 | _filter_scratch
> > 54 md5sum $testdir/file2 | _filter_scratch
> > 55
> > 56 echo "CoW and unmount"
> > 57 sync
> > 58 _dmerror_load_error_table
> > 59 urk=$($XFS_IO_PROG -f -c "pwrite -S 0x63 -b $bufsize 0 $filesize" \
> > 60 -c "fdatasync" $testdir/file2 2>&1)
> > 61 echo $urk >> $seqres.full
> > 62 echo "$urk" | grep -q "error" || _fail "pwrite did not fail"
> > 63
> > 64 echo "Clean up the mess"
> > 65 _dmerror_unmount
> > ```
> >
> > After line 49 50 umount & mount ocfs2 dev, this case run md5sum to
> > verify target file. Line 57 run 'sync' before line 58 changes the dm
> > target from dm-linear to dm-error. This case is hanging at line 65.
> >
> > The md5sum calls jbd2 trans pair: ocfs2_[start|commit]_trans to
> > do journal job. But there is only ->j_num_trans+1 in ocfs2_start_trans,
> > the ocfs2_commit_trans doesn't do reduction operation, 'sync' neither.
> > finally no function reset ->j_num_trans until umount is triggered.
> >
> > call flow:
> > ```
> > [md5sum] //line 53 54
> > vfs_read
> > ocfs2_file_read_iter
> > ocfs2_inode_lock_atime
> > ocfs2_update_inode_atime
> > + ocfs2_start_trans //atomic_inc j_num_trans
> > + ...
> > + ocfs2_commit_trans//no modify j_num_trans
> >
> > sync //line 57. no modify j_num_trans
> >
> > _dmerror_load_error_table //all write will return error after this line
> >
> > _dmerror_unmount //found j_num_trans is not zero, run commit thread
> > //but the underlying dev is dm-error, journaling IO
> > //failed all the time and keep going to retry.
> > ```
> >
> > *** How to fix ***
> >
> > kick commit thread in sync path, which can reset j_num_trans to 0.
> >
> > Signed-off-by: Heming Zhao <heming.zhao at suse.com>
> > ---
> > fs/ocfs2/super.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> > index 0b0e6a132101..bb3fa21e9b47 100644
> > --- a/fs/ocfs2/super.c
> > +++ b/fs/ocfs2/super.c
> > @@ -412,6 +412,9 @@ static int ocfs2_sync_fs(struct super_block *sb, int wait)
> > jbd2_log_wait_commit(osb->journal->j_journal,
> > target);
> > }
> > + /* kick commit thread to reset journal->j_num_trans */
> > + if (atomic_read(&(osb->journal->j_num_trans)))
> > + wake_up(&osb->checkpoint_event);
> > return 0;
> > }
> >
More information about the Ocfs2-devel
mailing list