[Ocfs2-devel] [PATCH 1/2] ocfs2: fix missing reset j_num_trans for sync
Joseph Qi
joseph.qi at linux.alibaba.com
Mon May 1 02:07:34 UTC 2023
Hi,
What's the journal status in this case?
I wonder why commit thread is not working, which should flush journal
and reset j_num_trans during commit cache.
Thanks,
Joseph
On 4/30/23 11:13 AM, Heming Zhao wrote:
> fstest generic cases 266 272 281 trigger hanging issue when umount.
>
> I use 266 to describe the root cause.
>
> ```
> 49 _dmerror_unmount
> 50 _dmerror_mount
> 51
> 52 echo "Compare files"
> 53 md5sum $testdir/file1 | _filter_scratch
> 54 md5sum $testdir/file2 | _filter_scratch
> 55
> 56 echo "CoW and unmount"
> 57 sync
> 58 _dmerror_load_error_table
> 59 urk=$($XFS_IO_PROG -f -c "pwrite -S 0x63 -b $bufsize 0 $filesize" \
> 60 -c "fdatasync" $testdir/file2 2>&1)
> 61 echo $urk >> $seqres.full
> 62 echo "$urk" | grep -q "error" || _fail "pwrite did not fail"
> 63
> 64 echo "Clean up the mess"
> 65 _dmerror_unmount
> ```
>
> After line 49 50 umount & mount ocfs2 dev, this case run md5sum to
> verify target file. Line 57 run 'sync' before line 58 changes the dm
> target from dm-linear to dm-error. This case is hanging at line 65.
>
> The md5sum calls jbd2 trans pair: ocfs2_[start|commit]_trans to
> do journal job. But there is only ->j_num_trans+1 in ocfs2_start_trans,
> the ocfs2_commit_trans doesn't do reduction operation, 'sync' neither.
> finally no function reset ->j_num_trans until umount is triggered.
>
> call flow:
> ```
> [md5sum] //line 53 54
> vfs_read
> ocfs2_file_read_iter
> ocfs2_inode_lock_atime
> ocfs2_update_inode_atime
> + ocfs2_start_trans //atomic_inc j_num_trans
> + ...
> + ocfs2_commit_trans//no modify j_num_trans
>
> sync //line 57. no modify j_num_trans
>
> _dmerror_load_error_table //all write will return error after this line
>
> _dmerror_unmount //found j_num_trans is not zero, run commit thread
> //but the underlying dev is dm-error, journaling IO
> //failed all the time and keep going to retry.
> ```
>
> *** How to fix ***
>
> kick commit thread in sync path, which can reset j_num_trans to 0.
>
> Signed-off-by: Heming Zhao <heming.zhao at suse.com>
> ---
> fs/ocfs2/super.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index 0b0e6a132101..bb3fa21e9b47 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -412,6 +412,9 @@ static int ocfs2_sync_fs(struct super_block *sb, int wait)
> jbd2_log_wait_commit(osb->journal->j_journal,
> target);
> }
> + /* kick commit thread to reset journal->j_num_trans */
> + if (atomic_read(&(osb->journal->j_num_trans)))
> + wake_up(&osb->checkpoint_event);
> return 0;
> }
>
More information about the Ocfs2-devel
mailing list