[Ocfs2-devel] [PATCH 1/2] ocfs2: fix missing reset j_num_trans for sync

Joseph Qi joseph.qi at linux.alibaba.com
Mon May 1 02:07:34 UTC 2023


Hi,

What's the journal status in this case?
I wonder why the commit thread is not working; it should flush the journal
and reset j_num_trans when it commits the cache.
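
For reference, the expected counter lifecycle can be sketched as a toy
userspace model. This is only an illustration under assumptions, not the
ocfs2 code: the toy_* names are hypothetical, and journal I/O and the
commit thread's wait queue are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Toy stand-in for the ocfs2 journal; only the counter is modelled. */
struct toy_journal {
	atomic_int j_num_trans;	/* transactions started since the last flush */
};

/* Models ocfs2_start_trans(): the only place the counter goes up. */
static void toy_start_trans(struct toy_journal *j)
{
	assert(j != NULL);
	atomic_fetch_add(&j->j_num_trans, 1);
}

/* Models ocfs2_commit_trans(): closes the handle, counter untouched. */
static void toy_commit_trans(struct toy_journal *j)
{
	assert(j != NULL);
}

/*
 * Models the commit thread's flush: returns how many transactions
 * were pending and resets the counter to 0.
 */
static int toy_commit_cache(struct toy_journal *j)
{
	assert(j != NULL);
	return atomic_exchange(&j->j_num_trans, 0);
}
```

In this model a start/commit pair leaves the counter at 1, and only
toy_commit_cache() brings it back to 0.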

Thanks,
Joseph

On 4/30/23 11:13 AM, Heming Zhao wrote:
> fstests generic cases 266, 272 and 281 hang on umount.
> 
> I use 266 to describe the root cause.
> 
> ```
>  49 _dmerror_unmount
>  50 _dmerror_mount
>  51
>  52 echo "Compare files"
>  53 md5sum $testdir/file1 | _filter_scratch
>  54 md5sum $testdir/file2 | _filter_scratch
>  55
>  56 echo "CoW and unmount"
>  57 sync
>  58 _dmerror_load_error_table
>  59 urk=$($XFS_IO_PROG -f -c "pwrite -S 0x63 -b $bufsize 0 $filesize" \
>  60     -c "fdatasync" $testdir/file2 2>&1)
>  61 echo $urk >> $seqres.full
>  62 echo "$urk" | grep -q "error" || _fail "pwrite did not fail"
>  63
>  64 echo "Clean up the mess"
>  65 _dmerror_unmount
> ```
> 
> After lines 49 and 50 umount and remount the ocfs2 device, the test runs
> md5sum to verify the target files. Line 57 runs 'sync' before line 58
> switches the dm target from dm-linear to dm-error. The test hangs at
> line 65.
> 
> md5sum goes through the jbd2 transaction pair ocfs2_[start|commit]_trans
> to do its journal work. However, only ocfs2_start_trans increments
> ->j_num_trans; ocfs2_commit_trans does not decrement it, and neither
> does 'sync'. As a result, nothing resets ->j_num_trans until umount is
> triggered.
> 
> call flow:
> ```
> [md5sum] //line 53 54
>  vfs_read
>   ocfs2_file_read_iter
>    ocfs2_inode_lock_atime
>     ocfs2_update_inode_atime
>      + ocfs2_start_trans //atomic_inc j_num_trans
>      + ...
>      + ocfs2_commit_trans //does not modify j_num_trans
> 
> sync //line 57, does not modify j_num_trans
> 
> _dmerror_load_error_table //all writes return errors after this line
> 
> _dmerror_unmount //finds j_num_trans is not zero and runs the commit thread,
>                //but the underlying dev is dm-error, so journal IO
>                //keeps failing and is retried forever.
> ```
> 
> *** How to fix ***
> 
> Kick the commit thread in the sync path so that it resets j_num_trans to 0.
> 
> Signed-off-by: Heming Zhao <heming.zhao at suse.com>
> ---
>  fs/ocfs2/super.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
> index 0b0e6a132101..bb3fa21e9b47 100644
> --- a/fs/ocfs2/super.c
> +++ b/fs/ocfs2/super.c
> @@ -412,6 +412,9 @@ static int ocfs2_sync_fs(struct super_block *sb, int wait)
>  			jbd2_log_wait_commit(osb->journal->j_journal,
>  					     target);
>  	}
> +	/* kick commit thread to reset journal->j_num_trans */
> +	if (atomic_read(&(osb->journal->j_num_trans)))
> +		wake_up(&osb->checkpoint_event);
>  	return 0;
>  }
>  


