[Ocfs2-devel] [PATCH] ocfs2: fix missing reset j_num_trans for sync
Heming Zhao
heming.zhao at suse.com
Fri Apr 21 08:36:53 UTC 2023
fstest generic case 266 272 281 trigger hanging issue when umount.
I use 266 to describe the root cause.
```
49 _dmerror_unmount
50 _dmerror_mount
51
52 echo "Compare files"
53 md5sum $testdir/file1 | _filter_scratch
54 md5sum $testdir/file2 | _filter_scratch
55
56 echo "CoW and unmount"
57 sync
58 _dmerror_load_error_table
59 urk=$($XFS_IO_PROG -f -c "pwrite -S 0x63 -b $bufsize 0 $filesize" \
60 -c "fdatasync" $testdir/file2 2>&1)
61 echo $urk >> $seqres.full
62 echo "$urk" | grep -q "error" || _fail "pwrite did not fail"
63
64 echo "Clean up the mess"
65 _dmerror_unmount
```
After line 49 50 umount & mount ocfs2 dev, this case run md5sum to
verify target file. line 57 run 'sync' before line 58 changes the dm
target from dm-linear to dm-error. this case is hanging at line 65.
In 266, md5sum calls jbd2 trans pair: ocfs2_[start|commit]_trans to
do journal job. but there is only ->j_num_trans+1 in ocfs2_start_trans,
the ocfs2_commit_trans doesn't do reduction operation.
call flow:
```
[md5sum]
vfs_read
ocfs2_file_read_iter
ocfs2_inode_lock_atime
ocfs2_update_inode_atime
+ ocfs2_start_trans //atomic_inc j_num_trans
+ ...
+ ocfs2_commit_trans//no modify j_num_trans
sync //no modify j_num_trans
_dmerror_load_error_table //all write will return error after this line
_dmerror_unmount //found j_num_trans is not zero, run commit thread
//but the underlying dev is dm-error, journaling IO
//failed all the time and keep going to retry.
```
*** How to fix ***
kick commit thread in sync path, which can reset j_num_trans to 0.
Signed-off-by: Heming Zhao <heming.zhao at suse.com>
---
fs/ocfs2/super.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c
index 0b0e6a132101..9929320bd967 100644
--- a/fs/ocfs2/super.c
+++ b/fs/ocfs2/super.c
@@ -408,6 +408,9 @@ static int ocfs2_sync_fs(struct super_block *sb, int wait)
if (jbd2_journal_start_commit(osb->journal->j_journal,
&target)) {
+ /* kick commit thread to reset journal->j_num_trans */
+ wake_up(&osb->checkpoint_event);
+
if (wait)
jbd2_log_wait_commit(osb->journal->j_journal,
target);
--
2.39.2
More information about the Ocfs2-devel
mailing list