[Ocfs2-devel] [PATCH 1/2] ocfs2: fix missing reset j_num_trans for sync

Heming Zhao heming.zhao at suse.com
Tue May 2 03:43:08 UTC 2023


On Tue, May 02, 2023 at 08:29:36AM +0800, Joseph Qi wrote:
> 
> 
> On 5/1/23 8:35 PM, Heming Zhao wrote:
> > On Mon, May 01, 2023 at 07:44:42PM +0800, Joseph Qi wrote:
> >>
> >>
> >> On 5/1/23 12:51 PM, Heming Zhao wrote:
> >>> On Mon, May 01, 2023 at 11:21:15AM +0800, Joseph Qi wrote:
> >>>>
> >>>>
> >>>> On 5/1/23 11:09 AM, Heming Zhao wrote:
> >>>>> On Mon, May 01, 2023 at 10:52:13AM +0800, Joseph Qi wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 5/1/23 10:26 AM, Heming Zhao wrote:
> >>>>>>> On Mon, May 01, 2023 at 10:07:34AM +0800, Joseph Qi wrote:
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> What's the journal status in this case?
> >>>>>>> jbd2 is marked with JBD2_ABORT
> >>>>>>>
> >>>>>>>> I wonder why the commit thread is not working; it should flush the journal
> >>>>>>>> and reset j_num_trans during commit cache.
> >>>>>>> The reason is in my above answer.
> >>>>>>>
> >>>>>>> Below is the dmesg log for generic/266.
> >>>>>>>
> >>>>>>> ```
> >>>>>>> [  595.071807] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> >>>>>>> [  595.080950] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>>>>>> [  595.081091] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> >>>>>>> [  595.081580] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> >>>>>>> [  595.081886] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> >>>>>>> [  595.082173] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> >>>>>>> [  595.082420] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> >>>>>>> [  595.082696] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> >>>>>>> [  595.082932] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> >>>>>>> [  595.083188] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> >>>>>>> [  595.084721] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> >>>>>>> [  595.095436] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> >>>>>>> [  595.787078] systemd[1]: Started /usr/bin/bash -c exit 77.
> >>>>>>> [  595.802650] systemd[1]: fstests-check.scope: Deactivated successfully.
> >>>>>>> [  596.195734] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: joining the lockspace group...
> >>>>>>> [  596.206310] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> >>>>>>> [  596.206350] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1
> >>>>>>> [  596.209667] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: add member 1
> >>>>>>> [  596.211433] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_members 1 nodes
> >>>>>>> [  596.213942] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: generation 1 slots 1 1:1
> >>>>>>> [  596.215668] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory
> >>>>>>> [  596.217171] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 in 0 new
> >>>>>>> [  596.218253] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover_directory 0 out 0 messages
> >>>>>>> [  596.219401] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: dlm_recover 1 generation 1 done: 0 ms
> >>>>>>> [  596.220628] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: join complete
> >>>>>>> [  596.224616] kernel: JBD2: Ignoring recovery information on journal
> >>>>>>> [  596.229686] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> >>>>>>> [  596.870008] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>>>>>> [  596.236135] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: leaving the lockspace group...
> >>>>>>> [  596.237431] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: group event done 0
> >>>>>>> [  596.239442] kernel: dlm: 78CE18109AE44E6AA896722A5E25CFC0: release_lockspace final free
> >>>>>>> [  596.877286] ocfs2_hb_ctl[5436]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 78CE18109AE44E6AA896722A5E25CFC0
> >>>>>>> [  596.248299] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> >>>>>>> [  596.255888] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: leaving the lockspace group...
> >>>>>>> [  596.905718] systemd[1]: fstest-test.mount: Deactivated successfully.
> >>>>>>> [  596.280602] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>>>>>> [  596.281762] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: release_lockspace final free
> >>>>>>> [  596.921672] ocfs2_hb_ctl[5448]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u 838DD74847E34AA1839D6C4AA31132F1
> >>>>>>> [  596.299507] kernel: ocfs2: Unmounting device (253,16) on (node 1)
> >>>>>>> [  596.481755] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: joining the lockspace group...
> >>>>>>> [  596.484722] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1
> >>>>>>> [  596.484729] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: group event done 0
> >>>>>>> [  596.485361] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: add member 1
> >>>>>>> [  596.485622] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_members 1 nodes
> >>>>>>> [  596.485861] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: generation 1 slots 1 1:1
> >>>>>>> [  596.486063] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory
> >>>>>>> [  596.486293] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 in 0 new
> >>>>>>> [  596.486498] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover_directory 0 out 0 messages
> >>>>>>> [  596.486768] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: dlm_recover 1 generation 1 done: 0 ms
> >>>>>>> [  596.487037] kernel: dlm: 838DD74847E34AA1839D6C4AA31132F1: join complete
> >>>>>>> [  596.491655] kernel: ocfs2: Mounting device (253,16) on (node 1, slot 0) with ordered data mode.
> >>>>>>> [  597.139758] root[5475]: run xfstest generic/266
> >>>>>>> [  596.505896] unknown: run fstests generic/266 at 2023-05-01 10:19:02
> >>>>>>> [  597.186885] systemd[1]: Started /usr/bin/bash -c test -w /proc/self/oom_score_adj && echo 250 > /proc/self/oom_score_adj; exec ./tests/generic/266.
> >>>>>>> [  597.566466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
> >>>>>>> [  597.579759] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>>>>>> [  597.579766] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
> >>>>>>> [  597.581004] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
> >>>>>>> [  597.581820] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
> >>>>>>> [  597.582292] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
> >>>>>>> [  597.582894] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
> >>>>>>> [  597.583423] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
> >>>>>>> [  597.583991] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
> >>>>>>> [  597.584547] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 4 ms
> >>>>>>> [  597.585079] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
> >>>>>>> [  597.590136] kernel: JBD2: Ignoring recovery information on journal
> >>>>>>> [  597.594258] kernel: ocfs2: Mounting device (253,32) on (node 1, slot 0) with ordered data mode.
> >>>>>>> [  598.239046] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>>>>>> [  597.606575] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: leaving the lockspace group...
> >>>>>>> [  597.607446] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>>>>>> [  598.244507] ocfs2_hb_ctl[5680]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u AB0245F740984C25A9D0907D61DFCF8A
> >>>>>>> [  597.607843] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: release_lockspace final free
> >>>>>>> [  597.615474] kernel: ocfs2: Unmounting device (253,32) on (node 1)
> >>>>>>> [  597.641624] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: joining the lockspace group...
> >>>>>>> [  597.659435] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1
> >>>>>>> [  597.659479] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: group event done 0
> >>>>>>> [  597.661482] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: add member 1
> >>>>>>> [  597.662340] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_members 1 nodes
> >>>>>>> [  597.663211] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: generation 1 slots 1 1:1
> >>>>>>> [  597.664150] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory
> >>>>>>> [  597.665466] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 in 0 new
> >>>>>>> [  597.666370] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover_directory 0 out 0 messages
> >>>>>>> [  597.667235] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: dlm_recover 1 generation 1 done: 0 ms
> >>>>>>> [  597.668428] kernel: dlm: AB0245F740984C25A9D0907D61DFCF8A: join complete
> >>>>>>> [  600.194974] systemd[1]: Started Daily Cleanup of Snapper Snapshots.
> >>>>>>> [  600.207227] dbus-daemon[672]: [system] Activating via systemd: service name='org.opensuse.Snapper' unit='snapperd.service' requested by ':1.10' (uid=0 pid=5728 comm="/usr/lib/snapper/systemd-helper --cleanup")
> >>>>>>> [  600.209151] systemd[1]: Starting DBus interface for snapper...
> >>>>>>> [  600.217270] dbus-daemon[672]: [system] Successfully activated service 'org.opensuse.Snapper'
> >>>>>>> [  600.217370] systemd[1]: Started DBus interface for snapper.
> >>>>>>> [  600.218969] systemd[1]: snapper-cleanup.service: Deactivated successfully.
> >>>>>>> [  599.608010] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
> >>>>>>> [  599.611041] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
> >>>>>>> [  599.611127] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>>>>>> [  599.611813] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
> >>>>>>> [  599.612019] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
> >>>>>>> [  599.612227] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
> >>>>>>> [  599.612397] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
> >>>>>>> [  599.612559] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
> >>>>>>> [  599.612712] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
> >>>>>>> [  599.612880] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 0 ms
> >>>>>>> [  599.613078] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
> >>>>>>> [  599.615320] kernel: JBD2: Ignoring recovery information on journal
> >>>>>>> [  599.619221] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
> >>>>>>> [  600.392044] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>>>>>> [  599.780111] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: leaving the lockspace group...
> >>>>>>> [  599.781949] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>>>>>> [  599.784235] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: release_lockspace final free
> >>>>>>> [  600.426136] ocfs2_hb_ctl[5762]: ocfs2_hb_ctl /sbin/ocfs2_hb_ctl -K -u F06626AE7D754F11A0211679C145C8BF
> >>>>>>> [  599.808566] kernel: ocfs2: Unmounting device (254,0) on (node 1)
> >>>>>>> [  599.827803] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: joining the lockspace group...
> >>>>>>> [  599.832848] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1
> >>>>>>> [  599.832984] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: group event done 0
> >>>>>>> [  599.833810] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: add member 1
> >>>>>>> [  599.834272] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_members 1 nodes
> >>>>>>> [  599.834663] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: generation 1 slots 1 1:1
> >>>>>>> [  599.835030] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory
> >>>>>>> [  599.835474] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 in 0 new
> >>>>>>> [  599.835868] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover_directory 0 out 0 messages
> >>>>>>> [  599.836365] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: dlm_recover 1 generation 1 done: 4 ms
> >>>>>>> [  599.836829] kernel: dlm: F06626AE7D754F11A0211679C145C8BF: join complete
> >>>>>>> [  599.845790] kernel: ocfs2: Mounting device (254,0) on (node 1, slot 0) with ordered data mode.
> >>>>>>> [  600.079285] kernel: Buffer I/O error on dev dm-0, logical block 2621424, async page read
> >>>>>>> [  600.090845] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>>>>>> [  600.091387] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>>>>>> [  600.091924] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>>>>>> [  600.092422] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>>>>>> [  600.094252] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>>>>>> [  600.095029] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>>>>>> [  600.095891] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>>>>>> [  600.096698] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>>>>>> [  600.099071] kernel: (xfs_io,5785,0):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>>>>>> [  600.099957] kernel: (xfs_io,5785,0):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>>>>>> [  600.100798] kernel: (xfs_io,5785,0):ocfs2_statfs:1654 ERROR: status = -5
> >>>>>>> [  600.101623] kernel: (xfs_io,5785,0):ocfs2_statfs:1686 ERROR: status = -5
> >>>>>>> [  600.107094] kernel: (xfs_io,5785,1):ocfs2_get_refcount_tree:358 ERROR: status = -5
> >>>>>>> [  600.107659] kernel: (xfs_io,5785,1):ocfs2_lock_refcount_tree:456 ERROR: status = -5
> >>>>>>> [  600.108175] kernel: (xfs_io,5785,1):ocfs2_refcount_cow_hunk:3417 ERROR: status = -5
> >>>>>>> [  600.108654] kernel: (xfs_io,5785,1):ocfs2_refcount_cow:3478 ERROR: status = -5
> >>>>>>> [  600.109128] kernel: (xfs_io,5785,1):ocfs2_prepare_inode_for_write:2326 ERROR: status = -5
> >>>>>>> [  600.109613] kernel: (xfs_io,5785,1):ocfs2_file_write_iter:2433 ERROR: status = -5
> >>>>>>> [  600.118113] kernel: (umount,5789,3):ocfs2_inode_lock_update:2354 ERROR: status = -5
> >>>>>>> [  600.118481] kernel: (umount,5789,3):ocfs2_inode_lock_full_nested:2502 ERROR: status = -5
> >>>>>>> [  600.118898] kernel: (umount,5789,3):ocfs2_statfs:1654 ERROR: status = -5
> >>>>>>> [  600.119364] kernel: (umount,5789,3):ocfs2_statfs:1686 ERROR: status = -5
> >>>>>>> [  600.759255] systemd[1]: fstest-scratch.mount: Deactivated successfully.
> >>>>>>> [  600.127135] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
> >>>>>>> [  600.127373] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
> >>>>>>> [  600.127615] kernel: Aborting journal on device dm-0-24.
> >>>>>>> [  600.127893] kernel: Buffer I/O error on dev dm-0, logical block 529, lost sync page write
> >>>>>>> [  600.128137] kernel: JBD2: I/O error when updating journal superblock for dm-0-24.
> >>>>>>> [  600.128362] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  600.128591] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
> >>>>>>> [  601.147704] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  602.171745] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  603.195707] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  604.219709] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  605.243712] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  606.267748] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  607.291715] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  608.315742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  609.339724] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  610.363774] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  611.387693] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  612.411727] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  613.435719] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  614.459743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  615.483742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  615.484503] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_thread:2346 ERROR: status = -5, journal is already aborted.
> >>>>>>> [  616.507742] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  617.531751] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  618.555743] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  619.579702] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> [  620.603758] kernel: (ocfs2cmt-F06626,5771,2):ocfs2_commit_cache:321 ERROR: status = -5
> >>>>>>> ^C
> >>>>>>> tb-fstest1:/usr/src/linux/fs/ocfs2 #
> >>>>>>> ```
> >>>>>>
> >>>>>> So the case is: flushing the journal in the commit thread will never end if
> >>>>>> the journal is aborted, which then leads to a umount hang since the commit
> >>>>>> thread won't stop, right?
> >>>>> Yes.
> >>>>>
> >>>>>>
> >>>>>> If so, I don't think your change in sync fs is the right solution. It
> >>>>>> seems to be just for bypassing the testcase.
> >>>>> No.
> >>>>> In my view, generic/266 shouldn't trigger the jbd2 ABORT status. The 'sync'
> >>>>> should flush all already-issued write IOs. After sync & loading dm-error,
> >>>>> new write IOs simply fail, which shouldn't generate any journal IOs and
> >>>>> shouldn't drive jbd2 into ABORT status.
> >>>>>
> >>>>
> >>>> What if there is no 'sync' in this case?
> >>>
> >>> It is the same as the case where 'sync' is issued. Please compare:
> >>> 'after [  600.079285] in the log above' vs. 'after [  207.623776] in the log below'
> >> I mean, your solution is to reset j_num_trans in sync fs, but in the case
> >> where there is no 'sync', how do you resolve it?
> > 
> > If I understand your question correctly, you mean: if the 'sync' line is
> > removed from generic/266, how should this case be handled?
> > My answer is that this is patch [2/2]'s job. I have tested it already.
> > 
> IMO, they are the same issue. 
> And it looks weird to fix it in sync fs.

I stand by my viewpoint: there are two issues, and they need two patches.
The sync action should flush all write IOs (including journal transactions).
From the dmesg, it's obvious that ocfs2 forgets to reset its internal jbd2
counter (j_num_trans).
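
To make this concrete, here is a rough sketch of the idea (simplified for
illustration; this is not the literal patch [1/2], and the exact condition
and placement inside sync fs are my assumptions):

```
/*
 * Illustrative sketch only -- not the literal patch. The idea: if no
 * jbd2 commit can be started (e.g. the journal was aborted) while
 * transactions are still counted as pending, clear j_num_trans so the
 * commit thread's stop condition can eventually become true.
 */
static int ocfs2_sync_fs_sketch(struct super_block *sb, int wait)
{
	struct ocfs2_super *osb = OCFS2_SB(sb);
	tid_t target;

	if (jbd2_journal_start_commit(osb->journal->j_journal, &target)) {
		if (wait)
			jbd2_log_wait_commit(osb->journal->j_journal, target);
	} else if (atomic_read(&osb->journal->j_num_trans) != 0) {
		/* Nothing can be committed any more; drop the stale count. */
		atomic_set(&osb->journal->j_num_trans, 0);
	}

	return 0;
}
```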

But patch [2/2] (approaching from the other side) could fix the generic/266
umount failure issue. If you don't like patch [1/2], let's focus on reviewing
patch [2/2].
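
For reference, the umount hang discussed above comes from the commit thread's
exit condition. Roughly (a simplified sketch, not verbatim kernel code):

```
/*
 * With an aborted journal, ocfs2_commit_cache() keeps failing with -EIO
 * and j_num_trans never drops to zero, so the exit condition below never
 * becomes true and kthread_stop() during umount blocks -- matching the
 * ~1s "ocfs2_commit_cache:321 ERROR" cadence in the dmesg above.
 */
static int ocfs2_commit_thread_sketch(void *arg)
{
	struct ocfs2_super *osb = arg;
	struct ocfs2_journal *journal = osb->journal;

	while (!(kthread_should_stop() &&
		 atomic_read(&journal->j_num_trans) == 0)) {
		int status = ocfs2_commit_cache(osb);

		if (status < 0)
			msleep(1000);	/* retries forever while aborted */
	}

	return 0;
}
```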

- Heming


