[Ocfs2-devel] [PATCH v2] ocfs2: add error handling path when jbd2 enter ABORT status

Heming Zhao heming.zhao at suse.com
Mon Jun 26 15:09:16 UTC 2023


(Sorry, my last mail was badly formatted; resending.)

On Mon, Jun 26, 2023 at 07:24:53PM +0800, Heming Zhao wrote:
> fstests generic cases 347, 361, 628 and 629 trigger the same issue:
> when jbd2 enters ABORT status, ocfs2 ignores it and keeps trying to commit
> the journal. This causes umount to fail (hang).
> 
> This commit gives ocfs2 the ability to handle the jbd2 ABORT case. After
> patching, umount succeeds and leaves the journal in dirty status.
> 
> Signed-off-by: Heming Zhao <heming.zhao at suse.com>
> ---
> v2:
>    (this v2 is only for patch [2/2]; see the URL below)
>    add description in commit log.
>    don't reset ->j_num_trans and leave journal dirty status on disk.
> 
> v1: https://oss.oracle.com/pipermail/ocfs2-devel/2023-April/000896.html
> ---
>  fs/ocfs2/alloc.c      | 10 ++++++----
>  fs/ocfs2/journal.c    | 17 +++++++++++++++--
>  fs/ocfs2/journal.h    |  5 +++--
>  fs/ocfs2/localalloc.c |  3 +++
>  4 files changed, 27 insertions(+), 8 deletions(-)
> 

It looks like I found two bugs while testing this patch.
1> fsck.ocfs2 fails to repair the unclean jbd2 data but still reports success.
2> fsck.ocfs2 doesn't remove its dlm lockspace when it exits.

** how to trigger **

First, run the steps below:
(/dev/vdg should be larger than 1GB)
(these steps are derived from fstests generic/361)
```
mkfs -t ocfs2 -F -N 4 --cluster-stack pcmk --cluster-name tst -b 4096 /dev/vdg 131072
mount -t ocfs2 /dev/vdg /fstest/scratch
/sbin/xfs_io -i -fc 'truncate 1g' /fstest/scratch/fs.img
mkdir /fstest/scratch/mnt
losetup -f --show /fstest/scratch/fs.img
losetup --direct-io=on /dev/loop0
mkfs -t ocfs2 -N 4 /dev/loop0
mount -t ocfs2 /dev/loop0 /fstest/scratch/mnt

/sbin/xfs_io -i -fc 'pwrite 0 520m' /fstest/scratch/mnt/testfile
/usr/bin/mount -o remount,ro /fstest/scratch/mnt

umount /fstest/scratch/mnt  # <-- triggers the jbd2 abort
```

The last 'umount' causes jbd2 to enter abort status, which leaves unclean
journal data on disk.
Running fsck.ocfs2 on this loop0 device then reports success, but the
filesystem still fails to mount.
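
The mismatch between what fsck.ocfs2 reports and whether the device really mounts can be checked mechanically. The following is only a rough sketch: the function name `fsck_then_mount` and the `FSCK`/`MOUNT`/`UMOUNT` override variables are made up for illustration, and it assumes fsck.ocfs2 follows the usual fsck exit-status convention (0 or 1 means no errors remain).

```shell
# Sketch only: FSCK, MOUNT and UMOUNT are hypothetical override hooks so the
# logic can be exercised without a real OCFS2 device.
FSCK=${FSCK:-fsck.ocfs2}
MOUNT=${MOUNT:-mount}
UMOUNT=${UMOUNT:-umount}

# fsck_then_mount DEV MNT
# Prints one of: "fsck-failed", "consistent", "mismatch".
# Returns 0 only when fsck reports success AND the mount works.
fsck_then_mount() {
    dev=$1
    mnt=$2

    # -y answers "yes" to all repair questions.
    "$FSCK" -y "$dev" >/dev/null 2>&1
    rc=$?
    # Assumption: exit status 0 (clean) or 1 (errors corrected) is success.
    if [ "$rc" -gt 1 ]; then
        echo "fsck-failed"
        return 2
    fi

    if "$MOUNT" -t ocfs2 "$dev" "$mnt" >/dev/null 2>&1; then
        "$UMOUNT" "$mnt" >/dev/null 2>&1
        echo "consistent"
        return 0
    fi

    # fsck claimed success but the filesystem still does not mount.
    echo "mismatch"
    return 1
}
```

For the case described here, `fsck_then_mount /dev/loop0 /fstest/scratch/mnt` would be expected to print `mismatch`, since fsck.ocfs2 reports success while the mount still fails.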

```
tb-fstest1:~ # mount -t ocfs2 /dev/loop0 /fstest/scratch/mnt
mount.ocfs2: Internal logic failure while mounting /dev/loop0 on /fstest/scratch/mnt. Check 'dmesg' for more information on this error 5.

tb-fstest1:~ # fsck.ocfs2 /dev/loop0
fsck.ocfs2 1.8.7
Checking OCFS2 filesystem in /dev/loop0:
  Label:              <NONE>
  UUID:               4990448088164475A620ACD486ACFC8E
  Number of blocks:   262144
  Block size:         4096
  Number of clusters: 262144
  Cluster size:       4096
  Number of slots:    4

/dev/loop0 wasn't cleanly unmounted by all nodes.  Attempting to replay the journals for nodes that didn't unmount cleanly
Checking each slot's journal.
Replaying slot 0's journal.
*** There were problems replaying journals.  Be careful in telling fsck to make repairs to this filesystem.
Slot 0's local alloc replayed successfully
/dev/loop0 is clean.  It will be checked after 20 additional mounts.
Slot 0's journal dirty flag removed

tb-fstest1:~ # mount -t ocfs2 /dev/loop0 /fstest/scratch/mnt
mount.ocfs2: Internal logic failure while mounting /dev/loop0 on /fstest/scratch/mnt. Check 'dmesg' for more information on this error 17.

tb-fstest1:~ # ls /sys/kernel/dlm/
0B90A8FBD12C4ED3BB947682045C8B51  4990448088164475A620ACD486ACFC8E

tb-fstest1:~ # ps axj |grep ocfs2
2  1232     0     0 ?           -1 I<       0   0:00 [ocfs2_wq]
2  1233     0     0 ?           -1 S        0   0:00 [ocfs2dc-0B90A8FBD12C4ED3BB947682045C8B51]
2  1240     0     0 ?           -1 S        0   0:00 [ocfs2cmt-0B90A8FBD12C4ED3BB947682045C8B51]

tb-fstest1:~ # dlm_tool leave 4990448088164475A620ACD486ACFC8E
Leaving lockspace "4990448088164475A620ACD486ACFC8E"
done

tb-fstest1:~ # mount -t ocfs2 /dev/loop0 /fstest/scratch/mnt
mount.ocfs2: Internal logic failure while mounting /dev/loop0 on /fstest/scratch/mnt. Check 'dmesg' for more information on this error 5.
```
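
The leftover lockspace seen in the log could be spotted by comparing the entries under /sys/kernel/dlm/ with the UUIDs of the volumes that are really mounted. A minimal sketch, assuming (as in the log above) that each lockspace is named after the volume UUID; `DLM_DIR` and `stale_lockspaces` are names invented for this example, and the caller has to supply the list of in-use UUIDs:

```shell
# Sketch only: DLM_DIR and stale_lockspaces are names chosen for illustration.
DLM_DIR=${DLM_DIR:-/sys/kernel/dlm}

# stale_lockspaces UUID...
# Prints every lockspace under $DLM_DIR whose name is not one of the given
# UUIDs (the UUIDs of the volumes that are actually mounted).
stale_lockspaces() {
    for entry in "$DLM_DIR"/*; do
        [ -e "$entry" ] || continue     # directory is empty or missing
        name=${entry##*/}
        in_use=no
        for uuid in "$@"; do
            if [ "$name" = "$uuid" ]; then
                in_use=yes
                break
            fi
        done
        if [ "$in_use" = no ]; then
            echo "$name"
        fi
    done
    return 0
}
```

With the state shown in the log, `stale_lockspaces 0B90A8FBD12C4ED3BB947682045C8B51` would print `4990448088164475A620ACD486ACFC8E`, which is the lockspace that had to be removed by hand with `dlm_tool leave`.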

Thanks,
Heming


