[Ocfs2-devel] [PATCH v2] ocfs2: add error handling path when jbd2 enter ABORT status

Heming Zhao heming.zhao at suse.com
Mon Jun 26 14:56:12 UTC 2023


On 6/26/23 7:24 PM, Heming Zhao wrote:
> fstest generic cases 347, 361, 628 and 629 all trigger the same issue:
> when jbd2 enters ABORT status, ocfs2 ignores it and keeps trying to
> commit the journal. This causes umount to fail (hang).
> 
> This commit gives ocfs2 the ability to handle the jbd2 ABORT case.
> After this patch, umount succeeds and the journal is left in dirty
> status on disk.
> 
> Signed-off-by: Heming Zhao <heming.zhao at suse.com>
> ---
> v2:
>     (this v2 covers only patch [2/2]; see the URL below)
>     add a description to the commit log.
>     don't reset ->j_num_trans; leave the journal in dirty status on disk.
> 
> v1: https://oss.oracle.com/pipermail/ocfs2-devel/2023-April/000896.html
> ---
>   fs/ocfs2/alloc.c      | 10 ++++++----
>   fs/ocfs2/journal.c    | 17 +++++++++++++++--
>   fs/ocfs2/journal.h    |  5 +++--
>   fs/ocfs2/localalloc.c |  3 +++
>   4 files changed, 27 insertions(+), 8 deletions(-)
> 

It looks like I found 2 bugs while testing this patch:
1> fsck.ocfs2 fails to repair the unclean jbd2 data, yet reports success.
2> fsck.ocfs2 doesn't remove the dlm lockspace on exit.

** how to trigger **

First, run the steps below, which are derived from fstest generic/361
(/dev/vdg must be larger than 1 GB):
```
mkfs -t ocfs2 -F -N 4 --cluster-stack pcmk --cluster-name tst -b 4096 /dev/vdg 131072
mount -t ocfs2 /dev/vdg /fstest/scratch
/sbin/xfs_io -i -fc 'truncate 1g' /fstest/scratch/fs.img
mkdir /fstest/scratch/mnt
losetup -f --show /fstest/scratch/fs.img
losetup --direct-io=on /dev/loop0
mkfs -t ocfs2 -N 4 /dev/loop0
mount -t ocfs2 /dev/loop0 /fstest/scratch/mnt

/sbin/xfs_io -i -fc 'pwrite 0 520m' /fstest/scratch/mnt/testfile
/usr/bin/mount -o remount,ro /fstest/scratch/mnt

umount /fstest/scratch/mnt   # <-- triggers the jbd2 abort
```



The last 'umount' makes jbd2 enter abort status and leaves unclean
journal data on disk.

Then repair the loop0 device with fsck.ocfs2: the tool reports success
("replayed successfully", dirty flag removed), but in reality the
partition still fails to mount.


```
tb-fstest1:~ # mount -t ocfs2 /dev/loop0 /fstest/scratch/mnt
mount.ocfs2: Internal logic failure while mounting /dev/loop0 on /fstest/scratch/mnt. Check 'dmesg' for more information on this error 5.

tb-fstest1:~ # fsck.ocfs2 /dev/loop0
fsck.ocfs2 1.8.7
Checking OCFS2 filesystem in /dev/loop0:
   Label:              <NONE>
   UUID:               4990448088164475A620ACD486ACFC8E
   Number of blocks:   262144
   Block size:         4096
   Number of clusters: 262144
   Cluster size:       4096
   Number of slots:    4

/dev/loop0 wasn't cleanly unmounted by all nodes.  Attempting to replay the journals for nodes that didn't unmount cleanly
Checking each slot's journal.
Replaying slot 0's journal.
*** There were problems replaying journals.  Be careful in telling fsck to make repairs to this filesystem.
Slot 0's local alloc replayed successfully
/dev/loop0 is clean.  It will be checked after 20 additional mounts.
Slot 0's journal dirty flag removed

tb-fstest1:~ # mount -t ocfs2 /dev/loop0 /fstest/scratch/mnt
mount.ocfs2: Internal logic failure while mounting /dev/loop0 on /fstest/scratch/mnt. Check 'dmesg' for more information on this error 17.

tb-fstest1:~ # ls /sys/kernel/dlm/
0B90A8FBD12C4ED3BB947682045C8B51  4990448088164475A620ACD486ACFC8E

tb-fstest1:~ # ps axj |grep ocfs2
     2  1232     0     0 ?           -1 I<       0   0:00 [ocfs2_wq]
     2  1233     0     0 ?           -1 S        0   0:00 [ocfs2dc-0B90A8FBD12C4ED3BB947682045C8B51]
     2  1240     0     0 ?           -1 S        0   0:00 [ocfs2cmt-0B90A8FBD12C4ED3BB947682045C8B51]

tb-fstest1:~ # dlm_tool leave 4990448088164475A620ACD486ACFC8E
Leaving lockspace "4990448088164475A620ACD486ACFC8E"
done

tb-fstest1:~ # mount -t ocfs2 /dev/loop0 /fstest/scratch/mnt
mount.ocfs2: Internal logic failure while mounting /dev/loop0 on /fstest/scratch/mnt. Check 'dmesg' for more information on this error 5.
```
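
As a side note, one way to double-check the on-disk journal state is
debugfs.ocfs2 from ocfs2-tools (a hedged example; the exact output
fields may differ between versions):

```
debugfs.ocfs2 -R "stat //journal:0000" /dev/loop0
```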
Thanks,
Heming


