[Ocfs2-devel] fstrim corrupts ocfs2 filesystems (become read-only) on SSD device which is managed by multipath

Gang He ghe at suse.com
Mon Oct 30 19:07:24 PDT 2017


Hello Ashish,

Just to give my feedback based on the testing script below:
cat /trim_loop.sh

LOG=./trim_loop.log
DEV=/dev/dm-0
MOUNTDIR=/mnt/shared
BLOCKLIST="512 1K 2K 4K"
CLUSTERLIST="4K 8K 16K 32K 64K 128K 256K 512K 1M"
BLOCKSZ=1K
CLUSTERSZ=1M
set -x

> ${LOG}

# iterate over all block/cluster size combinations
for CLUSTERSZ in ${CLUSTERLIST} ;
do
    for BLOCKSZ in ${BLOCKLIST} ;
    do
        echo y | mkfs.ocfs2 -b ${BLOCKSZ} -C ${CLUSTERSZ} -N 4 ${DEV}
        mount ${DEV} ${MOUNTDIR}
        sleep 1
        # record any fstrim failure together with a timestamp
        fstrim -av || echo "`date`  fstrim -av failed in -b ${BLOCKSZ} -C ${CLUSTERSZ}" >> ${LOG}
        sleep 1
        umount ${MOUNTDIR}
    done
done
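
For reference, I invoke it roughly like this (assuming the o2cb cluster stack is
already online on the node and the mount point /mnt/shared exists; the exact
invocation is not shown above):

sh /trim_loop.sh
cat ./trim_loop.log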


I can reproduce this bug with some block/cluster size combinations; the failures recorded in the log are:
Mon Oct 30 10:49:05 CST 2017  fstrim -av failed in -b 4K -C 32K
Mon Oct 30 10:49:11 CST 2017  fstrim -av failed in -b 512 -C 64K
Mon Oct 30 10:49:21 CST 2017  fstrim -av failed in -b 1K -C 64K
Mon Oct 30 10:49:37 CST 2017  fstrim -av failed in -b 2K -C 64K
Mon Oct 30 10:50:03 CST 2017  fstrim -av failed in -b 4K -C 64K
Mon Oct 30 10:50:10 CST 2017  fstrim -av failed in -b 512 -C 128K
Mon Oct 30 10:50:19 CST 2017  fstrim -av failed in -b 1K -C 128K
Mon Oct 30 10:50:36 CST 2017  fstrim -av failed in -b 2K -C 128K
Mon Oct 30 10:51:02 CST 2017  fstrim -av failed in -b 4K -C 128K
Mon Oct 30 10:51:08 CST 2017  fstrim -av failed in -b 512 -C 256K
Mon Oct 30 10:51:18 CST 2017  fstrim -av failed in -b 1K -C 256K
Mon Oct 30 10:51:34 CST 2017  fstrim -av failed in -b 2K -C 256K
Mon Oct 30 10:52:00 CST 2017  fstrim -av failed in -b 4K -C 256K
Mon Oct 30 10:52:07 CST 2017  fstrim -av failed in -b 512 -C 512K
Mon Oct 30 10:52:16 CST 2017  fstrim -av failed in -b 1K -C 512K
Mon Oct 30 10:52:33 CST 2017  fstrim -av failed in -b 2K -C 512K
Mon Oct 30 10:52:59 CST 2017  fstrim -av failed in -b 4K -C 512K
Mon Oct 30 10:53:06 CST 2017  fstrim -av failed in -b 512 -C 1M
Mon Oct 30 10:53:15 CST 2017  fstrim -av failed in -b 1K -C 1M
Mon Oct 30 10:53:32 CST 2017  fstrim -av failed in -b 2K -C 1M
Mon Oct 30 10:53:58 CST 2017  fstrim -av failed in -b 4K -C 1M

The patch fixes this bug; with it applied, the test script passes in all of these cases.

Thanks
Gang


>>> 

> 
> On 10/28/2017 12:44 AM, Gang He wrote:
>> Hello Ashish,
>> Thanks for your reply.
>> From the patch, it looks closely related to this bug.
>> But one thing still confuses me a little:
>> why was I not able to reproduce this bug locally with an SSD disk?
> Hmm, that's interesting. It could be that the driver for your disk is not
> zeroing those blocks for some reason ...
> You could try to simulate this by creating ocfs2 on a loop device and
> running fstrim on it.
> loop converts fstrim to fallocate and punches a hole in the range, so it
> should zero out the range and cause corruption by zeroing the group descriptor.
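
A minimal sketch of such a loop-device reproduction could look roughly like the
following (the image path, image size and the block/cluster sizes are only
illustrative assumptions; it also assumes the o2cb cluster stack is set up the
same way as for the script above):

truncate -s 2G /tmp/ocfs2-test.img
LOOPDEV=$(losetup -f --show /tmp/ocfs2-test.img)
echo y | mkfs.ocfs2 -b 4K -C 1M -N 4 ${LOOPDEV}
mount ${LOOPDEV} /mnt/shared
# the loop driver turns the discard requests into hole punches in the image file
fstrim -v /mnt/shared
umount /mnt/shared
losetup -d ${LOOPDEV}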
> 
> 
>> Are there any specific steps to reproduce this issue?
> 
> I was able to reproduce this with block size 4k and cluster size 1M. No 
> other special options.
> 
> Thanks,
> Ashish
>> e.g. a specific mount option for ocfs2? Does the disk need to be an SSD?
>> According to the patch, the bug is not related to multipath configuration.
>>
>>
>> Thanks
>> Gang
>>
>>
>>
>>>>> Ashish Samant <ashish.samant at oracle.com> 10/28/17 2:06 AM >>>
>> Hi Gang,
>>
>> The following patch sent to the list should fix the issue.
>>
>> https://patchwork.kernel.org/patch/10002583/
>>
>> Thanks,
>> Ashish
>>
>>
>> On 10/27/2017 02:47 AM, Gang He wrote:
>>> Hello Guys,
>>>
>>> I got a bug report from a customer: the fstrim command corrupted an ocfs2
>>> file system on their SSD SAN and the file system became read-only. The SSD
>>> LUN was configured via multipath.
>>> After unmounting the file system, the customer ran fsck.ocfs2 on it; the
>>> file system could then be mounted again until the next fstrim run.
>>> The error messages were like:
>>> 2017-10-02T00:00:00.334141+02:00 rz-xen10 systemd[1]: Starting Discard unused blocks...
>>> 2017-10-02T00:00:00.383805+02:00 rz-xen10 fstrim[36615]: fstrim: /xensan1: FITRIM ioctl fehlgeschlagen: Das Dateisystem ist nur lesbar
>>> (German above: "FITRIM ioctl failed: the file system is read-only")
>>> 2017-10-02T00:00:00.385233+02:00 rz-xen10 kernel: [1092967.091821] OCFS2: ERROR (device dm-5): ocfs2_validate_gd_self: Group descriptor #8257536 has bad signature  <<== here
>>> 2017-10-02T00:00:00.385251+02:00 rz-xen10 kernel: [1092967.091831] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted.
>>> 2017-10-02T00:00:00.385254+02:00 rz-xen10 kernel: [1092967.091836] (fstrim,36615,5):ocfs2_trim_fs:7422 ERROR: status = -30
>>> 2017-10-02T00:00:00.385854+02:00 rz-xen10 systemd[1]: fstrim.service: Main process exited, code=exited, status=32/n/a
>>> 2017-10-02T00:00:00.386756+02:00 rz-xen10 systemd[1]: Failed to start Discard unused blocks.
>>> 2017-10-02T00:00:00.387236+02:00 rz-xen10 systemd[1]: fstrim.service: Unit entered failed state.
>>> 2017-10-02T00:00:00.387601+02:00 rz-xen10 systemd[1]: fstrim.service: Failed with result 'exit-code'.
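
As an aside, the recovery the customer performed corresponds to the usual
sequence; a rough sketch (the device name and mount point are taken from the
log above, while the fsck.ocfs2 flags are my assumption about what was run):

umount /xensan1
# -f forces a full check, -y answers yes to all repair questions
fsck.ocfs2 -fy /dev/dm-5
mount /dev/dm-5 /xensan1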
>>>
>>> A similar bug is reported at
>>> https://bugs.launchpad.net/ubuntu/+source/util-linux/+bug/1681410 .
>>> Then, I tried to reproduce this bug locally.
>>> Since I do not have an SSD SAN, I found a PC server with an SSD disk.
>>> I set up a two-node ocfs2 cluster in VMs on this PC server, attached the SSD
>>> disk to each VM instance twice, and then configured it with the multipath
>>> tool. The configuration on each node looks like:
>>> sle12sp3-nd1:/ # multipath -l
>>> INTEL_SSDSA2M040G2GC_CVGB0490002C040NGN dm-0 ATA,INTEL SSDSA2M040
>>> size=37G features='1 retain_attached_hw_handler' hwhandler='0' wp=rw
>>> |-+- policy='service-time 0' prio=0 status=active
>>> | `- 0:0:0:0 sda 8:0  active undef unknown
>>> `-+- policy='service-time 0' prio=0 status=enabled
>>>     `- 0:0:0:1 sdb 8:16 active undef unknown
>>>
>>> Next, I ran fstrim commands from each node simultaneously, and I also ran
>>> dd commands to write data to the shared SSD disk during the fstrim runs.
>>> But I cannot reproduce this issue; everything goes well.
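
Roughly, what I ran on the two nodes was along these lines (the mount point,
file name and sizes here are only illustrative, not the exact original
commands):

# node 1: trim the file system in a loop
while true; do fstrim -v /mnt/shared; sleep 1; done
# node 2, at the same time: write data onto the shared device
dd if=/dev/zero of=/mnt/shared/ddfile bs=1M count=4096 oflag=direct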
>>>
>>> So I'd like to ping the list: has anyone encountered this bug? If yes,
>>> please help by providing some information.
>>> I think three factors may be related to this bug: SSD device type,
>>> multipath configuration, and running fstrim simultaneously.
>>>
>>> Thanks a lot.
>>> Gang
>>>
>>>
>>>
>>>
>>>
>>
>>
>>


