[Ocfs2-devel] The last part of the file is zeroed out when write N random bytes

Mon Sep 27 00:16:43 PDT 2021

Hi List,

I'd like to report a data loss bug that occurs when writing N random bytes to a file. I am raising it now because I saw some related commits in the past few weeks.
I can reproduce this bug reliably with the latest ocfs2 kernel module code, as follows:
1) Create a three-node (e.g. ghe-tw-nd1, ghe-tw-nd2, ghe-tw-nd3) ocfs2 cluster and attach a shared disk (e.g. /dev/vdb).
2) Format the disk with the command "mkfs.ocfs2 -N 4 -b 4096 -C 1048576 /dev/vdb", and mount it at /mnt/shared on each node. The cluster size must be greater than 4K; this is the key to triggering the problem.
3) Copy the file write/test scripts to the /mnt/shared directory, then run the test script on node1 to reproduce the bug.
    file write script ocfs2_fallocate_bug_plain_write.py: https://pastebin.com/QsXcD8rq
    file test script ocfs2_loop.sh: https://pastebin.com/eTUe2hkW
4) You will then hit the bug: the md5sum of the file differs between node1 and node2.
    In fact, when the file is read from node2, its last part is zeroed out.
    e.g.
    file dump from node1: https://pastebin.com/HB92TVS0
    file dump from node2: https://pastebin.com/jBG7HdSz
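
For readers who cannot reach the pastebin links, the check in steps 3) and 4) can be sketched roughly as below. This is NOT the actual ocfs2_fallocate_bug_plain_write.py script, only a minimal illustration; the helper names (write_random_file, file_md5, trailing_zero_bytes) are made up for this example. The idea is to write N random bytes on node1 and record the md5, then recompute the md5 on node2 and count how many bytes at the end of the file read back as zero.

```python
import hashlib
import os


def write_random_file(path, nbytes):
    """Write nbytes of random data to path and return its MD5 hex digest."""
    data = os.urandom(nbytes)
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # ask the kernel to flush the data to disk
    return hashlib.md5(data).hexdigest()


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of the file at path, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def trailing_zero_bytes(path):
    """Return how many bytes at the end of the file are zero.

    Heuristic only: random data can legitimately end in a zero byte,
    so a small non-zero result is not proof of corruption by itself.
    """
    with open(path, "rb") as f:
        data = f.read()
    return len(data) - len(data.rstrip(b"\x00"))
```

On node1 one would call write_random_file() on a path under /mnt/shared and note the digest; on node2, file_md5() on the same path should return the same digest, and a large trailing_zero_bytes() result would match the zeroed tail seen in the node2 dump.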

More information:
This bug does not exist on some old kernels (e.g. linux-4.12.14-120), but it does happen on some newer kernels. I suspect it is probably NOT caused by ocfs2 commits, since the problem also occurs when I use old ocfs2 kernel module code on the new kernels.
Anyway, if you have any comments, please reply to this mail.

Thanks
Gang