[Ocfs2-devel] [PATCH v2] ocfs2: Fix start offset to ocfs2_zero_range_for_truncate()
Eric Ren
zren at suse.com
Tue Aug 30 18:29:26 PDT 2016
Hi Ashish,
On 08/31/2016 07:17 AM, Ashish Samant wrote:
> Hi Eric,
>
> I am able to reproduce this on 4.8.0-rc3 as well. Can you try again and issue a sync
> between fallocate and dd?
It works!
ocfs2dev2 is not patched:
====================
ocfs2dev2:/mnt/ocfs2 # reflink -f 10MBfile reflnktest
ocfs2dev2:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest
ocfs2dev2:/mnt/ocfs2 # sync
ocfs2dev2:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0936888 s, 11.2 MB/s
00100000
while ocfs2dev1 is patched:
======================
ocfs2dev1:/mnt/ocfs2 # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
wrote 10485760/10485760 bytes at offset 0
10 MiB, 2560 ops; 0.0000 sec (183.137 MiB/sec and 46883.0122 ops/sec)
ocfs2dev1:/mnt/ocfs2 # reflink -f 10MBfile reflnktest
ocfs2dev1:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest
ocfs2dev1:/mnt/ocfs2 # sync
ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C
00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................|
*
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0933082 s, 11.2 MB/s
00100000
>
> On 08/30/2016 12:38 AM, Eric Ren wrote:
>> Hi,
>>
>> I'm on 4.8.0-rc3 kernel. Hope someone else can double-confirm this;-)
>>
>> On 08/30/2016 12:11 PM, Ashish Samant wrote:
>>> Hmm, thats weird. I see this on 4.7 kernel without the patch:
>>>
>>> # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
>>> wrote 10485760/10485760 bytes at offset 0
>>> 10 MiB, 2560 ops; 0.0000 sec (683.995 MiB/sec and 175102.5992 ops/sec)
>>> # reflink -f 10MBfile reflnktest
>>> # fallocate -p -o 0 -l 1048615 reflnktest
>>> # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C
>>> 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
>>> *
>>> 1+0 records in
>>> 1+0 records out
>>> 1048576 bytes (1.0 MB) copied, 0.0321517 s, 32.6 MB/s
>>> 00100000
>>>
>>> and with patch
>>> ----
>>> # dd if=10MBfile iflag=direct bs=1M count=1 | hexdump -C
>>> 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................|
>>
>> I'm not familiar with this code. So why is the output "cd ..."? because we didn't write
>> anything
>> into "10MBfile". Is it a magic number when reading from a hole?
> No, "cd" is what xfs_io wrote into the file. Those are the original contents of the file
> which are overwritten by 0 in the first cluster because of this bug.
Ah, gotcha, thanks!
Eric
>
> Thanks,
> Ashish
>>
>> Eric
>>
>>> *
>>> 1+0 records in
>>> 1+0 records out
>>> 00100000
>>
>>
>>
>>>
>>> Thanks,
>>> Ashish
>>>
>>>
>>> On 08/29/2016 08:33 PM, Eric Ren wrote:
>>>> Hello,
>>>>
>>>> On 08/30/2016 03:23 AM, Ashish Samant wrote:
>>>>> Hi Eric,
>>>>>
>>>>> The easiest way to reproduce this is :
>>>>>
>>>>> 1. Create a random file of say 10 MB
>>>>> xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
>>>>> 2. Reflink it
>>>>> reflink -f 10MBfile reflnktest
>>>>> 3. Punch a hole at starting at cluster boundary with range greater that 1MB. You can
>>>>> also use a range that will put the end offset in another extent.
>>>>> fallocate -p -o 0 -l 1048615 reflnktest
>>>>> 4. sync
>>>>> 5. Check the first cluster in the source file. (It will be zeroed out).
>>>>> dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C
>>>>
>>>> Thanks! I have a try myself, but I'm not sure what is our expected output and if the
>>>> test result meet
>>>> it:
>>>>
>>>> 1. After applying this patch:
>>>> ocfs2dev1:/mnt/ocfs2 # rm 10MBfile reflnktest
>>>> ocfs2dev1:/mnt/ocfs2 # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
>>>> wrote 10485760/10485760 bytes at offset 0
>>>> 10 MiB, 2560 ops; 0.0000 sec (1.089 GiB/sec and 285427.5839 ops/sec)
>>>> ocfs2dev1:/mnt/ocfs2 # reflink -f 10MBfile reflnktest
>>>> ocfs2dev1:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest
>>>> ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C
>>>> 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................|
>>>> *
>>>> 1+0 records in
>>>> 1+0 records out
>>>> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0952464 s, 11.0 MB/s
>>>> 00100000
>>>>
>>>> 2. Before this patch:
>>>> ....
>>>> ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C
>>>> 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................|
>>>> *
>>>> 1+0 records in
>>>> 1+0 records out
>>>> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0954648 s, 11.0 MB/s
>>>> 00100000
>>>>
>>>> 3. debugfs.ocfs2 -R stats /dev/sdb
>>>> ...
>>>> Block Size Bits: 12 Cluster Size Bits: 20
>>>> ...
>>>>
>>>> Eric
>>>>>
>>>>> Thanks,
>>>>> Ashish
>>>>>
>>>>> On 08/28/2016 10:39 PM, Eric Ren wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Thanks for this fix. I'd like to reproduce this issue locally and test this patch,
>>>>>> could you elaborate the detailed steps of reproduction?
>>>>>>
>>>>>> Thanks,
>>>>>> Eric
>>>>>>
>>>>>> On 08/27/2016 07:04 AM, Ashish Samant wrote:
>>>>>>> If we punch a hole on a reflink such that following conditions are met:
>>>>>>>
>>>>>>> 1. start offset is on a cluster boundary
>>>>>>> 2. end offset is not on a cluster boundary
>>>>>>> 3. (end offset is somewhere in another extent) or
>>>>>>> (hole range > MAX_CONTIG_BYTES(1MB)),
>>>>>>>
>>>>>>> we dont COW the first cluster starting at the start offset. But in this
>>>>>>> case, we were wrongly passing this cluster to
>>>>>>> ocfs2_zero_range_for_truncate() to zero out. This will modify the cluster
>>>>>>> in place and zero it in the source too.
>>>>>>>
>>>>>>> Fix this by skipping this cluster in such a scenario.
>>>>>>>
>>>>>>> Reported-by: Saar Maoz <saar.maoz at oracle.com>
>>>>>>> Signed-off-by: Ashish Samant <ashish.samant at oracle.com>
>>>>>>> Reviewed-by: Srinivas Eeda <srinivas.eeda at oracle.com>
>>>>>>> ---
>>>>>>> v1->v2:
>>>>>>> -Changed the commit msg to include a better and generic description of
>>>>>>> the problem, for all cluster sizes.
>>>>>>> -Added Reported-by and Reviewed-by tags.
>>>>>>> fs/ocfs2/file.c | 34 ++++++++++++++++++++++++----------
>>>>>>> 1 file changed, 24 insertions(+), 10 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>>>>>>> index 4e7b0dc..0b055bf 100644
>>>>>>> --- a/fs/ocfs2/file.c
>>>>>>> +++ b/fs/ocfs2/file.c
>>>>>>> @@ -1506,7 +1506,8 @@ static int ocfs2_zero_partial_clusters(struct inode *inode,
>>>>>>> u64 start, u64 len)
>>>>>>> {
>>>>>>> int ret = 0;
>>>>>>> - u64 tmpend, end = start + len;
>>>>>>> + u64 tmpend = 0;
>>>>>>> + u64 end = start + len;
>>>>>>> struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>>> unsigned int csize = osb->s_clustersize;
>>>>>>> handle_t *handle;
>>>>>>> @@ -1538,18 +1539,31 @@ static int ocfs2_zero_partial_clusters(struct inode *inode,
>>>>>>> }
>>>>>>> /*
>>>>>>> - * We want to get the byte offset of the end of the 1st cluster.
>>>>>>> + * If start is on a cluster boundary and end is somewhere in another
>>>>>>> + * cluster, we have not COWed the cluster starting at start, unless
>>>>>>> + * end is also within the same cluster. So, in this case, we skip this
>>>>>>> + * first call to ocfs2_zero_range_for_truncate() truncate and move on
>>>>>>> + * to the next one.
>>>>>>> */
>>>>>>> - tmpend = (u64)osb->s_clustersize + (start & ~(osb->s_clustersize - 1));
>>>>>>> - if (tmpend > end)
>>>>>>> - tmpend = end;
>>>>>>> + if ((start & (csize - 1)) != 0) {
>>>>>>> + /*
>>>>>>> + * We want to get the byte offset of the end of the 1st
>>>>>>> + * cluster.
>>>>>>> + */
>>>>>>> + tmpend = (u64)osb->s_clustersize +
>>>>>>> + (start & ~(osb->s_clustersize - 1));
>>>>>>> + if (tmpend > end)
>>>>>>> + tmpend = end;
>>>>>>> - trace_ocfs2_zero_partial_clusters_range1((unsigned long long)start,
>>>>>>> - (unsigned long long)tmpend);
>>>>>>> + trace_ocfs2_zero_partial_clusters_range1(
>>>>>>> + (unsigned long long)start,
>>>>>>> + (unsigned long long)tmpend);
>>>>>>> - ret = ocfs2_zero_range_for_truncate(inode, handle, start, tmpend);
>>>>>>> - if (ret)
>>>>>>> - mlog_errno(ret);
>>>>>>> + ret = ocfs2_zero_range_for_truncate(inode, handle, start,
>>>>>>> + tmpend);
>>>>>>> + if (ret)
>>>>>>> + mlog_errno(ret);
>>>>>>> + }
>>>>>>> if (tmpend < end) {
>>>>>>> /*
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>
More information about the Ocfs2-devel
mailing list