[Ocfs2-devel] FIEMAP problem

Jeff Liu jeff.liu at oracle.com
Thu Aug 29 01:39:42 PDT 2013


On 08/27/2013 04:18 PM, David Weber wrote:

> On Thursday, 8 August 2013, 09:20:45, Sunil Mushran wrote:
>> So it's a test issue. The utility assumes the fs allocates in 4K units.
>> That's why it only works when clustersize is 4K.
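A minimal sketch of why a tester that reasons in 4K blocks trips over larger
clusters. It assumes the map letters mean H = hole, D = data and
P = preallocated, and that the fs backs a whole cluster as soon as any block
in it is not a hole. block_is_allocated() is a hypothetical helper for
illustration, not part of fiemap-tester:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumption: each character of map describes one 4K block
 * ('H' hole, 'D' data, 'P' preallocated) and the filesystem
 * allocates in units of blocks_per_cluster blocks.
 *
 * Returns true if block idx ends up backed by storage, i.e. if any
 * block in the same cluster is not a hole.
 */
static bool block_is_allocated(const char *map, size_t idx,
                               size_t blocks_per_cluster)
{
    size_t len = strlen(map);

    assert(idx < len && blocks_per_cluster > 0);

    /* First and one-past-last block of the cluster containing idx. */
    size_t first = idx - idx % blocks_per_cluster;
    size_t last = first + blocks_per_cluster;
    if (last > len)
        last = len;

    for (size_t i = first; i < last; i++) {
        if (map[i] != 'H')
            return true;
    }
    return false;
}
```

With one block per cluster (4K clusters), block 0 of a map starting "HD..."
stays a hole. With 256 blocks per cluster (1M clusters), the same block shares
a cluster with data and shows up as allocated.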
> 
> Thanks for the clarification!
> 
> The patch seems to have solved our problem. It would be great if it could be 
> pushed to Linux.

I'll resend this patch for review.  Sorry for the late response; I have
just returned from a long vacation.

Thanks,
-Jeff

> 
> Cheers,
> David
> 
>>
>> On Thu, Aug 8, 2013 at 8:09 AM, David Weber <wb at munzinger.de> wrote:
>>> On Thursday, 8 August 2013, 07:30:27, Sunil Mushran wrote:
>>>> Interesting. Could you please print the on-disk inode using the command
>>>> below? The file path is relative to the mount point.
>>>>
>>>> debugfs.ocfs2 -R "stat /relative/path/to/file" /dev/DEVICE
>>>>
>>>> It is saying that the fs has allocated a block when it did not need to. It
>>>> could be that the test utility does not handle blocks larger than 4K, or
>>>> the fiemap ioctl has a bug, or the fs is indeed allocating a block when it
>>>> does not need to. The above command will show us the actual layout on disk.
>>>
>>> Thank you for looking into this!
>>>
>>> # ./fiemap-tester /mnt/kvm-images/fiemap_new
>>> Starting infinite run, if you don't see any output then its working
>>> properly.
>>> HEY FS PERSON: your fs is weird.  I specifically wanted a
>>> hole and you allocated a block anyway.  FIBMAP confirms that
>>> you allocated a block, and the block is filled with 0's so
>>> everything is kosher, but you still allocated a block when
>>> didn't need to.  This may or may not be what you wanted,
>>> which is why I'm only printing this message once, in case
>>> you didn't do it on purpose. This was at block 0.
>>> ERROR: preallocated extent is not marked with FIEMAP_EXTENT_UNWRITTEN: 0
>>> map is
>>>
>>> 'HDHPHHDDHPHPHPHDDHHPPDDPPPHHHPDDDPDHHHHDDDPPHPPPDPHHPPDPPHHDDPDPPHDHPDDDD
>>> PDPPDPHDDPPDDPPHDDPDHHHDDPDHPHPDPPDDHPHPPHDPHPHDDHDPDPDHDHPDDPHPPPHDPPDPDD
>>> HPHDDPPHPDHPPHPPHPHHPHDHPPDDPHDHHPPHPPDHPHPHDHPPDDDDPHHHPPPHHHDDDDPDPDDPPP
>>> HPHDPPPHDPDPHDDHPPPDPDHPHHPHDHHDHPDPHDDPPHDPPDDPDDPPDHPPDPDHHPHDHPPHDDHDPH
>>> PPPDHPDDDHDDHDPPHHDDPPDPDDHDHHPHDPHHPPPDPPDHDHHPPHDPHDPPHDPHHPPP' logical:
>>> [       0..     255] phys: 206615552..206615807 flags: 0x000 tot: 256
>>> Problem comparing fiemap and map
>>>
>>> # debugfs.ocfs2 -R "stat /fiemap_new" /dev/drbd0
>>>
>>>         Inode: 92668161   Mode: 0644   Generation: 3713753505 (0xdd5b61a1)
>>>         FS Generation: 2357962590 (0x8c8ba75e)
>>>         CRC32: 00000000   ECC: 0000
>>>         Type: Regular   Attr: 0x0   Flags: Valid
>>>         Dynamic Features: (0x0)
>>>         User: 0 (root)   Group: 0 (root)   Size: 1470464
>>>         Links: 1   Clusters: 2
>>>         ctime: 0x5203b200 0x991cd -- Thu Aug  8 16:58:08.627149 2013
>>>         atime: 0x5203b200 0xc0accc -- Thu Aug  8 16:58:08.12627148 2013
>>>         mtime: 0x5203b200 0x991cd -- Thu Aug  8 16:58:08.627149 2013
>>>         dtime: 0x0 -- Thu Jan  1 01:00:00 1970
>>>         Refcount Block: 0
>>>         Last Extblk: 0   Orphan Slot: 0
>>>         Sub Alloc Slot: 0   Sub Alloc Bit: 1
>>>         Tree Depth: 0   Count: 243   Next Free Rec: 2
>>>         ## Offset        Clusters       Block#          Flags
>>>         0  0             1              206615552       0x0
>>>         1  1             1              206619648       0x0
>>>>
>>>> On Aug 8, 2013, at 2:16 AM, David Weber <wb at munzinger.de> wrote:
>>>>> On Wednesday, 7 August 2013, 22:07:19, Jeff Liu wrote:
>>>>>> On 08/07/2013 05:17 PM, David Weber wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> We are trying to use OCFS2 as VM storage. After running into problems
>>>>>>> with qemu's disk_mirror feature, we now think there could be a problem
>>>>>>> with the FIEMAP ioctl in OCFS2.
>>>>>>>
>>>>>>> As far as I understand, the situation looks like this:
>>>>>>> Qemu asks the FS via the FIEMAP ioctl [1] whether the given section of
>>>>>>> the image is already allocated.
>>>>>>> In particular, it checks whether fm_mapped_extents is greater than 0.
>>>>>>> For sections starting at or beyond offset 1048576, OCFS2 reports 0
>>>>>>> mapped_extents, which is wrong.
>>>>>>>
>>>>>>> I extended a userspace FIEMAP util [2] a bit to accept start and length
>>>>>>> parameters [3] as an easier test case.
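For readers without the gist at hand, a range query of this kind can be
sketched roughly as follows. This is not David's utility: the helper name
count_mapped_extents() is made up for illustration, and it sets
fm_extent_count to 0 so that the kernel only reports how many extents cover
the range in fm_mapped_extents instead of returning the extent records.

```c
#include <assert.h>
#include <fcntl.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the number of extents the filesystem
 * reports for the byte range [start, start + length) of path, or -1 if
 * the file cannot be opened or the FIEMAP ioctl fails.
 */
static int count_mapped_extents(const char *path, uint64_t start,
                                uint64_t length)
{
    struct fiemap fm;
    int fd, rc;

    assert(path != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    memset(&fm, 0, sizeof(fm));
    fm.fm_start = start;            /* byte offset where the query begins */
    fm.fm_length = length;          /* number of bytes to query */
    fm.fm_flags = FIEMAP_FLAG_SYNC; /* flush dirty data before mapping */
    fm.fm_extent_count = 0;         /* count only, no extent records */

    rc = ioctl(fd, FS_IOC_FIEMAP, &fm);
    close(fd);

    return rc < 0 ? -1 : (int)fm.fm_mapped_extents;
}
```

On a fully written file, a result of 0 for a range inside the file is the
symptom described in this report.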
>>>>>>>
>>>>>>> When we create a big file which has no holes
>>>>>>> dd if=/dev/urandom of=/mnt/kvm-images/urandom.img bs=1M count=1000
>>>>>>>
>>>>>>> For lower offsets we get the expected output:
>>>>>>> ./a.out /mnt/kvm-images/urandom.img 10000 10
>>>>>>> start: 2710, length: a
>>>>>>> File /mnt/kvm-images/urandom.img has 1 extents:
>>>>>>> #       Logical          Physical         Length           Flags
>>>>>>> 0:      0000000000000000 0000004ca3f00000 000000000be00000 0000
>>>>>>>
>>>>>>> But for offsets >= 1048576 it reports no extents at all, which is
>>>>>>> wrong as far as I understand:
>>>>>>> ./a.out /mnt/kvm-images/urandom.img 1048576 10
>>>>>>> start: 100000, length: a
>>>>>>> File /mnt/kvm-images/urandom.img has 0 extents:
>>>>>>> #       Logical          Physical         Length           Flags
>>>>>>
>>>>>> Thanks for your report; it looks like this problem has existed for years.
>>>>>> As a quick response, could you please try the fix below?
>>>>>
>>>>> Thank you very much! This solved the problems with qemu.
>>>>>
>>>>> I found a fiemap-tester util [1] in the xfstests project. It runs fine
>>>>> on OCFS2 with a 4K cluster size but fails with 1M. However, I have no
>>>>> idea whether this is a severe problem.
>>>>>
>>>>> # gcc -DHAVE_FALLOCATE=1 -o fiemap-tester fiemap-tester.c
>>>>> # ./fiemap-tester /mnt/kvm-images/fiemap_test
>>>>> Starting infinite run, if you don't see any output then its working
>>>>> properly.
>>>>> HEY FS PERSON: your fs is weird.  I specifically wanted a
>>>>> hole and you allocated a block anyway.  FIBMAP confirms that
>>>>> you allocated a block, and the block is filled with 0's so
>>>>> everything is kosher, but you still allocated a block when
>>>>> didn't need to.  This may or may not be what you wanted,
>>>>> which is why I'm only printing this message once, in case
>>>>> you didn't do it on purpose. This was at block 0.
>>>>> ERROR: preallocated extent is not marked with FIEMAP_EXTENT_UNWRITTEN: 0
>>>>> map is
>>>>>
>>>>> 'HDHPHHDDHPHPHPHDDHHPPDDPPPHHHPDDDPDHHHHDDDPPHPPPDPHHPPDPPHHDDPDPPHDHPDDDD
>>>>> PDPPDPHDDPPDDPPHDDPDHHHDDPDHPHPDPPDDHPHPPHDPHPHDDHDPDPDHDHPDDPHPPPHDPPDPDD
>>>>> HPHDDPPHPDHPPHPPHPHHPHDHPPDDPHDHHPPHPPDHPHPHDHPPDDDDPHHHPPPHHHDDDDPDPDDPPP
>>>>> HPHDPPPHDPDPHDDHPPPDPDHPHHPHDHHDHPDPHDDPPHDPPDDPDDPPDHPPDPDHHPHDHPPHDDHDPH
>>>>> PPPDHPDDDHDDHDPPHHDDPPDPDDHDHHPHDPHHPPPDPPDHDHHPPHDPHDPPHDPHHPPP' logical:
>>>>> [       0..     255] phys: 132160512..132160767 flags: 0x000 tot: 256
>>>>>
>>>>>
>>>>> [1] http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=blob_plain;f=src/fiemap-tester.c;hb=HEAD
>>>>>
>>>>>> From: Jie Liu <jeff.liu at oracle.com>
>>>>>>
>>>>>> Calling fiemap ioctl(2) with a given start offset and a desired
>>>>>> mapping range should show extents if possible.  However, we calculate
>>>>>> the end offset of the mapping via 'mapping_end -= cpos' before iterating
>>>>>> over the extent records, which causes problems, e.g.:
>>>>>>
>>>>>> Cluster size 4096:
>>>>>> debugfs.ocfs2 1.6.3
>>>>>>
>>>>>>        Block Size Bits: 12   Cluster Size Bits: 12
>>>>>>
>>>>>> The extended fiemap test utility from David:
>>>>>> https://gist.github.com/anonymous/6172331
>>>>>>
>>>>>> # dd if=/dev/urandom of=/ocfs2/test_file bs=1M count=1000
>>>>>> # ./fiemap /ocfs2/test_file 4096 10
>>>>>> start: 4096, length: 10
>>>>>> File /ocfs2/test_file has 0 extents:
>>>>>> #    Logical          Physical         Length           Flags
>>>>>>
>>>>>>    ^^^^^ <-- No extents
>>>>>>
>>>>>> In this case, in ocfs2_fiemap(): cpos == mapping_end == 1. Hence the
>>>>>> loop that searches the extent records is not executed at all.
>>>>>>
>>>>>> This patch removes the 'mapping_end -= cpos' in question, so the loop
>>>>>> runs until cpos reaches mapping_end instead.
>>>>>>
>>>>>> # ./fiemap /ocfs2/test_file 4096 10
>>>>>> start: 4096, length: 10
>>>>>> File /ocfs2/test_file has 1 extents:
>>>>>> #    Logical          Physical         Length           Flags
>>>>>> 0:    0000000000000000 0000000056a01000 0000000006a00000 0000
>>>>>>
>>>>>> Reported-by: David Weber <wb at munzinger.de>
>>>>>> Cc: Mark Fasheh <mfasheh at suse.de>
>>>>>> Cc: Joel Becker <jlbec at evilplan.org>
>>>>>> Signed-off-by: Jie Liu <jeff.liu at oracle.com>
>>>>>> ---
>>>>>> fs/ocfs2/extent_map.c |    1 -
>>>>>> 1 file changed, 1 deletion(-)
>>>>>>
>>>>>> diff --git a/fs/ocfs2/extent_map.c b/fs/ocfs2/extent_map.c
>>>>>> index 2487116..8460647 100644
>>>>>> --- a/fs/ocfs2/extent_map.c
>>>>>> +++ b/fs/ocfs2/extent_map.c
>>>>>> @@ -781,7 +781,6 @@ int ocfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>>>>>>      cpos = map_start >> osb->s_clustersize_bits;
>>>>>>      mapping_end = ocfs2_clusters_for_bytes(inode->i_sb,
>>>>>>                             map_start + map_len);
>>>>>> -    mapping_end -= cpos;
>>>>>>      is_last = 0;
>>>>>>      while (cpos < mapping_end && !is_last) {
>>>>>>          u32 fe_flags;
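The arithmetic from the patch description can be replayed in user space. The
following is only a sketch: clusters_for_bytes() is a stand-in written for
this example that rounds a byte count up to whole clusters (which is what
ocfs2_clusters_for_bytes() is assumed to do), and fiemap_loop_runs() is a
hypothetical helper that answers whether the "while (cpos < mapping_end)"
loop would be entered at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * User-space stand-in (assumption): round bytes up to whole clusters,
 * where the cluster size is 1 << bits.
 */
static uint32_t clusters_for_bytes(uint64_t bytes, unsigned int bits)
{
    assert(bits < 32);
    return (uint32_t)((bytes + (1ULL << bits) - 1) >> bits);
}

/*
 * Returns true if the extent-search loop would execute at least once.
 * With subtract_cpos set, mapping_end is turned into a length first,
 * as the code did before the patch.
 */
static bool fiemap_loop_runs(uint64_t map_start, uint64_t map_len,
                             unsigned int bits, bool subtract_cpos)
{
    uint32_t cpos = (uint32_t)(map_start >> bits);
    uint32_t mapping_end = clusters_for_bytes(map_start + map_len, bits);

    if (subtract_cpos)
        mapping_end -= cpos;    /* the line removed by the patch */

    return cpos < mapping_end;
}
```

For start 4096, length 10 and 4K clusters (bits = 12), cpos is 1 and
mapping_end is 2; after the subtraction both are 1 and the loop is skipped.
The same happens for start 1048576, length 10 with 1M clusters (bits = 20),
which matches the range that failed in the original report, while start 10000
still works because cpos is 0 there.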
>>>>>>>
>>>>>>> We're running linux-3.11-rc4 plus the following patches:
>>>>>>> [PATCH V2] ocfs2: update inode size after zeroed the hole
>>>>>>> [PATCH RESEND] ocfs2: fix NULL pointer dereference in
>>>>>>> ocfs2_duplicate_clusters_by_page
>>>>>>> NULL pointer dereference at ocfs2_dir_foreach_blk_id
>>>>>>> [patch v3] ocfs2: ocfs2: fix recent memory corruption bug
>>>>>>>
>>>>>>> o2info --volinfo  /dev/drbd0
>>>>>>>
>>>>>>>        Label: kvm-images
>>>>>>>         UUID: BE7C101466AD4F2196A849C7A6031263
>>>>>>>   Block Size: 4096
>>>>>>> Cluster Size: 1048576
>>>>>>>   Node Slots: 8
>>>>>>>     Features: backup-super strict-journal-super sparse extended-slotmap
>>>>>>>     Features: inline-data xattr indexed-dirs refcount discontig-bg
>>>>>>>     unwritten
>>>>>>>
>>>>>>> Thanks in advance!
>>>>>>>
>>>>>>> Cheers,
>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>> [1] http://git.qemu.org/?p=qemu.git;a=blob;f=block/raw-posix.c;h=ba721d3f5bd98a6b62791c2e20dbf2894021ad76;hb=HEAD#l1087
>>>>>>>
>>>>>>> [2] http://smackerelofopinion.blogspot.de/2010/01/using-fiemap-ioctl-to-get-file-extents.html
>>>>>>>
>>>>>>> [3] https://gist.github.com/anonymous/6172331
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Ocfs2-devel mailing list
>>>>>>> Ocfs2-devel at oss.oracle.com
>>>>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 