[Ocfs2-tools-devel] [PATCH 5/6] Resend: Add unwritten extent support in ocfs2-tools, take 2

Sat Sep 29 23:05:21 PDT 2007

Mark Fasheh wrote:
> On Sat, Sep 29, 2007 at 09:34:23AM +0800, tao.ma wrote:
>   
>> Mark Fasheh wrote:
>>     
>>> On Wed, Sep 26, 2007 at 06:50:16PM -0700, Joel Becker wrote:
>>>  
>>>       
>>>> On Thu, Sep 27, 2007 at 09:40:21AM +0800, tao.ma wrote:
>>>>    
>>>>         
>>>>>> 	Oho!  We have a problem.  In ocfs2_read_whole_file, we have a
>>>>>> reader function (read_whole_func) that doesn't check unwritten extents.
>>>>>> That needs to be added to patch #6.  Of course, how to tell the
>>>>>> function?
>>>>>> 	Also, extras/set_random_bits.c uses it.  I think that the
>>>>>> fsck.ocfs2 users are safe (symlinks and directories).
>>>>>>  
>>>>>>        
>>>>>>             
>>>>> Yes, that is what I worry about. the callback has no idea of whether the 
>>>>> block is marked as unwritten. So maybe as I have said previously, add a 
>>>>> new parameter in callback's definition that this block exists in an 
>>>>> unwritten extent?
>>>>> Do you have any suggestion or just agree with me? :)
>>>>>      
>>>>>           
>>>> 	If it was just read, we could easily pass a flag and expect the
>>>> callback to do the right thing.  However, write is much harder.
>>>>    
>>>>         
>>> Passing a flags field is probably the right solution for reads.
>>>
>>> Block iteration in general is a very bad interface for doing data writes on
>>> ocfs2 though. There's a reason why we moved away from get_block based 
>>> writes
>>> in kernel. In all fairness, it made no difference until we started
>>> supporting sparse files.
>>>
>>> ocfs2_extent_iterate_inode() doesn't handle holes either. And since it 
>>> looks
>>> like ocfs2_block_iterate_inode() is based on it, I'll assume the same 
>>> there.
>>> Hopefully all the read code is smart enough to at least put zero's where a
>>> hole or unwritten extent is.
>>>
>>>
>>>  
>>>       
>>>> 	I don't quite know the solution.  The rule is, if you have an
>>>> unwritten extent, you must zero the parts you don't actually write.  But
>>>> block_iterate() is block-based.  It doesn't know from extents.  I think
>>>> that block_iterate() will have to know about it.  Somehow know that a
>>>> block was written to by the callback and thus zero around it.  Mark, any
>>>> thoughts?
>>>>    
>>>>         
>>> Disallow hole-filling or writes to unwritten regions via those functions?
>>>
>>> Honestly, there's not much else unless we teach them all how to mark an
>>> extent written or allocate clusters - and then they're not exactly block
>>> iteration functions any more ;)
>>>
>>> Imho we should have code which wants to read and write data to regular 
>>> files
>>> just use ocfs2_file_read() and ocfs2_file_write(). There's still a place 
>>> for
>>> block or extent iteration, but I think it's no longer suitable for basic
>>> file data i/o.
>>> 	--Mark
>>>  
>>>       
>> OK, so let me generalize the consensus we make:
>> 1. Add a new paremater to ocfs2_block_iterate like function, so that 
>> return zero if they need to read.
>>     
>
> That's the concensus so far. However, I just realized one more thing
> yesterday. block_iterate reads don't even work for all normal files - if the
> file has inline data, you don't have any data blocks or extents to iterate.
>
> As part of the ocfs2-tools inline data patches, the remaining readers of
> regular file data will have to stop using block/extent iterate. The API can
> stay around for directories that don't have inline data, but that's about
> it.
>
> But I suppose being able to get extent flags is generally useful. Actually,
> fsck.ocfs2 can be checking to make sure that directories don't have the
> unwritten flag set.
>
>
>   
>> 2. As for write, the user should really call ocfs2_file_write instead of 
>> these iterate things. ;)
>>     
>
> Yes, absolutely. And I'll add that, given inline-data the user should really
> call ocfs2_file_read for reads of regular files.
>
>
>   
>> 3. As for extent record iteration, we really should not touch the flag 
>> to avoid splitting or merging. But we should have the ability the modify 
>> the cluster length like ocfs2_truncate. And it would work already with 
>> current code I think.
>>     
>
> That sounds fine - and yeah, if we're just modifying length of the rightmost
> extent record, that should work fine.
>   
OK, I will generate a separate patch for this issue.
And after you review the newest patch[5/6], I will regenerate all the 
patches. :)