[Ocfs2-tools-devel] [PATCH 5/6] Resend: Add unwritten extent support in ocfs2-tools, take 2

Mark Fasheh mark.fasheh at oracle.com
Sat Sep 29 16:04:24 PDT 2007


On Sat, Sep 29, 2007 at 09:34:23AM +0800, tao.ma wrote:
> Mark Fasheh wrote:
> >On Wed, Sep 26, 2007 at 06:50:16PM -0700, Joel Becker wrote:
> >  
> >>On Thu, Sep 27, 2007 at 09:40:21AM +0800, tao.ma wrote:
> >>    
> >>>>	Oho!  We have a problem.  In ocfs2_read_whole_file, we have a
> >>>>reader function (read_whole_func) that doesn't check unwritten extents.
> >>>>That needs to be added to patch #6.  Of course, how to tell the
> >>>>function?
> >>>>	Also, extras/set_random_bits.c uses it.  I think that the
> >>>>fsck.ocfs2 users are safe (symlinks and directories).
> >>>>  
> >>>>        
> >>>Yes, that is what I worry about. the callback has no idea of whether the 
> >>>block is marked as unwritten. So maybe as I have said previously, add a 
> >>>new parameter in callback's definition that this block exists in an 
> >>>unwritten extent?
> >>>Do you have any suggestion or just agree with me? :)
> >>>      
> >>	If it was just read, we could easily pass a flag and expect the
> >>callback to do the right thing.  However, write is much harder.
> >>    
> >
> >Passing a flags field is probably the right solution for reads.
> >
> >Block iteration in general is a very bad interface for doing data writes on
> >ocfs2 though. There's a reason why we moved away from get_block based 
> >writes
> >in kernel. In all fairness, it made no difference until we started
> >supporting sparse files.
> >
> >ocfs2_extent_iterate_inode() doesn't handle holes either. And since it 
> >looks
> >like ocfs2_block_iterate_inode() is based on it, I'll assume the same 
> >there.
> >Hopefully all the read code is smart enough to at least put zero's where a
> >hole or unwritten extent is.
> >
> >
> >  
> >>	I don't quite know the solution.  The rule is, if you have an
> >>unwritten extent, you must zero the parts you don't actually write.  But
> >>block_iterate() is block-based.  It doesn't know from extents.  I think
> >>that block_iterate() will have to know about it.  Somehow know that a
> >>block was written to by the callback and thus zero around it.  Mark, any
> >>thoughts?
> >>    
> >
> >Disallow hole-filling or writes to unwritten regions via those functions?
> >
> >Honestly, there's not much else unless we teach them all how to mark an
> >extent written or allocate clusters - and then they're not exactly block
> >iteration functions any more ;)
> >
> >Imho we should have code which wants to read and write data to regular 
> >files
> >just use ocfs2_file_read() and ocfs2_file_write(). There's still a place 
> >for
> >block or extent iteration, but I think it's no longer suitable for basic
> >file data i/o.
> >	--Mark
> >  
> OK, so let me generalize the consensus we make:
> 1. Add a new paremater to ocfs2_block_iterate like function, so that 
> return zero if they need to read.

That's the concensus so far. However, I just realized one more thing
yesterday. block_iterate reads don't even work for all normal files - if the
file has inline data, you don't have any data blocks or extents to iterate.

As part of the ocfs2-tools inline data patches, the remaining readers of
regular file data will have to stop using block/extent iterate. The API can
stay around for directories that don't have inline data, but that's about
it.

But I suppose being able to get extent flags is generally useful. Actually,
fsck.ocfs2 can be checking to make sure that directories don't have the
unwritten flag set.


> 2. As for write, the user should really call ocfs2_file_write instead of 
> these iterate things. ;)

Yes, absolutely. And I'll add that, given inline-data the user should really
call ocfs2_file_read for reads of regular files.


> 3. As for extent record iteration, we really should not touch the flag 
> to avoid splitting or merging. But we should have the ability the modify 
> the cluster length like ocfs2_truncate. And it would work already with 
> current code I think.

That sounds fine - and yeah, if we're just modifying length of the rightmost
extent record, that should work fine.
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com



More information about the Ocfs2-tools-devel mailing list