[Ocfs2-devel] [PATCH] Treat writes as new when holes span across page boundaries

Goldwyn Rodrigues rgoldwyn at gmail.com
Wed Feb 23 13:09:22 PST 2011


Hi Joel,

On Wed, Feb 23, 2011 at 1:13 PM, Joel Becker <jlbec at evilplan.org> wrote:
> On Wed, Feb 23, 2011 at 08:43:44AM -0600, Goldwyn Rodrigues wrote:
>> On Wed, Feb 23, 2011 at 3:39 AM, Joel Becker <jlbec at evilplan.org> wrote:
>> >> After the write, i_size=4128.
>> >
>> >        And the block is zeroed out to 8191 (or at least it should be)
>> > by.
>> >
>>
>> by...?
>
>        Haha, where'd the rest of that sentence go?  By the
> page prepare code (prepare_write() in older linux, write_begin() in
> newer linux).
>
>> So, who should zero the page? I assume map_pages
>
>        ocfs2_map_page_blocks() explicitly describes itself as not
> handling the zeroing.  Let's trust that for now.
>
>> >        Think about your first patch.  It zeros from i_size to pos.  But
>>
>> Actually, it zeroed from i_size to EOC. But again I think that might
>> be incorrect because of the commits you mentioned.
>
>        Sure.  But it shouldn't be necessary.
>
>> >        Your last patch, where you force 'new', isn't quite right.  The
>> > cluster boundaries passed to map_pages aren't right, and the later
>> > zeroing could be wrong (zeroing earlier parts of the file that contained
>> > data).  At least, that's how I read it.  But I think you're in the right
>> > place to start.
>>
>> I still don't understand why? I am not considering clusters at all.
>> The flag is set to new if -
>> i_size <= page_offset <= pos
>
>        Thinking about it, I think you're right about this part of its
> safety.  My big concern is that 'new' means "I have a whole empty
> cluster" and that might not be true.
>        It's starting to look like your code is even closer to correct,
> but might not be located exactly where it needs to be.  Let's look at
> what should_zero means.
>        In ocfs2_write_cluster(), it means the calling functions have an
> entire new or unwritten cluster.  That's why we take v_blkno as cpos
> when should_zero is true.  We know that the caller has set us up with an
> entire cluster and we have to zero the front or all of it.  That's great
> when we call down to ocfs2_prepare_page_for_write(), as we are able to
> trust we can zero anything we see with impunity.  That's why we zero
> whole ranges (cluster_start..cluster_end).  Remember here that a write
> context can have multiple pages (to cover the front of a large cluster
> that's just been allocated) or multiple clusters per page (16K/64K page
> size).  should_zero (from c_needs_zero) tells us that the entire cluster
> we're looking at is newly active.
>        Down in ocfs2_map_page_blocks(), we must not venture outside
> from/to, as other code (not the write path) assumes it will not pull in
> blocks that it doesn't need.  So that's not a place to change.
>        What about the loop at the bottom of ocfs2_write_cluster()?  The
> one that calls ocfs2_prepare_page_for_write()?  It can walk multiple
> pages, some of which may be the front of a newly allocated cluster
> (Imagine this write is allocating a 1MB cluster at the end of the file,
> but pos is 512K inside the new cluster.  This code handles zeroing all
> the pages between the start of the cluster and pos).  I suppose we could
> check here, but ocfs2_prepare_page_for_write() has logic based on target
> vs non-target page, and we don't want to override it.
>        So we do want to make a change inside
> ocfs2_prepare_page_for_write().  Let's look at your change.  You trigger
> 'new' even if we're not the target page.  This means the BUG_ON() in the
> else{} can never fail for the extend I described in the last paragraph.
> The calling code should have made sure we are correctly passing
> should_zero to all non-target pages.  Having new overridden for them is
> wrong.  So maybe we move your check inside the target_page if.  It only
> really matters for the target page.
>        But wait!  Check out block_write_full_page_endio() in
> fs/buffer.c!  Especially this part:
>
>        zero_user_segment(page, offset, PAGE_CACHE_SIZE);
>

Are you sure that block_write_full_page_endio() will be called before
the next system call on this file comes in and moves I_size beyond end
of this page?


-- 
Goldwyn



More information about the Ocfs2-devel mailing list