[Ocfs2-devel] [PATCH] Treat writes as new when holes span across page boundaries
Goldwyn Rodrigues
rgoldwyn at gmail.com
Wed Feb 23 13:35:36 PST 2011
Hi Joel,
On Wed, Feb 23, 2011 at 3:17 PM, Joel Becker <jlbec at evilplan.org> wrote:
> On Wed, Feb 23, 2011 at 02:54:19PM -0600, Goldwyn Rodrigues wrote:
>> On Wed, Feb 23, 2011 at 1:13 PM, Joel Becker <jlbec at evilplan.org> wrote:
>> > I was about to think we were done, but now I'm not so sure. I
>> > have a question: with my change to ocfs2_should_read_block(), but
>> > without your patch, what do you see for corruption? Your pattern
>> > written in the first pass, or complete random garbage?
>>
>> Joel, you are getting confused in the list of parameters to reproduce
>> the bug. I have listed it in the past, but not in a single location so
>> it is difficult to pick up the bits. I will put it together for you.
>
> Goldwyn,
> I'm just not remembering off the top of my head, so I'm asking.
> I'm actually describing the myriad cases we can encounter in this code,
> because any change we make has to be safe for all possible cases.
>
>> 1. blocksize < clustersize
>
> Ok.
>
>> 2. the previous write ended in a page boundary, but not a cluster
>> boundary. eg, clustersize=16K but pos is at the page boundary = 4k
>
> Must it be a page boundary? What about a block boundary? ;-)
I am not sure of this, but I would like to stick with page boundary. I
am too poor to afford a machine with a page size bigger than 4k to
test this ;)
>
>> 3. you wrote something at the page boundary
>
> I think it's more block than page, really. But that is the same
> for 4K blocks and pages.
>
>> 4. there is a hole created in the page from 4k-8k in *between* say 4128-4640
>
> There is no hole here. We have allocation. Even if the user
> doesn't directly write to this location, it is not a hole, because we
> have a cluster there. You just mean that the user didn't write to this
> location (sparse writing). That's just a terminology thing, but it's
> important for other folks understanding you.
Agreed. hole in user-space? ;)
>
>> What I am seeing is pure junk(not 0xbaadfeed), and not what I had
>> written earlier.
>
> Ok. That's what I expected. I wanted to clarify we were not
> reading stale data, but were instead failing to zero memory.
>
>> In retrospect, since we have covered the rest of the bases, we can set
>> new on the condition:
>>
>> i_size == page_offset == pos
>>
>> I interpreted new as new page or new data vs new cluster, and perhaps
>> that is the reason I placed the condition there.
>
> I don't like this. We still haven't figured out where we're
> failing.
What I meant with rest of the bases is the zeroing of pages when pos != i_size.
> The core piece of this, why I'm going back and forth with you
> discussing the issue, is that we never want to paper over a symptom.
> Sure, we know that i_size==page_offset==pos triggers this, but is it the
> only case? If we figure out *why* we're here, when the code is supposed
> to not have this problem, then we can be sure we've closed out all
> exposure.
I agree absolutely, and this is the reason I am in the discussion. All
I am saying from the start is that the (2nd-) page is never zeroed. We
just have to figure out who and where it has to be zeroed. I am not
keen on pushing my solution, but the correct one.
> To wit: If we really only have a problem when i_size is on a
> page boundary, doesn't that sound like an off-by-one error? We've had
> those before. I'm not sure that it is, but shouldn't we be sure about
> it?
>
Yes it is. It is one of those corner cases which gets missed and we
should be sure of it.
--
Goldwyn
More information about the Ocfs2-devel
mailing list