[Ocfs2-devel] New update for sync in reflink. [Re: some thoughts about data integrity test for reflink.]
Tao Ma
tao.ma at oracle.com
Sat Aug 8 07:28:34 PDT 2009
Joel Becker wrote:
> [I've added ocfs2-devel to the Cc list. Please copy the list, otherwise
> we can't get Mark's thoughts, and he knows some of the code better
> than I do. ]
>
> On Fri, Aug 07, 2009 at 11:23:52PM +0800, Tao Ma wrote:
>
>> Joel Becker wrote:
>>
>>>> I called ocfs2_sync_inode before reflink to flush the page
>>>> cache. So I think we should get all B's and C's are missed.
>>>> The reason I call ocfs2_sync_inode is the following test case:
>>>> 1. Write a file with all A's.
>>>> 2. reflink A to B.
>>>> 3. read B.
>>>> If the page cache of A isn't synced to the disk, we may get the
>>>> garbage from reading of B.
>>>>
>>> We won't get garbage. We get 'A's, which is a valid state of
>>> the file. Whether it's valid in the face of relink is what we're trying
>>> to decide here.
>>>
>> We don't get 'A' here actually and we would get garbage. Strange,
>> uh? I just added some printk and it seems that while reading file B,
>> the buffer_head is created
>> and read from the disk. I used to think that while we write file A,
>> we will create buffer_head. And reading file B will use this
>> buffer_head.
>> But that is not the case actually. The only right sequence is that
>> we add a sync between step 1 and 2.
>>
>
> Um, what?
> Oh, we do the check for PageDirty() in CoW, but not in straight
> read...
>
So what do you mean here? I need to add another patch for straight read
check because of reflink?
I am lack of this type of technique. So please give me some tips. Thanks.
>
>>> Now, whether we want a reflink to force a sync or not in the
>>> writeback case...on the one hand, people will probably expect write();
>>> reflink(); read() to work. On the other hand, they really should
>>> write(); fsync(); reflink(); if they care. We're already dealing
>>> with this regarding rename(), and its ugly. Force the fsync() for them,
>>> and we help the naive caller but penalize (performance-wise) the
>>> intelligent caller.
>>>
>> I have checked my mount mode, it is data=ordered, so we also meet
>> with this issue in ordered mode.
>>
>
> Yes, the original discussion was concerning the page cache
> contents during CoW. You've coded that up correctly. Here we're
> talking about simple read.
>
>
>>> I really lean towards just requiring the caller to deal with it.
>>> It's what unix-like systems do for all other operations. I poked Chris,
>>> and he agrees. So a caller if they knew they had such a situation, does
>>> 'fsync(); reflink();'.
>>>
>> So a user always need to do fsync before he do reflink. Is it OK? If
>> yes, I will ask tristan to update his test case.
>>
>
> Well... I would normally say yes, because stale vs not-stale is
> one thing. But complete garbage from before the first write is not OK.
>
No problem, I will ask tristan to do it. And we may also need to add
some explanations about it when we finally release reflink.
>
>>>> I think it is OK if I call ocfs2_sync_inode before reflink?(See
>>>> my git tree for details).
>>>>
>>> Calling it unconditionally is just a performance disaster
>>> waiting to happen. Even if we want to require reflink() to happen with
>>> data flushed, you don't want to call ocfs2_sync_inode() in the ordered
>>> case. You want the ordered journal to handle it.
>>>
>> Since it doesn't work in order mode which is more common for ocfs2
>> usage, do you change your mind here?
>>
>
> I do not. Having to call ocfs2_sync_inode() for every reflink
> is a disaster. We need to think about this. Basically, how can the
> reflink target notice it has to either sync the source or perhaps copy
> the data over?
>
OK, I agree with you for ocfs2_sync_inode part. I am just a little
worried about the above case I mentioned, so if we can
resolve it gracefully, I would be very glad about it.
Regards,
Tao
More information about the Ocfs2-devel
mailing list