[Ocfs2-devel] New update for sync in reflink. [Re: some thoughts about data integrity test for reflink.]

Joel Becker Joel.Becker at oracle.com
Fri Aug 7 17:10:05 PDT 2009


[I've added ocfs2-devel to the Cc list.  Please copy the list, otherwise
  we can't get Mark's thoughts, and he knows some of the code better
  than I do. ]

On Fri, Aug 07, 2009 at 11:23:52PM +0800, Tao Ma wrote:
> Joel Becker wrote:
> >>I called ocfs2_sync_inode before reflink to flush the page
> >>cache.  So I  think we should get all B's and C's are missed.
> >>The reason I call ocfs2_sync_inode is the following test case:
> >>1. Write a file with all A's.
> >>2. reflink A to B.
> >>3. read B.
> >>If the page cache of A isn't synced to the disk, we may get the
> >>garbage  from reading of B.
> >
> >	We won't get garbage.  We get 'A's, which is a valid state of
> >the file.  Whether it's valid in the face of relink is what we're trying
> >to decide here.
> We don't get 'A' here actually and we would get garbage. Strange,
> uh? I just added some printk and it seems that while reading file B,
> the buffer_head is created
> and read from the disk. I used to think that while we write file A,
> we will create buffer_head. And reading file B will use this
> buffer_head.
> But that is not the case actually. The only right sequence is that
> we add a sync between step 1 and 2.

	Um, what?
	Oh, we do the check for PageDirty() in CoW, but not in straight
read...

> >	Now, whether we want a reflink to force a sync or not in the
> >writeback case...on the one hand, people will probably expect write();
> >reflink(); read() to work.  On the other hand, they really should
> >write(); fsync(); reflink(); if they care.  We're already dealing
> >with this regarding rename(), and its ugly.  Force the fsync() for them,
> >and we help the naive caller but penalize (performance-wise) the
> >intelligent caller.
> I have checked my mount mode, it is data=ordered, so we also meet
> with this issue in ordered mode.

	Yes, the original discussion was concerning the page cache
contents during CoW.  You've coded that up correctly.  Here we're
talking about simple read.

> >	I really lean towards just requiring the caller to deal with it.
> >It's what unix-like systems do for all other operations.  I poked Chris,
> >and he agrees.  So a caller if they knew they had such a situation, does
> >'fsync(); reflink();'.
> So a user always need to do fsync before he do reflink. Is it OK? If
> yes, I will ask tristan to update his test case.

	Well... I would normally say yes, because stale vs not-stale is
one thing.  But complete garbage from before the first write is not OK.

> >>I think it is OK if I call ocfs2_sync_inode before reflink?(See
> >>my git  tree for details).
> >
> >	Calling it unconditionally is just a performance disaster
> >waiting to happen.  Even if we want to require reflink() to happen with
> >data flushed, you don't want to call ocfs2_sync_inode() in the ordered
> >case.  You want the ordered journal to handle it.
> Since it doesn't work in order mode which is more common for ocfs2
> usage, do you change your mind here?

	I do not.  Having to call ocfs2_sync_inode() for every reflink
is a disaster.  We need to think about this.  Basically, how can the
reflink target notice it has to either sync the source or perhaps copy
the data over?

Joel

-- 

Life's Little Instruction Book #43

	"Never give up on somebody.  Miracles happen every day."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127



More information about the Ocfs2-devel mailing list