[Ocfs2-devel] copyfile semantics.

Andreas Dilger adilger at sun.com
Tue May 5 14:44:54 PDT 2009


On May 05, 2009  18:46 +0200, Jörn Engel wrote:
> On Tue, 5 May 2009 16:36:29 +0100, Jamie Lokier wrote:
> > What is the advantage of adding the system call for the special case
> > of reflink(), when we choose not to have, say, a copyfile() system
> > call which does what "cp -a" does because doing it in user space is
> > good enough?
> 
> Given an ignorant filesystem, copyfile() will simply do the read/write
> loop in kernelspace.  So either copyfile() is just a fancy name for
> splice()

Sure, except splice() (AFAIK) doesn't allow a splice between two regular
files, only between a pipe and a file.  Maybe it has changed since the
last time I looked.  On high performance filesystems the copy_to_user()
and copy_from_user() can be a major limiting factor on IO performance,
and it is getting more significant because single-core performance
is not improving at all.  At 1GB/s just a single copy_{to,from}_user
(read or write) will consume 40% of a single core.
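For reference, this cost comes from a user-space copy loop. The sketch below is a minimal illustration; the function name copy_rw and the 64KiB buffer size are my own choices and do not come from this thread. Each chunk is copied into the process by read() (a copy_to_user()) and back out by write() (a copy_from_user()). Taken at face value, 40% of a core per copy at 1GB/s implies roughly 2.5GB/s of copy bandwidth per core (1 / 0.4). On that same assumption, a cp-style loop that does both copies would use about 80% of a core before the filesystem does any work of its own.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX.1-2008 declarations */
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd with a plain read()/write() loop.
 * Each chunk crosses the user/kernel boundary twice: read() copies it
 * into buf, write() copies it back out of buf.
 * Returns 0 on success, -1 on error (errno is set). */
int copy_rw(int in_fd, int out_fd)
{
    char buf[64 * 1024];  /* chunk size is an arbitrary choice */

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof(buf));
        if (n == 0)
            return 0;                 /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;             /* interrupted: retry the read */
            return -1;
        }
        /* write() may be short, so loop until this chunk is drained */
        char *p = buf;
        while (n > 0) {
            ssize_t w = write(out_fd, p, (size_t)n);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            p += w;
            n -= w;
        }
    }
}
```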

If it is possible to use splice() to copy between two regular files then
that is great.  Does anything (e.g. cp) actually use this yet?
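On Linux, splice(2) requires that one of its two descriptors refers to a pipe. The usual way to use it between two regular files is therefore to bounce through an intermediate pipe: file to pipe, then pipe to file. The sketch below assumes a Linux system. The name copy_splice and the 64KiB chunk size are illustrative choices; 64KiB is the default pipe capacity on 4KiB-page systems. The data never lands in a user-space buffer. The filesystem still only sees generic splice reads and writes, though, not a single copy request.

```c
#define _GNU_SOURCE          /* splice() is Linux-specific */
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Copy in_fd to out_fd without staging data in a user-space buffer.
 * splice() needs a pipe on one side, so data goes file -> pipe -> file.
 * Returns 0 on success, -1 on error. */
int copy_splice(int in_fd, int out_fd)
{
    int pipefd[2];
    if (pipe(pipefd) < 0)
        return -1;

    int rc = 0;
    for (;;) {
        /* Move up to 64KiB from the source file into the (empty) pipe. */
        ssize_t n = splice(in_fd, NULL, pipefd[1], NULL, 64 * 1024,
                           SPLICE_F_MOVE);
        if (n <= 0) {
            rc = (n == 0) ? 0 : -1;   /* 0 means end of input */
            break;
        }
        /* Drain exactly what was put into the pipe to the destination. */
        while (n > 0 && rc == 0) {
            ssize_t w = splice(pipefd[0], NULL, out_fd, NULL, (size_t)n,
                               SPLICE_F_MOVE);
            if (w <= 0)
                rc = -1;
            else
                n -= w;
        }
        if (rc != 0)
            break;
    }
    close(pipefd[0]);
    close(pipefd[1]);
    return rc;
}
```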

> or copyfile() will also have to create a tempfile, rename the
> tempfile when the copy is done and deal with all possible errors.  And
> if the system crashes, who will remove the tempfile on reboot?  Will the
> tempfile have a well-known name, allowing for easy DoS?  Or will it be
> random, causing much fun locating it after reboot.

Maybe I'm missing something, but why do we need a tempfile at all?
I can't imagine that people expect atomic semantics for copyfile(),
any more than they expect atomic semantics for "cp" in the face of a
crash.

> When implemented in the filesystem itself, copyfile() can be quite nice.
> The filesystem can create a temporary inode without visibly exposing it
> to userspace.  It can delete temporary inodes in journal replay after a
> crash.  And depending on the fs design, the read/write loop can be
> replaced with finer-grained reference counting.

I would think that copyfile() is of primary interest when it involves
a network filesystem, since then the data need not be shipped to the client
doing the copy at all.  This is possible with the NFS and CIFS protocols today,
AFAIK.  The problem with splice is that the filesystem only knows about
->splice_read() and ->splice_write(), it doesn't have any opportunity
to optimize this further (e.g. by sending a "copyfile" RPC, or implementing
a reflink or whatever).
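The point about ->splice_read() and ->splice_write() can be illustrated with a toy dispatcher. Everything in it is hypothetical. toy_fs_ops, its copyfile hook, copyfile_dispatch and the in-memory toy_file are invented for illustration and are not the Linux VFS API. The caller first offers the whole copy to the filesystem, which could turn it into a "copyfile" RPC or a reflink. Only if the filesystem has no hook, or declines, does the caller fall back to moving the bytes itself.

```c
#include <stddef.h>
#include <string.h>

/* HYPOTHETICAL toy model, not the real Linux VFS API. */
struct toy_file;

struct toy_fs_ops {
    /* Optional whole-file copy hook (e.g. a server-side copy RPC or a
     * reflink). Returns bytes copied, or -1 to decline the request. */
    long (*copyfile)(struct toy_file *src, struct toy_file *dst);
};

/* In-memory stand-in for an open file on some filesystem. */
struct toy_file {
    const struct toy_fs_ops *ops;
    char data[256];
    size_t len;
};

/* Generic path: the caller moves the bytes itself, which is all that
 * splice_read/splice_write style hooks allow. */
static long generic_copy(struct toy_file *src, struct toy_file *dst)
{
    memcpy(dst->data, src->data, src->len);
    dst->len = src->len;
    return (long)src->len;
}

/* Offer the whole copy to the filesystem first. Fall back to the generic
 * path if the two files use different ops tables, there is no hook, or
 * the hook declines. */
long copyfile_dispatch(struct toy_file *src, struct toy_file *dst)
{
    if (src->ops == dst->ops && src->ops->copyfile != NULL) {
        long n = src->ops->copyfile(src, dst);
        if (n >= 0)
            return n;
    }
    return generic_copy(src, dst);
}
```

With splice alone, only the fallback path exists: the filesystem never learns that the individual reads and writes belong to a single copy.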

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.