[Ocfs2-devel] [PATCH 1/3] fs: Document the reflink(2) system call.
Boaz Harrosh
bharrosh at panasas.com
Sun May 3 06:08:59 PDT 2009
On 05/03/2009 09:15 AM, Joel Becker wrote:
> int reflink(const char *oldpath, const char *newpath);
>
> The reflink(2) system call creates reference-counted links. It creates
> a new file that shares the data extents of the source file in a
> copy-on-write fashion. Its calling semantics are identical to link(2).
> Once complete, programs see the new file as a completely separate entry.
>
Please forgive my complete novice-ness with Unix jargon, but from here the
name looks very wrong and confusing.
If I draw the graph from data to links, it looks like this:
[data]<--[hard-link (one or more)]<--[soft-link(zero or more)]
The data is otherwise just there on disk, but it is unavailable until
it is linked to at least one dir-entry. The middle hard-link is reference
counted, and once all uses are removed the data can be garbage collected. Soft links
don't follow on-disk data but follow a dir-entry. So even if the dir-entry comes to
point at completely different on-disk data, the soft link still agrees with the dir-entry.
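To make the graph concrete, here is a minimal C sketch, assuming a POSIX system with a writable /tmp. It uses only link(2), symlink(2), stat(2) and lstat(2); struct link_report and link_experiment() are names made up for this illustration. It records whether a hard link shares the inode of the data (and bumps its link count) and whether a soft link is a separate object that only stores a name.

```c
#define _XOPEN_SOURCE 700  /* for mkdtemp(), lstat(), symlink() under strict modes */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Illustrative result type; not part of any real API. */
struct link_report {
    int  same_inode;        /* 1 if the hard link has the same dev/inode as the data */
    long nlink;             /* link count of the data after link(2) */
    int  symlink_distinct;  /* 1 if the soft link is its own object (own inode) */
};

/* Creates data, a hard link to it, and a soft link to the hard link's
 * name inside a fresh temporary directory, then inspects all three.
 * Returns 0 on success, -1 on any failure. Cleans up after itself. */
int link_experiment(struct link_report *out)
{
    char dir[] = "/tmp/linkexpXXXXXX";
    char data[64], hard[64], soft[64];
    struct stat sd, sh, ss;
    int fd, rc = -1;

    if (mkdtemp(dir) == NULL)
        return -1;
    snprintf(data, sizeof data, "%s/data", dir);
    snprintf(hard, sizeof hard, "%s/hard", dir);
    snprintf(soft, sizeof soft, "%s/soft", dir);

    fd = open(data, O_CREAT | O_EXCL | O_WRONLY, 0600);
    if (fd < 0)
        goto cleanup;
    close(fd);

    if (link(data, hard) != 0)       /* second dir-entry for the same inode */
        goto cleanup;
    if (symlink("hard", soft) != 0)  /* stores the name "hard", not the data */
        goto cleanup;

    /* stat() follows links; lstat() looks at the soft link itself. */
    if (stat(data, &sd) != 0 || stat(hard, &sh) != 0 || lstat(soft, &ss) != 0)
        goto cleanup;

    out->same_inode = (sd.st_dev == sh.st_dev && sd.st_ino == sh.st_ino);
    out->nlink = (long)sd.st_nlink;
    out->symlink_distinct = (S_ISLNK(ss.st_mode) && ss.st_ino != sd.st_ino);
    rc = 0;

cleanup:
    unlink(soft);
    unlink(hard);
    unlink(data);
    rmdir(dir);
    return rc;
}
```

Called as `struct link_report r; link_experiment(&r);`, this should report the same inode for both names and a link count of 2 on ordinary Linux filesystems, which is the reference counting meant by the middle box of the graph.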
Given the graph above, and as explained below, there is no reference counting
going on with reflink:
> +- The link count of the source file is unchanged, and the link count of
> + the new file is one.
And the "link" meaning is kept only very vaguely, and only halfway, until the next
write. (If it can be called a link at all, being a different inode and cached
twice.)
My first impression on reading the title of the patch was that a reflink, in plain
English, would be something more to the left in the above graph, between hard-link
and soft-link. Something like: a link to an invisible dir-entry that is gone once
all soft-links to it are gone.
So from my point of view, call it something different, like Copy-On-Write or
COW.
I do understand that there is something very fundamental in my misunderstanding,
but it was not explained below; in fact, the terminology below confused me even
more. Please explain?
> Signed-off-by: Joel Becker <joel.becker at oracle.com>
> ---
> Documentation/filesystems/reflink.txt | 129 +++++++++++++++++++++++++++++++++
> Documentation/filesystems/vfs.txt | 4 +
> 2 files changed, 133 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/filesystems/reflink.txt
>
> diff --git a/Documentation/filesystems/reflink.txt b/Documentation/filesystems/reflink.txt
> new file mode 100644
> index 0000000..f3620f0
> --- /dev/null
> +++ b/Documentation/filesystems/reflink.txt
> @@ -0,0 +1,129 @@
> +reflink(2)
> +==========
> +
> +NAME
> +----
> +reflink - make a reference-counted link of a file
> +
> +
> +SYNOPSIS
> +--------
> +#include <unistd.h>
> +
> +int reflink(const char *oldpath, const char *newpath);
> +
> +DESCRIPTION
> +-----------
> +reflink() creates a new reflink (also known as a reference-counted link)
> +to an existing file. This reflink is a new file object that shares the
> +attributes and data extents of the source object in a copy-on-write fashion.
> +
This is exactly my confusion: how is the logical jump made from reflink (reference/link)
to copy-on-write? I fail to see any logical connection.
> +An easy way to think of it is that the semantics of the reflink() call
> +are identical to the link(2) system call, but the resulting file object
> +behaves as if it were a copy with identical attributes.
> +
> +Like the link(2) system call, if newpath exists, it will not be overwritten.
> +oldpath must be a regular file. oldpath and newpath must be on the same
> +mounted filesystem.
> +
> +All data extents of the new file must be shared with the source file in
> +a copy-on-write fashion. This includes data extents for extended
> +attributes. If either the source or new files are written to, the
> +changes do not show up in the other file.
> +
> +All file attributes and extended attributes of the new file must be
> +identical to the source file with the following exceptions:
> +
> +- The new file must have a new inode number. This allows POSIX
> + programs to treat the source and new files as separate objects. From
> + the view of the POSIX application, the files are distinct. The
> + sharing is invisible outside the filesystem.
> +- The ctime of the source file only changes if the source's metadata
> + must be changed to accommodate the copy-on-write linkage. The ctime of
> + the new file is set to represent its creation.
> +- The mtime of the source file is unmodified, and the mtime of the new file
> + is set identical to the source file. This reflects that the data is
> + unchanged.
> +- The link count of the source file is unchanged, and the link count of
> + the new file is one.
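The exceptions listed above can be written down as checks on two stat(2) results. The following is only a sketch under stated assumptions: consistent_with_reflink() is a hypothetical helper name, it compares just a handful of struct stat fields, and it ignores extended attributes and ctime. It also cannot prove sharing, since the text says the sharing is invisible outside the filesystem; a plain copy with preserved attributes would pass as well.

```c
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if dst looks like a freshly made
 * reflink of src according to the quoted rules, 0 otherwise.
 * This is a necessary condition only, not a proof of shared extents. */
int consistent_with_reflink(const struct stat *src, const struct stat *dst)
{
    /* oldpath and newpath must be on the same mounted filesystem. */
    if (src->st_dev != dst->st_dev)
        return 0;

    /* The new file must have a new inode number. */
    if (src->st_ino == dst->st_ino)
        return 0;

    /* The link count of the new file is one. */
    if (dst->st_nlink != 1)
        return 0;

    /* The mtime of the new file is set identical to the source. */
    if (src->st_mtime != dst->st_mtime)
        return 0;

    /* Remaining attributes checked here are expected to match. */
    if (src->st_size != dst->st_size ||
        src->st_mode != dst->st_mode ||
        src->st_uid != dst->st_uid ||
        src->st_gid != dst->st_gid)
        return 0;

    return 1;
}
```

Note that the source's link count does not appear in any check: per the text it is simply unchanged, so a source with several hard links still qualifies.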
> +
> +RETURN VALUE
> +------------
> +On success, zero is returned. On error, -1 is returned, and errno is
> +set appropriately.
> +
> +ERRORS
> +------
> +EACCES::
> + Write access to the directory containing newpath is denied, or
> + search permission is denied for one of the directories in the
> + path prefix of oldpath or newpath. (See also path_resolution(7).)
> +
> +EEXIST::
> + newpath already exists.
> +
> +EFAULT::
> + oldpath or newpath points outside your accessible address space.
> +
> +EIO::
> + An I/O error occurred.
> +
> +ELOOP::
> + Too many symbolic links were encountered in resolving oldpath or
> + newpath.
> +
> +ENAMETOOLONG::
> + oldpath or newpath was too long.
> +
> +ENOENT::
> + A directory component in oldpath or newpath does not exist or is
> + a dangling symbolic link.
> +
> +ENOMEM::
> + Insufficient kernel memory was available.
> +
> +ENOSPC::
> + The device containing the file has no room for the new directory
> + entry or file object.
> +
> +ENOTDIR::
> + A component used as a directory in oldpath or newpath is not, in
> + fact, a directory.
> +
> +EPERM::
> + oldpath is a directory.
> +
> +EPERM::
> + The file system containing oldpath and newpath does not support
> + the creation of reference-counted links.
> +
> +EROFS::
> + The file is on a read-only file system.
> +
> +EXDEV::
> + oldpath and newpath are not on the same mounted file system.
> + (Linux permits a file system to be mounted at multiple points,
> + but reflink() does not work across different mount points, even if
> + the same file system is mounted on both.)
> +
> +VERSIONS
> +--------
> +reflink() is available on Linux since kernel 2.6.31.
> +
> +CONFORMING TO
> +-------------
> +reflink() is Linux-specific.
> +
> +NOTES
> +-----
> +reflink() dereferences symbolic links in the same manner that link(2)
> +does. For precise control over the treatment of symbolic links, see
> +reflinkat().
> +
> +In the case of a crash, the new file must not appear partially complete
> +in the filesystem.
> +
> +SEE ALSO
> +--------
> +ln(1), reflink(1), reflinkat(2), path_resolution(7)
> +
> diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
> index f49eecf..01cd810 100644
> --- a/Documentation/filesystems/vfs.txt
> +++ b/Documentation/filesystems/vfs.txt
> @@ -333,6 +333,7 @@ struct inode_operations {
> ssize_t (*listxattr) (struct dentry *, char *, size_t);
> int (*removexattr) (struct dentry *, const char *);
> void (*truncate_range)(struct inode *, loff_t, loff_t);
> + int (*reflink) (struct dentry *,struct inode *,struct dentry *);
> };
>
> Again, all methods are called without any locks being held, unless
> @@ -431,6 +432,9 @@ otherwise noted.
>
> truncate_range: a method provided by the underlying filesystem to truncate a
> range of blocks , i.e. punch a hole somewhere in a file.
> + reflink: called by the reflink(2) system call. Only required if you want
> + to support reflinks. For further information, see
> + Documentation/filesystems/reflink.txt.
>
>
> The Address Space Object
Please forgive my ignorance. Again, I would honestly like to understand, and
how else than to just ask?
Thanks in advance
Boaz