[Btrfs-devel] Re: 3 thoughts about important and outstanding features - cont.

Chris Mason chris.mason at oracle.com
Fri Jan 25 06:22:01 PST 2008


On Fri, Jan 25, 2008 at 10:59:05AM +0100, myLC at gmx.net wrote:
> Chris Mason wrote:
>
> > I think you're saying that a copy-on-write duplicate is not
> > sufficient for video editing. The thing to keep in mind is
> > that with btrfs COW is done on an extent basis, and extents
> > can be small (1MB is small for these files). So, you can do
> > the video editing part via cow, slice out the parts you
> > don't want, and you'll get that storage back. It is just a
> > matter of setting the max extent size on the file.
>
> Maybe I'm missing a point here - the COW file is still
> nothing without the original, right?

There are two pieces to this, the first is copy on write, and the second
is that two files can both hold references to the same extent.  It is
the COW that makes the references safe...if either file modifies some
part of the extent, the results of the modification are written
elsewhere.  So, a little bit of ascii, where the files are the same.
This shows an extent for the first 1GB of the file, which starts at
offset 256MB into the disk.

file1
    \
[ bytes 0...1GB -> disk byte 256MB, offset 0, length 1GB]
    /
file2

Let's say that we want file2 to be the same as file1, but we want
to cut 4k from the middle.  file2 will have 3 extent pointers:

[ bytes 0...512MB          -> disk byte 256MB, offset 0, length 512MB]
[ bytes 512MB...512MB + 4k -> disk byte 2GB, offset 0, length 4k]
[ bytes 512MB + 4k...1GB   -> disk byte 256MB, offset 512MB + 4k,
                              length 512MB - 4k]

file2 has two references to the extent from file1, and file1 has one
reference to the extent.  When file1 goes away, its reference is
removed, but the extent is not freed because 2 references remain.
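
A rough Python model of the pointer layout and reference counts described
above.  This is only a sketch, not btrfs code: the ExtentPtr, cow_write and
refcounts names are made up for illustration, and the 4k change is modeled
as a COW write that lands in a new extent at disk byte 2GB, which is what
the three pointers listed for file2 describe.

```python
from collections import Counter
from dataclasses import dataclass

KB = 1024
MB = 1024 * KB
GB = 1024 * MB


@dataclass(frozen=True)
class ExtentPtr:
    file_start: int  # logical offset in the file
    disk_byte: int   # start of the on-disk extent being referenced
    offset: int      # offset into that extent
    length: int      # bytes covered by this pointer


def cow_write(ptrs, pos, length, new_disk_byte):
    """Return a new pointer list where [pos, pos+length) lives in a new extent."""
    end = pos + length
    out = []
    for p in ptrs:
        p_end = p.file_start + p.length
        if p_end <= pos or p.file_start >= end:
            out.append(p)  # no overlap with the write, pointer is shared as-is
            continue
        if p.file_start < pos:
            # part of the old extent before the write is still referenced
            out.append(ExtentPtr(p.file_start, p.disk_byte, p.offset,
                                 pos - p.file_start))
        if p_end > end:
            # part of the old extent after the write is still referenced
            skip = end - p.file_start
            out.append(ExtentPtr(end, p.disk_byte, p.offset + skip,
                                 p_end - end))
    out.append(ExtentPtr(pos, new_disk_byte, 0, length))
    out.sort(key=lambda p: p.file_start)
    return out


def refcounts(*files):
    """Count how many extent pointers reference each on-disk extent."""
    counts = Counter()
    for ptrs in files:
        for p in ptrs:
            counts[p.disk_byte] += 1
    return counts


file1 = [ExtentPtr(0, 256 * MB, 0, 1 * GB)]
file2 = cow_write(file1, 512 * MB, 4 * KB, 2 * GB)

print(refcounts(file1, file2)[256 * MB])  # references while both files exist
print(refcounts(file2)[256 * MB])         # references left once file1 is gone
```

With both files present the 1GB extent has three references.  Dropping
file1 leaves two, so the extent stays allocated.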

Depending on how much slicing is done, you might end up with a small
number of references to the 1GB extent, perhaps corresponding to only
a few MB in the final file.  Using smaller extents will reduce the
space wasted because of this.
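
To put rough numbers on that, here is a small sketch.  It assumes the file
is laid out in fixed-size extents and that an extent stays allocated as
long as any byte in it is still referenced.  The pinned_bytes helper and
the kept ranges are hypothetical, chosen only to illustrate the effect.

```python
MB = 1024 * 1024
GB = 1024 * MB


def pinned_bytes(kept_ranges, extent_size):
    """Bytes that stay allocated when only kept_ranges are still referenced."""
    live = set()
    for start, length in kept_ranges:
        if length <= 0:
            continue
        first = start // extent_size
        last = (start + length - 1) // extent_size
        live.update(range(first, last + 1))  # every extent touched stays alive
    return len(live) * extent_size


# Hypothetical result of heavy slicing: 3MB kept out of a 1GB file.
kept = [(10 * MB, 2 * MB), (700 * MB, 1 * MB)]

for size in (1 * GB, 1 * MB):
    held = pinned_bytes(kept, size)
    used = sum(length for _, length in kept)
    print(f"extent size {size // MB}MB: {held // MB}MB held for {used // MB}MB kept")
```

With a single 1GB extent the whole 1GB stays allocated for 3MB of data.
With 1MB extents only the three extents that are still referenced remain.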

> If so, then how fast is it when it comes to throwing away
> the original? Provided that the former works reasonably
> fast, then you would have the inserting into/removing from a
> file problem solved (1 MB is indeed becoming small, even for
> embedded systems and certainly for harddrives:-).
> You stated before that it wouldn't be "that fast". I'm
> somewhat curious about that - 'should be somewhat faster
> than copying the whole damn/or even half of the damn
> (say 10 GB) thing...

The operation will still be O(number of extents past the slice/insert).
Every extent pointer in the file past the point where the change is
made needs to be updated.  But, this is much much smaller than the
actual file data.
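
A sketch of why the cost scales with the number of extent pointers rather
than with the data.  It models a file as a list of
(file_start, disk_byte, offset, length) tuples and removes a byte range by
shifting the file offsets of everything after it.  The remove_range helper
and the 1MB extent layout are assumptions for illustration, not the btrfs
implementation.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def remove_range(ptrs, pos, length):
    """Cut [pos, pos+length) out of the file.

    Returns (new_ptrs, updated), where updated counts the pointers that
    had to be rewritten.  No file data is copied.
    """
    end = pos + length
    out, updated = [], 0
    for file_start, disk_byte, offset, plen in ptrs:
        p_end = file_start + plen
        if p_end <= pos:
            out.append((file_start, disk_byte, offset, plen))  # before the cut
            continue
        updated += 1
        if file_start < pos:
            # keep the head of a pointer that straddles the start of the cut
            out.append((file_start, disk_byte, offset, pos - file_start))
        if p_end > end:
            # keep the tail (or the whole pointer), shifted down by `length`
            skip = max(end - file_start, 0)
            out.append((max(file_start, end) - length, disk_byte,
                        offset + skip, plen - skip))
    return out, updated


# A 1GB file stored as 1024 extents of 1MB, starting at disk byte 256MB.
ptrs = [(i * MB, 256 * MB + i * MB, 0, MB) for i in range(1024)]
new_ptrs, updated = remove_range(ptrs, 512 * MB, 4 * KB)
print(updated, "pointers rewritten,", sum(p[3] for p in new_ptrs), "bytes in file")
```

Cutting 4k at the 512MB mark rewrites the 512 pointers at or past the cut,
where a plain copy would have to move roughly 512MB of data.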

>
>
> > I don't think you can conclude moving the hibernation file
> > is the cause of the performance problem.
>
> Trust me, it is (and I oughta kick myself for that one;-).
>
>
> > XP probably frees as much file cache as it can before
> > suspend to disk, which means that when you resume you have
> > to seek all over the drive to load files back in...
>
> Nope. XP is rather primitive in (not only) that matter.
> The size of the hibernation file always matches the amount
> of installed RAM. The memory simply gets written into the
> file and read back upon awakening. There is a simple
> progress bar, now indicating that the whole operation takes
> a lot longer than before. After the memory is read back it
> reinitializes a few devices and such (1-3 seconds)...

Well, I won't argue about how XP works ;)  Even on Linux the hibernation
file should be the same size as RAM, but much less than that is usually
written there.

I'm not debating the physics of hard drives, but the focus right now is
on sufficient knobs to make this easy for the admins to decide for
themselves.  Trying to auto-select inside vs outside on the huge
variety of storage people actually use is doomed to fail.  It wouldn't
be hard to make a small helper program that did IO and made suggestions.
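
For what it's worth, such a helper could start out as something like the
sketch below.  It times a read at a few offsets of a file or block device
and suggests the fastest one.  The time_reads and suggest names are made
up, it relies on os.pread (Unix only), and it makes no attempt to bypass
the page cache, so a real tool would need direct IO and repeated runs
before its numbers meant anything.

```python
import os
import time


def time_reads(path, offsets, chunk=1024 * 1024):
    """Return [(offset, seconds)] for reading `chunk` bytes at each offset."""
    results = []
    fd = os.open(path, os.O_RDONLY)
    try:
        for off in offsets:
            start = time.perf_counter()
            os.pread(fd, chunk, off)  # positional read, no seek needed
            results.append((off, time.perf_counter() - start))
    finally:
        os.close(fd)
    return results


def suggest(results):
    """Pick the offset whose read finished fastest."""
    return min(results, key=lambda r: r[1])[0]
```

Pointing time_reads at a device with one offset near the start and one
near the end gives a crude inside vs outside comparison, and the admin
still makes the final call.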

-chris



