[Btrfs-devel] Re: Initial Planning document for multiple device support
Chris Mason
chris.mason at oracle.com
Wed Jan 23 04:50:24 PST 2008
On Wednesday 23 January 2008, Andi Kleen wrote:
> Chris Mason <chris.mason at oracle.com> writes:
>
> Just commenting on something that tripped me while reading
> the document.
>
> >If Btrfs were to rely on device mapper or MD for mirroring, it would
> >not be able to resolve checksum failures by checking the mirrored
> >copy. The lower layers don't know the checksum or granularity of the
> >filesystem blocks, and so they are not able to verify the data they
> >return.
>
> I cannot imagine it would be that difficult to add a new READ_OTHER_COPY
> io operation that would cause MD/LVM/... to return the other copy
> in a mirror set.

This is something SGI recently proposed, and I think it is a very good idea.
It also makes sense to have hooks between MD and the FS so MD can figure out
which blocks are in use during a rebuild, and so the FS can tell LVM when
blocks are freed, which helps make snapshots more efficient.
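
As a rough illustration of the READ_OTHER_COPY idea discussed above, the sketch below has a filesystem-level reader verify each block against its stored checksum and ask for the next mirror copy on a mismatch. Everything here is hypothetical: the `Mirror` class, its `read(block, copy)` method, and `read_verified` are made-up names and not part of MD, DM, or Btrfs, and `zlib.crc32` only stands in for whatever checksum the filesystem actually uses.

```python
import zlib


class Mirror:
    """Hypothetical mirror set: each copy is a dict of block number -> bytes."""

    def __init__(self, copies):
        self.copies = copies

    def num_copies(self):
        return len(self.copies)

    def read(self, block, copy=0):
        # Stand-in for a "read this specific copy" request to the lower layer.
        return self.copies[copy][block]


def read_verified(mirror, block, expected_crc):
    """Return (data, copy_index) for the first copy whose checksum matches."""
    for copy in range(mirror.num_copies()):
        data = mirror.read(block, copy)
        if zlib.crc32(data) == expected_crc:
            return data, copy
    raise IOError(f"block {block}: no copy matches the stored checksum")


# Copy 0 holds a corrupted block; copy 1 holds the good data.
good = b"metadata block"
mirror = Mirror([{7: b"metadata blocc"}, {7: good}])
data, copy = read_verified(mirror, 7, zlib.crc32(good))
```

The point of the sketch is only that the checksum lives above the mirroring layer, so the layer below needs some way to be told which copy to return.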
>
> Even without btrfs that might be even generally useful for other
> applications that do some checking on their files.
>
> e.g. I could well imagine a new system call to trigger this on the
> page cache level.
>
> There might be other reasons to reinvent another storage manager
> of course. Just that one above doesn't seem to be very convincing.
> I admit I haven't thought too deeply about the other issues you
> raise in the document.

The key problem that requires most of this infrastructure is mirroring
metadata on a single spindle. Chunks aren't required to solve it, but they
do add flexibility to do lots of other things. For example, relocating hot
blocks onto the SSD portion of a combined SSD/spindle drive, or writing to
the SSD when on battery and then transferring in bulk to the spindle.

The chunk code is basically a storage layer with three or four hooks into the
FS. Once I have it working, I'll take a hard look at pushing it down into DM
where it can be used for other things.
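
One way to picture a chunk layer of this kind is as a table that maps a logical address range to one or more physical locations. The sketch below is an assumption-laden illustration, not the actual Btrfs chunk code: `Stripe`, `Chunk`, `ChunkMap`, the device names, and the sizes are all invented. It shows a metadata chunk mirrored at two offsets on a single device, and the same logical range later relocated to an SSD without its logical addresses changing.

```python
from dataclasses import dataclass

MiB = 1 << 20


@dataclass
class Stripe:
    device: str    # hypothetical device name
    physical: int  # byte offset of the chunk copy on that device


@dataclass
class Chunk:
    logical: int   # starting logical byte address
    length: int    # chunk size in bytes
    stripes: list  # one Stripe per copy of the chunk


class ChunkMap:
    def __init__(self):
        self.chunks = []

    def add(self, chunk):
        self.chunks.append(chunk)

    def find(self, logical):
        for chunk in self.chunks:
            if chunk.logical <= logical < chunk.logical + chunk.length:
                return chunk
        raise KeyError(logical)

    def map(self, logical):
        """Translate a logical address to (device, physical offset) per copy."""
        chunk = self.find(logical)
        offset = logical - chunk.logical
        return [(s.device, s.physical + offset) for s in chunk.stripes]


cmap = ChunkMap()
# Metadata chunk with two copies on the same spindle.
cmap.add(Chunk(logical=0, length=64 * MiB,
               stripes=[Stripe("sda", 16 * MiB), Stripe("sda", 512 * MiB)]))
mirrored = cmap.map(4096)

# Relocate the chunk to an SSD; callers keep using the same logical address.
cmap.find(4096).stripes = [Stripe("ssd", 0)]
relocated = cmap.map(4096)
```

A real relocation would also have to copy the data and handle in-flight IO; the sketch only covers the address translation.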
-chris