[Btrfs-users] Thoughts about filesystem undo

Wed Sep 26 14:55:00 PDT 2007

Hi Chris, thanks for replying.

Chris Mason wrote:
> On Tue, 2007-09-25 at 23:45 +0100, Phil Endecott wrote:
>> Dear Btrfs people,
>> 
>> I saw Chris' Btrfs talk at LinuxConf.EU a few weeks ago and have since 
>> been thinking about how I would like to use this great code once you 
>> have done all the hard work :-)
>> 
>> Fine-grain filesystem undo, thanks to cheap snapshots, is what I'm 
>> thinking about.  The more I consider it the more useful I believe it 
>> will be
[snip]
>> So I was wondering if you have thought about how this could be made to 
>> work, from the user's (or application developer's) viewpoint rather 
>> than in terms of the filesystem implementation.  Certainly, more than 
>> just "snapshot create" and "snapshot delete" commands are needed.
>> 
>> One idea is to automatically take a snapshot when each process 
>> starts, and to keep it until its parent process terminates.  This means 
>> that from the command line I can rollback to between any commands in 
>> that shell's history.  Perhaps applications that suffer an error could 
>> choose to revert all their changes on termination.
>
> There are a lot of different factors in play here.  First, once a new
> snapshot is created, additional COW runs are required for any tree
> metadata related to the snapshot.

Sure.  Like many things, the user will want to weigh the benefits 
against the costs.  But I think that the costs are now tractable, so 
it's worth considering what the benefits could be.

> Picture a directory where process A and process B are both writing.

Hmmm, the complex case.  I'm not even sure what should happen in the 
easy case yet.  But anyway -

> Process A decides it is time to rollback some changes, but what do we do
> with process B?

Was it actually accessing the same files, or just files in the same 
directory?  Were both processes writing, or just one of them?  In a lot 
of cases, if two processes are writing to the same file, the user has 
made a mistake and something bad is going to happen; so any behaviour 
would be better than the current situation.

But let's back off to the simpler case without conflicts.  A process 
can have at least four possible kinds of isolation from other 
processes.  It (and its child processes) always sees data that it (and 
its child processes) have written, but it may or may not see data that 
is written by other processes after it started.  And its writes may be 
visible to other processes as they occur, or they may become visible 
atomically when it terminates.  Those two independent choices give the 
four combinations, and at least two of them are certainly useful:

- Most of the shell scripts that I write implicitly assume that the 
files that they read don't change under their feet, and that nothing 
tries to read their output until it is entirely written.  They also 
assume that they won't be interrupted.

- An application like a word processor is long-running, and the user 
expects that it will see files written by other applications, and that 
saves will appear in the filesystem.
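
To pin down what I mean by the combinations, here is a tiny Python 
sketch.  The class and field names are invented purely for 
illustration; nothing like this exists in Btrfs or anywhere else.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Isolation:
    # Hypothetical names, only for illustration.
    sees_others_writes: bool   # sees data written by others after it started?
    publishes_at_exit: bool    # writes become visible atomically on termination?


# All four combinations of the two independent choices.
ALL_MODES = [Isolation(s, p) for s, p in product([False, True], repeat=2)]

# The two combinations described in the bullets above.
SCRIPT_LIKE = Isolation(sees_others_writes=False, publishes_at_exit=True)
EDITOR_LIKE = Isolation(sees_others_writes=True, publishes_at_exit=False)
```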

Since we get the second behaviour by default, I imagine a wrapper 
program - let's call it 'atomic' - that implements the first behaviour:

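atomic(prog,args) {
   cp / snapshot                // a cheap COW snapshot, not a real copy
   status = chroot snapshot {   // run the program inside the snapshot
     exec(prog,args)
   }
   if (status!=ok) {
     rm snapshot                // failure: discard all of its changes
     exit(status)
   }
   / = merge(/,snapshot);       // success: publish its changes
}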

Maybe something like a setuid bit could indicate that a particular 
executable wants this behaviour.  Or maybe it would be best added to a 
shell (a bit like 'set -e').  Or something.
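
For experimenting with the idea today, the wrapper can be roughly 
approximated in user space.  The Python sketch below is only that, a 
sketch under several assumptions of mine: it works on one directory 
tree rather than /, it uses a full copy (shutil.copytree) as a 
stand-in for a cheap snapshot, it runs the command with the copy as 
its working directory instead of a chroot, and its "merge" just swaps 
the copy into place, assuming nothing else touched the tree meanwhile.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def atomic(tree: Path, argv: list[str]) -> int:
    """Run argv against a private copy of `tree`; publish changes only on success."""
    tree = tree.resolve()
    # Stand-in for a cheap COW snapshot: a full copy next to the original.
    workdir = Path(tempfile.mkdtemp(prefix=tree.name + ".snap.", dir=tree.parent))
    snapshot = workdir / "tree"
    shutil.copytree(tree, snapshot, symlinks=True)

    # Stand-in for chroot: the command just runs with the copy as its cwd.
    status = subprocess.run(argv, cwd=snapshot).returncode
    if status != 0:
        shutil.rmtree(workdir)      # discard every change the command made
        return status

    # Naive "merge": assume nobody else touched `tree` and swap the copy in.
    # Two renames, so the swap itself is not atomic; a real implementation
    # would need filesystem support here.
    tree.rename(workdir / "old")
    snapshot.rename(tree)
    shutil.rmtree(workdir)
    return 0
```

With that, something like atomic(Path("project"), ["make", "install"]) 
(a made-up example) leaves "project" untouched unless the command 
exits with status 0.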

merge() is the thing that doesn't exist, and the difficulty is what 
should happen if it finds a conflict.  Of course lots of different 
behaviours can be justified in different situations.  Sometimes, the 
snapshot should be abandoned; the user will retry the command.  
Sometimes, it might be best to save it where the user can access it, 
e.g. if I wget some huge file, but accidentally do something under its 
feet, it might be good to get an error message and find the file in 
/tmp.  In other cases, the version from the snapshot should replace the 
conflicted version.  But how can we specify the required behaviour in 
each case?
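
One possible answer, offered only as a strawman, is to pass the 
conflict behaviour to merge() as a policy.  The Python sketch below 
models each tree as a dict from path to content and does a three-way 
comparison against the state at snapshot time.  The policy names, the 
MergeConflict exception and the /tmp/conflict prefix are all invented 
for illustration.

```python
from enum import Enum


class OnConflict(Enum):
    ABANDON = "abandon"            # drop the snapshot; the user retries
    SAVE_ASIDE = "save_aside"      # keep the live file, park the snapshot's copy
    SNAPSHOT_WINS = "snapshot"     # snapshot's version replaces the live one


class MergeConflict(Exception):
    pass


def merge(base, live, snap, policy, aside_prefix="/tmp/conflict"):
    """Three-way merge of {path: content} dicts; a missing key means 'no file'.

    base: state when the snapshot was taken
    live: current state of the real tree
    snap: state of the snapshot after the process finished
    """
    merged = dict(live)
    for path in sorted(set(base) | set(snap)):
        ours, theirs, orig = snap.get(path), live.get(path), base.get(path)
        if ours == orig:
            continue                # the process did not touch this path
        if theirs != orig and theirs != ours:
            # Both sides changed the path, and in different ways.
            if policy is OnConflict.ABANDON:
                raise MergeConflict(path)
            if policy is OnConflict.SAVE_ASIDE:
                if ours is not None:
                    merged[aside_prefix + path] = ours
                continue
        # No conflict, or SNAPSHOT_WINS: apply the process's change.
        if ours is None:
            merged.pop(path, None)
        else:
            merged[path] = ours
    return merged
```

This only moves the question up a level, of course: something still 
has to choose the policy for each command.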

Regards,

Phil.