[Ocfs2-tools-devel] [PATCH 2/4] defrag.ocfs2: Pass 1: Defrag individual files and directories

Joel Becker Joel.Becker at oracle.com
Mon Jul 19 10:51:55 PDT 2010


On Mon, Jul 19, 2010 at 11:56:32AM +0200, Goldwyn Rodrigues wrote:
> >        Ok, here's how I think you should copy the dirent data.  I think
> > you should do a two-pass loop:
> >
> > First pass:
> >        1) Walk the directory.
> >          a) Save a list of all the (name, blockno) you find.
> >          b) Save the total of all the space needed.  This you can get
> >             by adding up DIR_REC_LEN(dirent->name_len) for each
> >             directory entry.  Remember to add the space needed for
> >             trailers if they are enabled.  When done, you'll be able to
> >             calculate the number of dirblocks you minimally need.  Add
> >             a few more dirblocks (10?) for slop if your directory is
> >             large.
> >
> > Now allocate your new defrag dir and grow it to have enough dirblocks.
> > You can then call ocfs2_init_dir().
> >
> > Second pass:
> >        1) For each name in the list you saved off
> >          a) link that name into the new directory with ocfs2_link()
> >          b) remove it from the old directory with ocfs2_unlink()
> >
> > Now you've got the stuff saved without having to code the dir copy.  The
> > old directory is now empty, so you can truncate it.  Move the extents
> > back from the new directory, and fix up the '.' record.
> >        "But Joel," you ask, "Won't it be really slow if every
> > ocfs2_link() and ocfs2_unlink() call writes the changes to disk?"  Yes,
> > it would, except that I think defrag.ocfs2 should run with
> > OCFS2_FLAG_BUFFERED.  There's no reason to run in O_DIRECT.  Let the
> > page cache handle your performance.
> >        File defrag review tomorrow or so.
> >
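
	As a side note on the OCFS2_FLAG_BUFFERED point above, here is a
minimal sketch of what a buffered open might look like.  It assumes the
ocfs2_open() prototype and flag names from libocfs2's ocfs2.h and is
illustrative only, not the actual defrag.ocfs2 code:

#include <ocfs2/ocfs2.h>

/*
 * Open the device buffered rather than O_DIRECT, so the page cache can
 * absorb the many small dirblock writes from ocfs2_link()/ocfs2_unlink().
 */
static errcode_t open_for_defrag(const char *device, ocfs2_filesys **fs)
{
	return ocfs2_open(device, OCFS2_FLAG_RW | OCFS2_FLAG_BUFFERED,
			  0, 0, fs);
}
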
> 
> 
> This is a good idea; however, the disadvantage I see is that it
> requires keeping all the dirents in memory, so for a directory with a
> million entries this might be painful.

	It might be, but a million entries is perhaps 256MB of data?
That's not unreasonable on a modern system.  That said, I don't think
you quite have to do this anymore.  See below.
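
	For what it's worth, here is the back-of-envelope behind that
figure, assuming each saved entry carries the full 255-byte name buffer
plus a block number; the struct below is purely illustrative, not
something in the patch:

#include <stdint.h>

/* Hypothetical record for one saved (name, blockno) pair. */
struct saved_dirent {
	uint64_t blkno;		/* block the entry points at */
	uint8_t  name_len;
	char     name[255];	/* OCFS2_MAX_FILENAME_LEN */
};
/* sizeof() comes to 264 bytes, so a million entries is ~264MB,
 * in the same ballpark as the 256MB above. */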

> However, the advantage of this approach is you can:
> + sort the entries so the readdir+stat calls are faster (ls -l).
> + you can precisely find out how big the directory will be and then
> allocate accordingly so you don't fragment it further.
> + not bother about rebuilding the index tree

	All good things!
	Btw, I realized a problem with my scheme: what if you crash in
the middle of it?  You lose whatever had already been moved out of the
original into the temporary directory.
	Instead, I think it should be three-phase.  First, walk the
original to calculate your new size.  Second, link all the names into
the temporary directory and then swap the extent lists.  Third, now that
the old extents are on the temporary directory, unlink everything from
it and remove the temporary directory.
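
	To make the three phases concrete, here is a rough sketch.  It
assumes the usual libocfs2 entry points (ocfs2_dir_iterate(),
ocfs2_link(), ocfs2_unlink()) and the OCFS2_DIR_REC_LEN() macro; the
directory growth, extent-list swap, trailer accounting, and '.' fixup
are left as comments because they are the real work of the patch, and
mutating a directory while iterating it is glossed over here:

#include <string.h>
#include <ocfs2/ocfs2.h>

struct defrag_dir_ctx {
	ocfs2_filesys *fs;
	uint64_t target_dir;	/* dir we link into (phase 2) or unlink from (phase 3) */
	int unlinking;		/* 0 = phase 2, 1 = phase 3 */
	uint64_t bytes_needed;	/* phase 1 running total */
	errcode_t err;
};

/* Phase 1: total up the space every entry will need in the new directory. */
static int size_one_entry(struct ocfs2_dir_entry *dirent, int offset,
			  int blocksize, char *buf, void *priv)
{
	struct defrag_dir_ctx *ctx = priv;

	ctx->bytes_needed += OCFS2_DIR_REC_LEN(dirent->name_len);
	return 0;
}

/* Phases 2 and 3: link into or unlink from ctx->target_dir, skipping '.'/'..'. */
static int move_one_entry(struct ocfs2_dir_entry *dirent, int offset,
			  int blocksize, char *buf, void *priv)
{
	struct defrag_dir_ctx *ctx = priv;
	char name[OCFS2_MAX_FILENAME_LEN + 1];

	memcpy(name, dirent->name, dirent->name_len);
	name[dirent->name_len] = '\0';
	if (!strcmp(name, ".") || !strcmp(name, ".."))
		return 0;

	if (ctx->unlinking)
		ctx->err = ocfs2_unlink(ctx->fs, ctx->target_dir, name,
					dirent->inode, 0);
	else
		ctx->err = ocfs2_link(ctx->fs, ctx->target_dir, name,
				      dirent->inode, dirent->file_type);

	return ctx->err ? OCFS2_DIRENT_ABORT : 0;
}

static errcode_t defrag_one_dir(ocfs2_filesys *fs, uint64_t old_dir,
				uint64_t tmp_dir)
{
	struct defrag_dir_ctx ctx = { .fs = fs, };
	errcode_t ret;

	/* Phase 1: size the new directory (add trailer space and slop). */
	ret = ocfs2_dir_iterate(fs, old_dir, 0, NULL, size_one_entry, &ctx);
	if (ret)
		return ret;
	/* ... grow tmp_dir to hold ctx.bytes_needed, ocfs2_init_dir() ... */

	/* Phase 2: link every name into tmp_dir, then swap the extent lists. */
	ctx.target_dir = tmp_dir;
	ret = ocfs2_dir_iterate(fs, old_dir, 0, NULL, move_one_entry, &ctx);
	if (ret || ctx.err)
		return ret ? ret : ctx.err;
	/* ... swap extent lists of old_dir and tmp_dir, fix up '.' ... */

	/* Phase 3: the old extents now hang off tmp_dir; empty it and remove it. */
	ctx.unlinking = 1;
	ctx.target_dir = tmp_dir;
	ret = ocfs2_dir_iterate(fs, tmp_dir, 0, NULL, move_one_entry, &ctx);
	if (ret || ctx.err)
		return ret ? ret : ctx.err;
	/* ... truncate tmp_dir and remove it from its parent ... */

	return 0;
}
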
	If you're worried about RAM, you can even do this without saving
off the names.  So perhaps you save the names in the first walk, and if
you can save them all, you can do sorting.  However, if you run into
-ENOMEM, you drop your saved names and just defrag them in original
order.
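
	A self-contained sketch of that "save if you can, degrade if you
can't" idea; the names and helpers here are hypothetical, not from the
patch:

#include <stdlib.h>
#include <string.h>

struct name_list {
	char **names;
	size_t count, capacity;
};

/* Returns 0 on success, -1 on ENOMEM; on failure the caller drops the
 * whole list and falls back to defragging in original order. */
static int save_name(struct name_list *nl, const char *name, size_t len)
{
	char *copy;

	if (nl->count == nl->capacity) {
		size_t cap = nl->capacity ? nl->capacity * 2 : 1024;
		char **grown = realloc(nl->names, cap * sizeof(*grown));

		if (!grown)
			return -1;
		nl->names = grown;
		nl->capacity = cap;
	}

	copy = strndup(name, len);
	if (!copy)
		return -1;
	nl->names[nl->count++] = copy;
	return 0;
}

static void drop_names(struct name_list *nl)
{
	while (nl->count)
		free(nl->names[--nl->count]);
	free(nl->names);
	nl->names = NULL;
	nl->capacity = 0;
}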

Joel

-- 

Life's Little Instruction Book #226

	"When someone hugs you, let them be the first to let go."

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127


