[Ocfs2-tools-devel] [PATCH 2/4] defrag.ocfs2: Pass 1: Defrag individual files and directories

Goldwyn Rodrigues rgoldwyn at gmail.com
Mon Jul 19 02:56:32 PDT 2010


Hi Joel,

On Sun, Jul 18, 2010 at 12:11 PM, Joel Becker <Joel.Becker at oracle.com> wrote:
> On Tue, May 11, 2010 at 11:02:34PM -0500, Goldwyn Rodrigues wrote:
>> Defragging directory -
>> Allocate an extent, and copy dirents to the new extent, skipping
>> holes and empty dirents. For each dirent, the dirent length
>> is recalculated to optimize on space.
>
>        On to defragging the directory.
>
>> +static int copy_dirents(ocfs2_filesys *fs,
>> +             struct ocfs2_extent_rec *rec,
>> +             int tree_depth, uint32_t ccount, uint64_t ref_blkno,
>> +             int ref_recno, void *private)
>> +{
>
>        To be honest, I'd much rather see you use ocfs2_link() than
> hand-copy the dirents.  We already have audited code for editing these
> things.  I have a proposed way to code it that I will outline below.
>
>> +errcode_t defrag_dir(struct defrag_state *dst, struct ocfs2_dinode *di)
>> +{
>> +     struct defrag_dir_context dc;
>> +     uint64_t tmpblkno;
>> +     errcode_t ret;
>> +     int offset = 0, bs = dst->dst_fs->fs_blocksize;
>> +
>> +     /* XXX: Ignore refcounted dir for now */
>> +     if (di->i_dyn_features & (OCFS2_INLINE_DATA_FL|OCFS2_HAS_REFCOUNT_FL))
>> +             return 0;
>
>        Directories can't be refcounted.  That would be a corrupt
> filesystem.  Just check for inline data.
>
>> +     /*Initialize dc */
>> +     memset(&dc, 0, sizeof(struct defrag_dir_context));
>> +     dc.dst = dst;
>> +     dc.prev_offset = -1;
>> +     dc.old_inode = di;
>> +
>> +     ret = ocfs2_malloc_block(dst->dst_fs->fs_io, &dc.w_buf);
>> +     if (ret) {
>> +             com_err(whoami, ret, "while allocating memory\n");
>> +             goto out;
>> +     }
>> +     memset(dc.w_buf, 0, bs);
>> +
>> +     ret = ocfs2_new_inode(dst->dst_fs, &tmpblkno, di->i_mode);
>> +     if (ret) {
>> +             com_err(whoami, ret, "while creating inode\n");
>> +             goto out;
>> +     }
>> +
>> +     ret = ocfs2_read_cached_inode(dst->dst_fs, tmpblkno, &dc.new_inode);
>> +     if (ret) {
>> +             com_err(whoami, ret, "while reading cached inode\n");
>> +             goto out;
>> +     }
>> +     /* XXX Hackish - reversing what ocfs2_init_inode did to the cached
>> +        inode */
>> +     dc.new_inode->ci_inode->i_dyn_features &= ~OCFS2_INLINE_DATA_FL;
>> +     ocfs2_dinode_new_extent_list(dst->dst_fs, dc.new_inode->ci_inode);
>
>        This isn't actually hackish.  It's exactly what you should do.
> However, you might want to write out the inode at this point, so other
> functions can read it.
>        Ok, here's how I think you should copy the dirent data.  I think
> you should do a two-pass loop:
>
> First pass:
>        1) Walk the directory.
>          a) Save a list of all the (name, blockno) you find.
>          b) Save the total of all the space needed.  This you can get
>             by adding up DIR_REC_LEN(dirent->name_len) for each
>             directory entry.  Remember to add the space needed for
>             trailers if they are enabled.  When done, you'll be able to
>             calculate the number of dirblocks you minimally need.  Add
>             a few more dirblocks (10?) for slop if your directory is
>             large.
>
> Now allocate your new defrag dir and grow it to have enough dirblocks.
> You can then call ocfs2_init_dir().
>
> Second pass:
>        1) For each name in the list you saved off
>          a) link that name into the new directory with ocfs2_link()
>          b) remove it from the old directory with ocfs2_unlink()
>
> Now you've got the stuff saved without having to code the dir copy.  The
> old directory is now empty, so you can truncate it.  Move the extents
> back from the new directory, and fix up the '.' record.
>        "But Joel," you ask, "Won't it be really slow if every
> ocfs2_link() and ocfs2_unlink() call writes the changes to disk?"  Yes,
> it would, except that I think defrag.ocfs2 should run with
> OCFS2_FLAG_BUFFERED.  There's no reason to run in O_DIRECT.  Let the
> page cache handle your performance.
>        File defrag review tomorrow or so.
>


This is a good idea. The disadvantage I see is that it needs more
memory, since all the dirents have to be held in memory at once. For a
directory with a million entries this might be painful.

However, the advantages of this approach are that you can:
+ sort the entries so that readdir+stat calls are faster (ls -l).
+ find out precisely how big the directory will be and allocate
accordingly, so it needs no further defragging.
+ not bother about rebuilding the index tree.
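
A rough sketch of the sorting and sizing steps, to make the trade-off
concrete. Everything named here is hypothetical: struct saved_dirent,
the SKETCH_* macros and both functions are not part of the patch or of
libocfs2. The record length macro is assumed to mirror
OCFS2_DIR_REC_LEN (12-byte fixed header plus the name, rounded up to
4 bytes), sorting by inode block number is my assumption for what
makes readdir+stat faster, and the trailer size is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: mirrors OCFS2_DIR_REC_LEN, i.e. a 12-byte fixed header
 * (inode, rec_len, name_len, file_type) plus the name, rounded up to 4. */
#define SKETCH_DIR_MEMBER_LEN 12u
#define SKETCH_DIR_ROUND      3u
#define SKETCH_DIR_REC_LEN(name_len) \
	(((name_len) + SKETCH_DIR_MEMBER_LEN + SKETCH_DIR_ROUND) & ~SKETCH_DIR_ROUND)

/* Hypothetical record saved for each dirent during the first pass. */
struct saved_dirent {
	uint64_t blkno;      /* inode block the name points at */
	uint8_t  name_len;
	char     name[256];  /* names are at most 255 bytes */
};

/* Order by inode block number without risking overflow on subtraction. */
static int cmp_blkno(const void *a, const void *b)
{
	const struct saved_dirent *x = a, *y = b;

	return (x->blkno > y->blkno) - (x->blkno < y->blkno);
}

void sort_saved_dirents(struct saved_dirent *ents, size_t count)
{
	qsort(ents, count, sizeof(*ents), cmp_blkno);
}

/*
 * Count the dirblocks needed by simulating sequential packing.  A record
 * never spans two blocks, so this can exceed total_bytes / blocksize.
 * trailer_len is 0 when trailers are disabled.  The caller still has to
 * account for '.' and '..' and for any slop blocks.
 */
uint64_t dirblocks_needed(const struct saved_dirent *ents, size_t count,
			  uint32_t blocksize, uint32_t trailer_len)
{
	uint32_t usable = blocksize - trailer_len;
	uint32_t used = 0;
	uint64_t blocks = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		uint32_t len = SKETCH_DIR_REC_LEN((uint32_t)ents[i].name_len);

		/* Start a new block for the first entry or when this one won't fit. */
		if (blocks == 0 || used + len > usable) {
			blocks++;
			used = 0;
		}
		used += len;
	}
	return blocks;
}
```

The fixed 256-byte name buffer keeps the sketch short, but it costs
roughly 270 bytes per saved entry, which is exactly the memory problem
for a directory with a million entries. Allocating each name at its
actual length would cut that down considerably.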

>
> --
>
> "Ninety feet between bases is perhaps as close as man has ever come
>  to perfection."
>        - Red Smith
>


I like the Life's Little Instruction Book better :)

-- 
Goldwyn


