[Ocfs2-tools-devel] [PATCH 2/4] defrag.ocfs2: Pass 1: Defrag individual files and directories
Goldwyn Rodrigues
rgoldwyn at gmail.com
Mon Jul 19 02:56:32 PDT 2010
Hi Joel,
On Sun, Jul 18, 2010 at 12:11 PM, Joel Becker <Joel.Becker at oracle.com> wrote:
> On Tue, May 11, 2010 at 11:02:34PM -0500, Goldwyn Rodrigues wrote:
>> Defragging directory -
>> Allocate an extent, and copy dirents to the new extent, skipping
>> holes and empty dirents. For each dirent, the dirent length
>> is recalculated to optimize on space.
>
> On to defragging the directory.
>
>> +static int copy_dirents(ocfs2_filesys *fs,
>> + struct ocfs2_extent_rec *rec,
>> + int tree_depth, uint32_t ccount, uint64_t ref_blkno,
>> + int ref_recno, void *private)
>> +{
>
> To be honest, I'd much rather see you use ocfs2_link() than
> hand-copy the dirents. We already have audited code for editing these
> things. I have a proposed way to code it that I will outline below.
>
>> +errcode_t defrag_dir(struct defrag_state *dst, struct ocfs2_dinode *di)
>> +{
>> + struct defrag_dir_context dc;
>> + uint64_t tmpblkno;
>> + errcode_t ret;
>> + int offset = 0, bs = dst->dst_fs->fs_blocksize;
>> +
>> + /* XXX: Ignore refcounted dir for now */
>> + if (di->i_dyn_features & (OCFS2_INLINE_DATA_FL|OCFS2_HAS_REFCOUNT_FL))
>> + return 0;
>
> Directories can't be refcounted. That would be a corrupt
> filesystem. Just check for inline data.
>
>> + /*Initialize dc */
>> + memset(&dc, 0, sizeof(struct defrag_dir_context));
>> + dc.dst = dst;
>> + dc.prev_offset = -1;
>> + dc.old_inode = di;
>> +
>> + ret = ocfs2_malloc_block(dst->dst_fs->fs_io, &dc.w_buf);
>> + if (ret) {
>> + com_err(whoami, ret, "while allocating memory\n");
>> + goto out;
>> + }
>> + memset(dc.w_buf, 0, bs);
>> +
>> + ret = ocfs2_new_inode(dst->dst_fs, &tmpblkno, di->i_mode);
>> + if (ret) {
>> + com_err(whoami, ret, "while creating inode\n");
>> + goto out;
>> + }
>> +
>> + ret = ocfs2_read_cached_inode(dst->dst_fs, tmpblkno, &dc.new_inode);
>> + if (ret) {
>> + com_err(whoami, ret, "while reading cached inode\n");
>> + goto out;
>> + }
>> + /* XXX Hackish - reversing what ocfs2_init_inode did to the cached
>> + inode */
>> + dc.new_inode->ci_inode->i_dyn_features &= ~OCFS2_INLINE_DATA_FL;
>> + ocfs2_dinode_new_extent_list(dst->dst_fs, dc.new_inode->ci_inode);
>
> This isn't actually hackish. It's exactly what you should do.
> However, you might want to write out the inode at this point, so other
> functions can read it.
>
> Ok, here's how I think you should copy the dirent data. I think
> you should do a two-pass loop:
>
> First pass:
> 1) Walk the directory.
> a) Save a list of all the (name, blockno) you find.
> b) Save the total of all the space needed. This you can get
> by adding up DIR_REC_LEN(dirent->name_len) for each
> directory entry. Remember to add the space needed for
> trailers if they are enabled. When done, you'll be able to
> calculate the number of dirblocks you minimally need. Add
> a few more dirblocks (10?) for slop if your directory is
> large.
>
> Now allocate your new defrag dir and grow it to have enough dirblocks.
> You can then call ocfs2_init_dir().
>
> Second pass:
> 1) For each name in the list you saved off
> a) link that name into the new directory with ocfs2_link()
> b) remove it from the old directory with ocfs2_unlink()
>
> Now you've got the stuff saved without having to code the dir copy. The
> old directory is now empty, so you can truncate it. Move the extents
> back from the new directory, and fix up the '.' record.
>
> "But Joel," you ask, "Won't it be really slow if every
> ocfs2_link() and ocfs2_unlink() call writes the changes to disk?" Yes,
> it would, except that I think defrag.ocfs2 should run with
> OCFS2_FLAG_BUFFERED. There's no reason to run in O_DIRECT. Let the
> page cache handle your performance.
>
> File defrag review tomorrow or so.
>
This is a good idea. The disadvantage I see is that it requires
keeping all the dirents in memory, so for a directory with a million
entries this might be painful.
However, the advantages of this approach are that you can:
+ sort the entries so that readdir+stat calls (ls -l) are faster.
+ find out precisely how big the directory will be and allocate
accordingly, so it doesn't need further defragmenting.
+ not bother about rebuilding the index tree.
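
For what it's worth, the sort and the size calculation from the first
pass could look roughly like the sketch below. It is not taken from the
patch: struct saved_dirent, sort_by_blkno() and dirblocks_needed() are
hypothetical names, the 12-byte dirent header and 4-byte rounding are my
assumption of what DIR_REC_LEN() does, and the trailer size is left as a
parameter. Sorting by block number is only one possible sort key.
Entries are packed in order, since a dirent cannot span dirblocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed on-disk layout: inode (8) + rec_len (2) + name_len (1) +
 * file_type (1) bytes before the name, records rounded up to 4 bytes.
 * Check against ocfs2_fs.h before relying on these values. */
#define DIRENT_HEADER_LEN 12u
#define DIRENT_ROUND      3u

/* Hypothetical in-memory record saved during the first pass. */
struct saved_dirent {
	uint64_t blkno;     /* inode block number the name points at */
	uint8_t  name_len;  /* length of the name in bytes */
};

/* Minimum record length for a name of the given length. */
static size_t dir_rec_len(size_t name_len)
{
	return (name_len + DIRENT_HEADER_LEN + DIRENT_ROUND) &
		~(size_t)DIRENT_ROUND;
}

/* qsort() comparator: ascending by inode block number. */
static int cmp_blkno(const void *a, const void *b)
{
	const struct saved_dirent *x = a, *y = b;

	return (x->blkno > y->blkno) - (x->blkno < y->blkno);
}

/* Sort the saved entries so a later readdir+stat walks inodes in order. */
static void sort_by_blkno(struct saved_dirent *ents, size_t count)
{
	qsort(ents, count, sizeof(*ents), cmp_blkno);
}

/* Number of dirblocks needed to pack the entries in list order.
 * trailer_len is the space reserved at the end of each block
 * (0 if trailers are disabled). No slop is added here. */
static uint64_t dirblocks_needed(const struct saved_dirent *ents,
				 size_t count, size_t block_size,
				 size_t trailer_len)
{
	size_t usable, used = 0, i;
	uint64_t blocks = 0;

	assert(block_size > trailer_len);
	usable = block_size - trailer_len;

	for (i = 0; i < count; i++) {
		size_t len = dir_rec_len(ents[i].name_len);

		assert(len <= usable);
		/* Start a new block when this record does not fit. */
		if (blocks == 0 || used + len > usable) {
			blocks++;
			used = 0;
		}
		used += len;
	}
	return blocks;
}
```

The caller would pass every entry, including '.' and '..', and add
whatever slop it wants on top of the result before growing the new
directory.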
>
> --
>
> "Ninety feet between bases is perhaps as close as man has ever come
> to perfection."
> - Red Smith
>
I like the Life's Little Instruction Book better :)
--
Goldwyn