[Ocfs2-devel] [PATCH 2/7] ocfs2: Add a name indexed b-tree to directory inodes

Joel Becker Joel.Becker at oracle.com
Fri Jan 30 16:54:26 PST 2009


On Fri, Jan 30, 2009 at 01:42:28PM -0800, Mark Fasheh wrote:
> This patch makes use of Ocfs2's flexible btree code to add an additional
> tree to directory inodes. The new tree stores an array of small,
> fixed-length records in each leaf block. Each record stores a hash value,
> and pointer to a block in the traditional (unindexed) directory tree where a
> dirent with the given name hash resides. Lookup exclusively uses this tree
> to find dirents, thus providing us with constant time name lookups.
> 
> Some of the hashing code was copied from ext3. Unfortunately, it has lots of
> unfixed checkpatch errors. I left that as-is so that tracking changes would
> be easier.
> 
> Signed-off-by: Mark Fasheh <mfasheh at suse.com>

	Big patch to review.  Whew!  But overall it looks very good.
Comments inline.

> +/*
> + * Read the block at 'phys' which belongs to this directory
> + * inode. This function does no virtual->physical block translation -
> + * what's passed in is assumed to be a valid directory block.
> + */
> +static int ocfs2_read_dir_block_direct(struct inode *dir, u64 phys,
> +				       struct buffer_head **bh)
> +{
> +	int ret;
> +	struct buffer_head *tmp = *bh;
> +
> +	ret = ocfs2_read_block(dir, phys, &tmp, ocfs2_validate_dir_block);
> +	if (ret) {
> +		mlog_errno(ret);
> +		goto out;
> +	}
> +
> +	ret = ocfs2_check_dir_trailer(dir, tmp);
> +	if (ret) {
> +		if (!*bh)
> +			brelse(tmp);
> +		mlog_errno(ret);
> +		goto out;
> +	}
> +
> +	if (!ret && !*bh)
> +		*bh = tmp;
> +out:
> +	return ret;
> +}

	I think you need the same guard check as ocfs2_read_dir_block():

  	if (!(flags & OCFS2_BH_READAHEAD) &&
  	    ocfs2_dir_has_trailer(inode)) {

because otherwise ocfs2_check_dir_trailer() will run unconditionally.  I
realize that the current code doesn't call ocfs2_read_dir_block_direct()
except in the indexed case, but that's just for now.
	Alternatively, you could BUG_ON(!ocfs2_dir_has_trailer()).

> @@ -1184,6 +1849,8 @@ static int ocfs2_empty_dir_filldir(void *priv, const char *name, int name_len,
>   * routine to check that the specified directory is empty (for rmdir)
>   *
>   * Returns 1 if dir is empty, zero otherwise.
> + *
> + * XXX: This is a performance problem
>   */
>  int ocfs2_empty_dir(struct inode *inode)
>  {

	Why can't an indexed directory store the directory's child count
in the dx_root?  That way we can just read the dx_root and
ocfs2_empty_dir() gets much faster.  You already have the dx_root in
memory for all link/unlink operations anyway.
	While we're at it, the two places that call ocfs2_empty_dir()
should reorder their checks.  If they check (i_nlink != 2) first, they
can skip calling ocfs2_empty_dir() altogether when i_nlink has a
subdirectory.

> @@ -1401,29 +2435,57 @@ static void ocfs2_expand_last_dirent(char *start, unsigned int old_size,
>   */
>  static int ocfs2_expand_inline_dir(struct inode *dir, struct buffer_head *di_bh,
>  				   unsigned int blocks_wanted,
> +				   struct ocfs2_dir_lookup_result *lookup,
>  				   struct buffer_head **first_block_bh)
>  {
> -	u32 alloc, bit_off, len;
> +	u32 alloc, dx_alloc, bit_off, len;
>  	struct super_block *sb = dir->i_sb;
> -	int ret, credits = ocfs2_inline_to_extents_credits(sb);
> -	u64 blkno, bytes = blocks_wanted << sb->s_blocksize_bits;
> +	int ret, i, num_dx_leaves = 0,
> +		credits = ocfs2_inline_to_extents_credits(sb);
> +	u64 dx_insert_blkno, blkno,
> +		bytes = blocks_wanted << sb->s_blocksize_bits;
>  	struct ocfs2_super *osb = OCFS2_SB(dir->i_sb);
>  	struct ocfs2_inode_info *oi = OCFS2_I(dir);
>  	struct ocfs2_alloc_context *data_ac;
> +	struct ocfs2_alloc_context *meta_ac = NULL;
>  	struct buffer_head *dirdata_bh = NULL;
> +	struct buffer_head *dx_root_bh = NULL;
> +	struct buffer_head **dx_leaves = NULL;
>  	struct ocfs2_dinode *di = (struct ocfs2_dinode *)di_bh->b_data;
>  	handle_t *handle;
>  	struct ocfs2_extent_tree et;
> -	int did_quota = 0;
> +	struct ocfs2_extent_tree dx_et;
> +	int did_quota = 0, bytes_allocated = 0;

	Holy bejeezus we have a lot on the stack here?  Can we factor
this function into a couple of subfunctions somewhere?  That said, I
don't think it's needed for the initial merge.  So not right now.

> +int ocfs2_dx_dir_truncate(struct inode *dir, struct buffer_head *di_bh)
> +{
> +	int ret;
> +	unsigned int uninitialized_var(clen);
> +	u32 major_hash = UINT_MAX, p_cpos, uninitialized_var(cpos);

	I suppose I would have used U32_MAX instead of UINT_MAX, just to
match the u32 type, but in the end it is obviously the same value.

Joel

-- 

Life's Little Instruction Book #173

	"Be kinder than necessary."

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127



More information about the Ocfs2-devel mailing list