[Ocfs2-devel] [PATCH 1/3] ocfs2: Optimize inode allocation by remembering last group.

Mark Fasheh mfasheh at suse.com
Tue Jan 6 11:02:45 PST 2009


On Fri, Nov 28, 2008 at 06:58:43AM +0800, Tao Ma wrote:
> In ocfs2, the inode block search looks for the "emptiest" inode
> group to allocate from. So if an inode alloc file has many equally
> (or almost equally) empty groups, new inodes will tend to get
> spread out amongst them, which in turn can put them all over the
> disk. This is undesirable because directory operations on conceptually
> "nearby" inodes force a large number of seeks.
> 
> The good thing is that in ocfs2_alloc_context, there is a field named
> ac_last_group which will record the last group we allocate from. So
> we can only pass the right group to it and the following allocation
> will do as what we expect.
> 
> So we add ip_last_used_group in core directory inodes which records
> the last used allocation group. Another field named ip_last_used_slot
> is also added in case inode stealing happens. When claiming new inode,
> we passed in directory's inode so that the allocation can use this
> information.
> For more details, please see
> http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy.
> 
> Signed-off-by: Tao Ma <tao.ma at oracle.com>
> ---
>  fs/ocfs2/inode.c    |    2 ++
>  fs/ocfs2/inode.h    |    4 ++++
>  fs/ocfs2/namei.c    |    4 ++--
>  fs/ocfs2/suballoc.c |   21 +++++++++++++++++++++
>  fs/ocfs2/suballoc.h |    2 ++
>  5 files changed, 31 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ocfs2/inode.c b/fs/ocfs2/inode.c
> index 288512c..c3463c1 100644
> --- a/fs/ocfs2/inode.c
> +++ b/fs/ocfs2/inode.c
> @@ -350,6 +350,8 @@ void ocfs2_populate_inode(struct inode *inode, struct ocfs2_dinode *fe,
>  
>  	ocfs2_set_inode_flags(inode);
>  
> +	OCFS2_I(inode)->ip_last_used_slot = 0;
> +	OCFS2_I(inode)->ip_last_used_group = 0;
>  	mlog_exit_void();
>  }
>  
> diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h
> index eb3c302..e1978ac 100644
> --- a/fs/ocfs2/inode.h
> +++ b/fs/ocfs2/inode.h
> @@ -72,6 +72,10 @@ struct ocfs2_inode_info
>  
>  	struct inode			vfs_inode;
>  	struct jbd2_inode		ip_jinode;
> +
> +	/* Only valid if the inode is the dir. */
> +	u32				ip_last_used_slot;
> +	u64				ip_last_used_group;
>  };
>  
>  /*
> diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c
> index 02c8026..a601dd5 100644
> --- a/fs/ocfs2/namei.c
> +++ b/fs/ocfs2/namei.c
> @@ -469,8 +469,8 @@ static int ocfs2_mknod_locked(struct ocfs2_super *osb,
>  
>  	*new_fe_bh = NULL;
>  
> -	status = ocfs2_claim_new_inode(osb, handle, inode_ac, &suballoc_bit,
> -				       &fe_blkno);
> +	status = ocfs2_claim_new_inode(osb, handle, dir, parent_fe_bh,
> +				       inode_ac, &suballoc_bit, &fe_blkno);
>  	if (status < 0) {
>  		mlog_errno(status);
>  		goto leave;
> diff --git a/fs/ocfs2/suballoc.c b/fs/ocfs2/suballoc.c
> index 226fe21..f75782f 100644
> --- a/fs/ocfs2/suballoc.c
> +++ b/fs/ocfs2/suballoc.c
> @@ -1587,6 +1587,8 @@ bail:
>  
>  int ocfs2_claim_new_inode(struct ocfs2_super *osb,
>  			  handle_t *handle,
> +			  struct inode *dir,
> +			  struct buffer_head *parent_fe_bh,
>  			  struct ocfs2_alloc_context *ac,
>  			  u16 *suballoc_bit,
>  			  u64 *fe_blkno)
> @@ -1594,6 +1596,8 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
>  	int status;
>  	unsigned int num_bits;
>  	u64 bg_blkno;
> +	struct ocfs2_dinode *parent_fe =
> +			(struct ocfs2_dinode *)parent_fe_bh->b_data;
>  
>  	mlog_entry_void();
>  
> @@ -1602,6 +1606,21 @@ int ocfs2_claim_new_inode(struct ocfs2_super *osb,
>  	BUG_ON(ac->ac_bits_wanted != 1);
>  	BUG_ON(ac->ac_which != OCFS2_AC_USE_INODE);
>  
> +	/*
> +	 * Try to allocate inodes from some specific group.
> +	 *
> +	 * If the parent dir has recorded the last group used in allocation,
> +	 * cool, use it. Otherwise if we try to allocate new inode from the
> +	 * same slot the parent dir belongs to, use the same chunk.
> +	 */
> +	if (OCFS2_I(dir)->ip_last_used_group &&
> +	    OCFS2_I(dir)->ip_last_used_slot == ac->ac_alloc_slot)
> +		ac->ac_last_group = OCFS2_I(dir)->ip_last_used_group;
> +	else if (le16_to_cpu(parent_fe->i_suballoc_slot) ==
> +		 ac->ac_alloc_slot)
> +		ac->ac_last_group = le64_to_cpu(parent_fe->i_blkno) -
> +				    le16_to_cpu(parent_fe->i_suballoc_bit);

You should use ocfs2_which_suballoc_group() here, instead of open coding the
math to get ac_last_group.

Also, would it be possible for us to put this block in it's own function so
that it's easier to play with the logic in the future?


One last thing - can you add to the comment:

*
* We are very careful here to avoid the mistake of setting ac_last_group to
* a group descriptor from a different (unlocked) slot.
*/
	--Mark

--
Mark Fasheh



More information about the Ocfs2-devel mailing list