[Ocfs2-devel] [PATCH 0/3] ocfs2: Inode Allocation Strategy Improvement

Tao Ma tao.ma at oracle.com
Tue Jan 6 16:27:03 PST 2009


Hi Mark,
	Thanks for the review.

Mark Fasheh wrote:
> On Fri, Nov 28, 2008 at 02:49:19PM +0800, Tao Ma wrote:
>> Hi all,
>> 	In ocfs2, when we create a fresh file system and then create inodes 
>> in it, they are contiguous, which is good for readdir+stat. But if we 
>> delete all the inodes and create them again, the new inodes get spread 
>> out, which isn't what we want. The core problem here is that the inode block 
>> search looks for the "emptiest" inode group to allocate from. So if an 
>> inode alloc file has many equally (or almost equally) empty groups, new 
>> inodes will tend to get spread out amongst them, which in turn can put 
>> them all over the disk. This is undesirable because directory operations 
>> on conceptually "nearby" inodes force a large number of seeks. For more 
>> details, please see 
>> http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy. 
>> (I have modified it a little, Mark; if you are interested, please take a 
>> look. The changes are underlined.)
> 
> Your edits look fine. Thanks for updating the design doc.
Cool.
> 
> 
>> So this patch set tries to fix the problem.
>> patch 1: Optimize inode allocation by remembering the last group.
>> We add ip_last_used_group to in-core directory inodes, which records
>> the last used allocation group. Another field named ip_last_used_slot
>> is also added in case inode stealing happens. When claiming a new inode,
>> we pass in the directory's inode so that the allocation can use this
>> information.
>>
>> patch 2: let inode group allocations use the global bitmap directly.
>>
>> patch 3: we add osb_last_alloc_group in ocfs2_super to record the last
>> used allocation group so that we can make inode groups contiguous enough.
> 
> So, the logic in your patches is correct. As you can see, most of my
> comments were more about code flow or trivial cleanups. Assuming this all
> works as we expect, there shouldn't be much code for you to modify before
> the patches can be put in the merge_window branch.
> 
> 
> One thing though - would you mind providing a small amount of data to show
> what sort of improvement (if any) we're getting from these patches? I don't
> think we need anything fancy - just enough to answer the following two
> questions:
> 
> - How much does this improve our inode fragmentation level?
Actually I have some statistics and the result is cool. ;)  I will 
attach them with the next round of patches.
> 
> Any test that fragments the inode space would be appropriate for this.
> 
> We could then simply express fragmentation as some value - maybe a ratio of
> adjacent inodes as compared to total # of inodes, expressed as a percentage
> value. It would be nice for future testing if we had a small tool to
> calculate this (maybe using libocfs2, or by just making readdir calls and
> looking at inode numbers).
> 
> 
> - Does the 2nd patch impact overall inode creation times in a cluster, since
>   we're now using the cluster bitmap instead of local alloc.
No statistics here, since I originally thought there wouldn't be much 
difference. But I will test it and attach the results. Thanks.

Regards,
Tao


