[Ocfs2-devel] [PATCH 0/3] Add inode stealing for ocfs2.V1

Sunil Mushran Sunil.Mushran at oracle.com
Fri Feb 22 10:29:43 PST 2008


True... however, in 1.2 (or anything before 2.6.23) only extent_alloc:0000
is used by all nodes. This was done to avoid deadlocks during truncate.

From 2.6.23 or 2.6.24 onwards, Mark added code to allow use of all the
extent_alloc files, after first adding code to prevent deadlocks during
truncate.

In general, allocations from extent_alloc are not that common, as we have
fairly flat trees. If this does become an issue, we will handle it similarly.

wengang wang wrote:
> I am not sure about this, but I recall that when extending a file, metadata
> is allocated from extent_alloc instead of inode_alloc if necessary (correct
> me if I am wrong).
> If so, do we need to take extent_alloc into consideration as well?
>
> thanks,
> wengang.
>
> Tao Ma wrote:
>> Hi all,
>>     This patch set improves the method for inode allocation. It is
>> currently divided into 3 small patches, but I think they could be merged
>> into one. Any comments are welcome.
>>
>> In OCFS2, we allocate inodes from the slot-specific inode_alloc to avoid
>> inode creation congestion. The local alloc file grows in large contiguous
>> chunks. With a 4K block size, it grows by 4M each time, so 1024 inodes are
>> allocated at a time.
>>
>> Over time, if the fs gets fragmented enough (e.g., the user has created many
>> small files and also deleted some of them), we can end up in a situation
>> where we cannot extend the inode_alloc because we don't have a large chunk
>> free in the global_bitmap, even if df shows a few gigs free. More annoying is
>> that this situation invariably means that one cannot create inodes on one
>> node but can from another node. Still more annoying is that an unused slot
>> may have space for plenty of inodes but is unusable, as the user may not be
>> mounting as many nodes anymore.
>>
>> This patch series implements a solution, which is to steal inodes from
>> another slot. The whole inode allocation process now looks like this:
>> 1. Allocate from its own inode_alloc:000X
>>    1) If we can reserve, OK.
>>    2) If that fails, try to allocate a large chunk and reserve once again.
>> 2. If 1 fails, try to allocate from the last node's inode_alloc. This time,
>>    just try to reserve; we don't go to the global_bitmap if this inode_alloc
>>    also can't allocate the inode.
>> 3. If 2 fails, try the node before it, and so on until we reach
>>    inode_alloc:0000. In the process, we skip our own inode_alloc.
>> 4. If 3 fails, try to allocate from our own inode_alloc:000X once again.
>>    There is a chance that a large enough chunk has become available in the
>>    global_bitmap during the inode iteration process.
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-devel
>>   
>
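For readers following the thread, the allocation order in Tao's quoted
description can be sketched in a few lines of Python. This is only an
illustrative model, not the patch itself: the names steal_order,
reserve_inode, inodes_per_chunk, free_inodes and grow_from_global_bitmap
are all hypothetical, each inode_alloc:000X is reduced to a simple counter
of free inodes, and the chunk arithmetic assumes one inode per block, which
is what the 4M / 4K = 1024 figure implies.

```python
from typing import Callable, Iterator, List, Optional


def inodes_per_chunk(chunk_bytes: int, inode_bytes: int) -> int:
    """Inodes gained per inode_alloc growth (assumes one inode per block)."""
    return chunk_bytes // inode_bytes


def steal_order(own_slot: int, max_slots: int) -> Iterator[int]:
    """Yield the other slots from the last one down to 0, skipping own_slot."""
    for slot in range(max_slots - 1, -1, -1):
        if slot != own_slot:
            yield slot


def reserve_inode(
    own_slot: int,
    max_slots: int,
    free_inodes: List[int],
    grow_from_global_bitmap: Callable[[], int],
) -> Optional[int]:
    """Return the slot an inode was reserved from, or None if all steps fail.

    free_inodes[i] is a hypothetical stand-in for inode_alloc:000i.
    grow_from_global_bitmap() returns how many inodes a new chunk added to
    our own slot, or 0 when the global_bitmap has no large enough chunk.
    """

    def try_reserve(slot: int) -> bool:
        if free_inodes[slot] > 0:
            free_inodes[slot] -= 1
            return True
        return False

    def try_own() -> bool:
        # Step 1.1: reserve from what we already have.
        if try_reserve(own_slot):
            return True
        # Step 1.2: try to grow by a large chunk, then reserve again.
        free_inodes[own_slot] += grow_from_global_bitmap()
        return try_reserve(own_slot)

    # Step 1: our own inode_alloc.
    if try_own():
        return own_slot
    # Steps 2 and 3: steal from other slots, reserve only, never grow them.
    for slot in steal_order(own_slot, max_slots):
        if try_reserve(slot):
            return slot
    # Step 4: retry our own slot; a chunk may have become available meanwhile.
    if try_own():
        return own_slot
    return None
```

For example, with four slots and only slot 2 holding free inodes, a node in
slot 1 whose growth attempts fail ends up reserving from slot 2, after
trying slot 3 first.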
