[Ocfs2-devel] [PATCH 0/3] Add inode stealing for ocfs2.V1
Sunil Mushran
Sunil.Mushran at oracle.com
Fri Feb 22 10:29:43 PST 2008
True... however, in 1.2 (or anything before 2.6.23) only extent_alloc:0000
is used by all nodes. This was done to avoid deadlocks during truncate.
From 2.6.23 or 2.6.24 onwards, Mark added code to prevent those truncate
deadlocks, which made it possible to use all the extent_alloc files.
In general, allocations from extent_alloc are not that common as we have
fairly flat trees. If this does become an issue, we will handle it
similarly.
wengang wang wrote:
> I am not sure about this, but I remember that when extending a file,
> metadata is allocated from extent_alloc instead of inode_alloc when
> necessary (correct me if I am wrong).
> If so, do we need to take extent_alloc into consideration as well?
>
> thanks,
> wengang.
>
> Tao Ma wrote:
>> Hi all,
>> This patch set improves the method for inode allocation. It is
>> currently divided into 3 small patches, but I think they could be
>> merged into one. Any comments are welcome.
>>
>> In OCFS2, we allocate inodes from the slot-specific inode_alloc to
>> avoid inode creation congestion. The local alloc file grows in large
>> contiguous chunks. With a 4K block size, it grows by 4M each time, so
>> 1024 inodes are allocated at a time.
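
The figure above can be checked with a short sketch. It assumes each
inode occupies exactly one block, which is what 4M / 4K = 1024 implies;
the function name is hypothetical and is not taken from the patches.

```python
def inodes_per_chunk(chunk_bytes: int, block_size: int) -> int:
    """Number of inodes that one inode_alloc growth step provides.

    Assumption: one inode per block, as implied by the 4M / 4K = 1024
    figure quoted in the text.
    """
    if block_size <= 0 or chunk_bytes % block_size:
        raise ValueError("chunk must be a whole number of blocks")
    return chunk_bytes // block_size


# 4M chunk with a 4K block size, as in the example from the post.
print(inodes_per_chunk(4 * 1024 * 1024, 4 * 1024))  # 1024
```
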
>>
>> Over time, if the fs gets fragmented enough (e.g., the user has
>> created many small files and also deleted some of them), we can end up
>> in a situation where we cannot extend the inode_alloc because we don't
>> have a large chunk free in the global_bitmap, even if df shows a few
>> gigs free. More annoying is that this situation will invariably mean
>> that one cannot create inodes on one node but can on another node.
>> Still more annoying is that an unused slot may have space for plenty
>> of inodes but is unusable because the user may not be mounting as many
>> nodes anymore.
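
To illustrate the failure mode described above: plenty of space can be
free in total while no single free run is long enough for a new chunk.
This is a minimal sketch that models the global_bitmap as a list of
booleans (True = allocated). It is a toy model, not the on-disk format,
and the helper names are hypothetical.

```python
from typing import List


def longest_free_run(bitmap: List[bool]) -> int:
    """Length of the longest run of consecutive free (False) bits."""
    best = run = 0
    for used in bitmap:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


def can_grow(bitmap: List[bool], chunk_bits: int) -> bool:
    """True if a contiguous chunk of chunk_bits free bits exists."""
    return longest_free_run(bitmap) >= chunk_bits


# Every other bit is in use: half of the space is free, but no free run
# is longer than one bit, so a 4-bit chunk cannot be allocated.
fragmented = [i % 2 == 0 for i in range(16)]
print(fragmented.count(False), can_grow(fragmented, 4))  # 8 False
```
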
>>
>> This patch series implements a solution, which is to steal inodes
>> from another slot. Now the whole inode allocation process looks like
>> this:
>> 1. Allocate from its own inode_alloc:000X.
>>    1) If we can reserve, OK.
>>    2) If that fails, try to allocate a large chunk and reserve once
>>       again.
>> 2. If 1 fails, try to allocate from the last node's inode_alloc. This
>>    time, just try to reserve; we don't go to the global_bitmap if this
>>    inode_alloc also can't allocate the inode.
>> 3. If 2 fails, try the node before it, and so on until we reach
>>    inode_alloc:0000. In the process, we skip our own inode_alloc.
>> 4. If 3 fails, try to allocate from its own inode_alloc:000X once
>>    again. There is a chance that the global_bitmap may have gained a
>>    large enough chunk while we were iterating over the other
>>    inode_allocs.
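
The four steps above can be sketched as follows. This is an
illustrative model only, not the code from the patches: each slot's
inode_alloc is reduced to a count of free inodes, `grow` is a
hypothetical callback standing in for "allocate a large chunk from the
global_bitmap", and "the last node" is assumed to mean the
highest-numbered slot.

```python
from typing import Callable, List, Optional


def steal_order(own_slot: int, max_slots: int) -> List[int]:
    """Slots tried in steps 2 and 3: last slot down to 0, skipping our own."""
    return [s for s in range(max_slots - 1, -1, -1) if s != own_slot]


def allocate_inode(own_slot: int,
                   free_inodes: List[int],
                   grow: Callable[[int], bool]) -> Optional[int]:
    """Return the slot the inode was taken from, or None if all steps fail."""

    def reserve(slot: int) -> bool:
        # Take one inode from the slot's inode_alloc if it has any free.
        if free_inodes[slot] > 0:
            free_inodes[slot] -= 1
            return True
        return False

    def reserve_own() -> bool:
        # Step 1: reserve; on failure, try to grow and reserve once more.
        return reserve(own_slot) or (grow(own_slot) and reserve(own_slot))

    if reserve_own():
        return own_slot
    # Steps 2 and 3: reserve only, never grow another slot's inode_alloc.
    for slot in steal_order(own_slot, len(free_inodes)):
        if reserve(slot):
            return slot
    # Step 4: the global_bitmap may have a large enough chunk by now.
    if reserve_own():
        return own_slot
    return None


# Slot 1 is empty and cannot grow; slot 3 is empty too, so we steal from 2.
print(allocate_inode(1, [0, 0, 3, 0], lambda slot: False))  # 2
```

Keeping the stealing path reserve-only matches step 2 of the
description: only a node's own inode_alloc is ever extended from the
global_bitmap.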
>>