[Ocfs2-devel] [PATCH 1/4] ocfs2: allocation reservations

Tao Ma tao.ma at oracle.com
Tue Mar 16 20:32:17 PDT 2010


Hi Mark,

Mark Fasheh wrote:
> On Tue, Mar 16, 2010 at 05:17:35PM +0800, Tao Ma wrote:
>> Hi Mark,
>> 	The idea is cool. Some comments inlined.
> 
> Thanks, and thanks for the detailed review.
> 
> 
>> <snip>
>>> +	} else {
>>> +		unsigned int shrink = lru_resv->r_len / 2;
>>> +
>>> +		mlog(0, "shrink lru resv by %u\n", shrink);
>>> +
>>> +		lru_resv->r_len -= shrink;
>> Do we need to change lru_resv->r_last_len here so that the next search 
>> can start right after the shrinked zone? I am not sure.
> 
> No, I left them unchanged on purpose - they still record the last sucessful
> allocation. In theory we could try adjusting them, but that'd be speculative
> and not based on any actual allocation requests.
oh, thanks for the explanation.
<snip>
>> Besides the comments, I have some other thoughts.
>> 1. Do we need to some additional check in 
>> ocfs2_reserve_local_alloc_bits? The old schema just check the condition 
>> bits_wanted <= free_bits since local_alloc is contiguous enough. But 
>> with reservation window, we can have a fragmented local alloc actually.
>> 2. Current codes don't handle with space reservations(unwritten 
>> extents). Maybe it is because unwritten will allocate contiguous clusters?
>>
>> With these 2 problems, I corrupted the system somehow by the following 
>> scripts(Sorry, Mark, ;) ).
>>
>> echo 'y'|mkfs.ocfs2 -b 4K -C 4K --fs-features=local $DEVICE
>> mount -t ocfs2 $DEVICE $MNT_DIR
>> for((j=0;j<32;j++))
>> do
>>          FILE_NAME=$MNT_DIR/$RANDOM
>>          for((i=0;i<63;i++))
>>          do
>>          cat /mnt/4096 >> $FILE_NAME #4096 is a file with 4096 bytes.
>>          done
>> done
>> FILE_NAME=$MNT_DIR/$RANDOM
>> ./rw_test -f $FILE_NAME -l 0 -u 8192 #this will create a file with an 
>> unwritten extent of 2 clusters length at offset 0.
>> umount $MNT_DIR
>>
>> Now run fsck.ocfs2, you will find duplicated clusters.
>> This is caused by the code here.
>>  >	start = ocfs2_local_alloc_find_clear_bits(osb, alloc, bits_wanted,
>>  >                                                  ac->ac_resv);
>>  >        if (start == -1) {
>>  >                /* TODO: Shouldn't we just BUG here? */
>>  >                status = -ENOSPC;
>>  >                mlog_errno(status);
>>  >                goto bail;
>>  >        }
>>  >        printk("end of start %d\n", start);
>>  >
>>  >        bitmap = la->la_bitmap;
>>  >        *bit_off = le32_to_cpu(la->la_bm_off) + start;
>>  >        /* local alloc is always contiguous by nature -- we never
>>  >         * delete bits from it! */
>>  >        *num_bits = bits_wanted;
>>
>> See here? bit_wanted is 2, while with the script, the local alloc is 
>> divided into 32 parts and every part only have 1 clusters left. But this 
>> code deem that we can always get bits_wanted. :)
> 
> Ahh, yeah we need to pass back the actual allocation from
> ocfs2_local_alloc_find_clear_bits(). That should be simple enough. I
> probably never hit this because all other allocations that went to the local
> alloc were 1 bit at a time (for file write, etc).
yeah except unwritten extents. :)
> 
> 
> Btw, I realized that we don't really need ocfs2_local_alloc_in_range() since
> we don't allocate inode groups from it any more. Or at least, I can simplify
> it quite a bit.
yeah, actually I thought of this when reviewing the patch. But maybe we 
need another ac_max_blocks in the future. Who knows? ;)
I guess currently we can just check bm_off+bg_bits and it is ok for us 
now. And also we can clean your ocfs2_local_alloc_find_clear_bits since 
now we have no chance of a local_resv which only is used for 
ocfs2_local_alloc_in_range now. ;)

Regards,
Tao



More information about the Ocfs2-devel mailing list