[Ocfs2-tools-devel] [PATCH 5/6] Resend: Add unwritten extent support in ocfs2-tools, take 2

Wed Sep 26 17:25:29 PDT 2007

Mark Fasheh wrote:
> On Wed, Sep 26, 2007 at 01:15:40PM +0800, tao.ma wrote:
>   
>> Resend it and Modify 2 things:
>> 1. include 1 bug fix of block allocation.
>> 2. Reorganize ocfs2_sync_path_to_disk to be more readable and efficient.
>>     
>
> Thanks for sending out a corrected patch. It's a lot to read, but the code
> looks familiar ;) The majority of this patch looks fine - I've got some
> comments below.
>
>
>   
>> +	if (del_right_subtree) {
>> +		ocfs2_unlink_subtree(fs, left_path, right_path, subtree_index);
>> +		ocfs2_update_edge_lengths(left_path);
>> +
>> +		/*
>> +		 * Now the good extent block information is stored in left_path, so
>> +		 * synchronize the right path with it.
>> +		 */
>> +		for (i = 0; i <= subtree_index; i++)
>> +			memcpy(right_path->p_node[i].buf,
>> +			       left_path->p_node[i].buf,
>> +			       fs->fs_blocksize);
>> +
>> +		eb = (struct ocfs2_extent_block *)path_leaf_buf(left_path);
>> +		di->i_last_eb_blk = eb->h_blkno;
>> +
>> +		/*
>> +		 * Removal of the extent in the left leaf was skipped
>> +		 * above so we could delete the right path
>> +		 * 1st.
>> +		 */
>> +		if (right_has_empty)
>> +			ocfs2_remove_empty_extent(left_leaf_el);
>> +
>> +		*deleted = 1;
>> +
>> +		/*
>> +		 * The extent block in the right path belwo subtree_index
>> +		 * have been deleted, so we don't need to synchronize
>> +		 * them to the disk.
>> +		 */
>> +		ret = ocfs2_sync_path_to_disk(fs, left_path, NULL,
>> +					      subtree_index);
>>     
>
> The ordering here is wrong, for userspace. The kernel module is actually
> delaying removal of the extent blocks until after the operation is done,
> whereas your code is removing them inline. We don't want that because a
> failure or crash at a bad time could leave used extents blocks in the
> deleted state.
>
> You want to make all the changes, sync the paths and _then_ remove the
> unused blocks from the allocator. You also need to make a similar change in
> ocfs2_remove_rightmost_path().
>
> By the way, it would be fair to say that since we're not using the journal,
> many tree changes aren't strictly atomic. I think though we can easily avoid
> freeing blocks before unlinking them from the tree, so it seems like a
> reasonable place to start.
>   
This problem can be resolved by duplicating the inode and its 
corresponding extent block before when start the operation like I do in 
ocfs2_insert_extent. I haven't added that because I'd like to make the 
whole process looks good. This will be a quite different patch.
>
>   
>> @@ -2102,9 +3258,11 @@ static void free_duplicated_extent_block_dinode(ocfs2_filesys *fs,
>>   * be considered invalid.
>>   *
>>   * Tree depth after the grow is returned via *final_depth.
>> + *
>> + * *last_eb_bh will be updated by ocfs2_add_branch().
>>     
>
> Here and lower in the function, you retain the kernel comments which
> actually need to be updated for libocfs2 variable names and conventions.
>
>
>   
>>   */
>>  static int ocfs2_grow_tree(ocfs2_filesys *fs, struct ocfs2_dinode *di,
>> -			   int *final_depth, char *last_eb)
>> +			   int *final_depth, char **last_eb)
>>  {
>>  	errcode_t ret;
>>  	char *eb_buf = NULL;
>> @@ -2130,12 +3288,20 @@ static int ocfs2_grow_tree(ocfs2_filesys *fs, struct ocfs2_dinode *di,
>>  			goto out;
>>  
>>  		depth++;
>> -		/*
>> -		 * Special case: we have room now if we shifted from
>> -		 * tree_depth 0, so no more work needs to be done.
>> -		 */
>> -		if (depth == 1)
>> +		if (depth == 1) {
>> +			/*
>> +			 * Special case: we have room now if we shifted from
>> +			 * tree_depth 0, so no more work needs to be done.
>> +			 *
>> +			 * We won't be calling add_branch, so pass
>> +			 * back *last_eb_bh as the new leaf. At depth
>> +			 * zero, it should always be null so there's
>> +			 * no reason to brelse.
>> +			 */
>> +			assert(*last_eb);
>>     
>
> This comment in particular, is rather confusing as it's referring to kernel
> conventions where *last_eb_bh == NULL, but in your case you're making sure
> that *last_eb is not NULL.
>   
Yes, I will modify all the comments to be more readable in ocfs2-tools.
>   
>
>
>   
>> @@ -2251,6 +3698,7 @@ errcode_t ocfs2_extend_allocation(ocfs2_filesys *fs, uint64_t ino,
>>  	uint64_t blkno, file_size;
>>  	char *buf = NULL;
>>  	struct ocfs2_dinode* di = NULL;
>> +	uint16_t flag = 0;
>>  
>>  	if (!(fs->fs_flags & OCFS2_FLAG_RW))
>>  		return OCFS2_ET_RO_FILESYS;
>> @@ -2265,6 +3713,9 @@ errcode_t ocfs2_extend_allocation(ocfs2_filesys *fs, uint64_t ino,
>>  
>>  	di = (struct ocfs2_dinode *)buf;
>>  
>> +	if (ocfs2_writes_unwritten_extents(OCFS2_RAW_SB(fs->fs_super)))
>> +		flag = OCFS2_EXT_UNWRITTEN;
>>     
>
> Uh-oh... Doesn't this function get called for expanding dirs and creating
> journals?
> It's illegal (and pointless anyway) for a directory or a journal to have an
> unwritten extent.
> I like your thinking though - make use of the feature if we can. Right now
> though, perhaps it's best to just let software which wants unwritten extents
> explicitely call ocfs2_allocate_unwritten_extents().
> 	--Mark
>   
en, maybe you are right.
So ocfs2_extend_allocation will just extend the blocks with no extent flags.
Thanks.