[Ocfs2-tools-devel] [PATCH 5/6] Resend: Add unwritten extent support in ocfs2-tools, take 2

Wed Sep 26 16:12:44 PDT 2007

On Wed, Sep 26, 2007 at 01:15:40PM +0800, tao.ma wrote:
> Resend it and Modify 2 things:
> 1. include 1 bug fix of block allocation.
> 2. Reorganize ocfs2_sync_path_to_disk to be more readable and efficient.

Thanks for sending out a corrected patch. It's a lot to read, but the code
looks familiar ;) The majority of this patch looks fine - I've got some
comments below.

> +	if (del_right_subtree) {
> +		ocfs2_unlink_subtree(fs, left_path, right_path, subtree_index);
> +		ocfs2_update_edge_lengths(left_path);
> +
> +		/*
> +		 * Now the good extent block information is stored in left_path, so
> +		 * synchronize the right path with it.
> +		 */
> +		for (i = 0; i <= subtree_index; i++)
> +			memcpy(right_path->p_node[i].buf,
> +			       left_path->p_node[i].buf,
> +			       fs->fs_blocksize);
> +
> +		eb = (struct ocfs2_extent_block *)path_leaf_buf(left_path);
> +		di->i_last_eb_blk = eb->h_blkno;
> +
> +		/*
> +		 * Removal of the extent in the left leaf was skipped
> +		 * above so we could delete the right path
> +		 * 1st.
> +		 */
> +		if (right_has_empty)
> +			ocfs2_remove_empty_extent(left_leaf_el);
> +
> +		*deleted = 1;
> +
> +		/*
> +		 * The extent block in the right path belwo subtree_index
> +		 * have been deleted, so we don't need to synchronize
> +		 * them to the disk.
> +		 */
> +		ret = ocfs2_sync_path_to_disk(fs, left_path, NULL,
> +					      subtree_index);

The ordering here is wrong, for userspace. The kernel module is actually
delaying removal of the extent blocks until after the operation is done,
whereas your code is removing them inline. We don't want that because a
failure or crash at a bad time could leave used extents blocks in the
deleted state.

You want to make all the changes, sync the paths and _then_ remove the
unused blocks from the allocator. You also need to make a similar change in
ocfs2_remove_rightmost_path().

By the way, it would be fair to say that since we're not using the journal,
many tree changes aren't strictly atomic. I think though we can easily avoid
freeing blocks before unlinking them from the tree, so it seems like a
reasonable place to start.

> @@ -2102,9 +3258,11 @@ static void free_duplicated_extent_block_dinode(ocfs2_filesys *fs,
>   * be considered invalid.
>   *
>   * Tree depth after the grow is returned via *final_depth.
> + *
> + * *last_eb_bh will be updated by ocfs2_add_branch().

Here and lower in the function, you retain the kernel comments which
actually need to be updated for libocfs2 variable names and conventions.

>   */
>  static int ocfs2_grow_tree(ocfs2_filesys *fs, struct ocfs2_dinode *di,
> -			   int *final_depth, char *last_eb)
> +			   int *final_depth, char **last_eb)
>  {
>  	errcode_t ret;
>  	char *eb_buf = NULL;
> @@ -2130,12 +3288,20 @@ static int ocfs2_grow_tree(ocfs2_filesys *fs, struct ocfs2_dinode *di,
>  			goto out;
>  
>  		depth++;
> -		/*
> -		 * Special case: we have room now if we shifted from
> -		 * tree_depth 0, so no more work needs to be done.
> -		 */
> -		if (depth == 1)
> +		if (depth == 1) {
> +			/*
> +			 * Special case: we have room now if we shifted from
> +			 * tree_depth 0, so no more work needs to be done.
> +			 *
> +			 * We won't be calling add_branch, so pass
> +			 * back *last_eb_bh as the new leaf. At depth
> +			 * zero, it should always be null so there's
> +			 * no reason to brelse.
> +			 */
> +			assert(*last_eb);

This comment in particular, is rather confusing as it's referring to kernel
conventions where *last_eb_bh == NULL, but in your case you're making sure
that *last_eb is not NULL.

> +/*
> + * Mark part or all of the extent record at split_index in the leaf
> + * pointed to by path as written. This removes the unwritten
> + * extent flag.
> + *
> + * Care is taken to handle contiguousness so as to not grow the tree.
> + *
> + * meta_ac is not strictly necessary - we only truly need it if growth
> + * of the tree is required. All other cases will degrade into a less
> + * optimal tree layout.

Should remove the part about meta_ac...

> +/*
> + * Mark the already-existing extent at cpos as written for len clusters.
> + *
> + * If the existing extent is larger than the request, initiate a
> + * split. An attempt will be made at merging with adjacent extents.
> + *
> + */
> +int ocfs2_mark_extent_written(ocfs2_filesys *fs, struct ocfs2_dinode *di,
> +			      uint32_t cpos, uint32_t len,
> +			      uint64_t p_blkno)
> +{
> +	int ret, index;
> +	struct ocfs2_path *left_path = NULL;
> +	struct ocfs2_extent_list *el;
> +	struct insert_ctxt ctxt;
> +
> +	if (!ocfs2_writes_unwritten_extents(OCFS2_RAW_SB(fs->fs_super)))
> +		return OCFS2_ET_UNSUPP_FEATURE;
> +
> +	/*
> +	 * XXX: This should be fixed up so that we just re-insert the
> +	 * next extent records.
> +	 */

Another comment that doesn't make sense here - just remove this one
entirely. In kernel it refers to the extent map, which no longer exists in
userspace.

> @@ -2251,6 +3698,7 @@ errcode_t ocfs2_extend_allocation(ocfs2_filesys *fs, uint64_t ino,
>  	uint64_t blkno, file_size;
>  	char *buf = NULL;
>  	struct ocfs2_dinode* di = NULL;
> +	uint16_t flag = 0;
>  
>  	if (!(fs->fs_flags & OCFS2_FLAG_RW))
>  		return OCFS2_ET_RO_FILESYS;
> @@ -2265,6 +3713,9 @@ errcode_t ocfs2_extend_allocation(ocfs2_filesys *fs, uint64_t ino,
>  
>  	di = (struct ocfs2_dinode *)buf;
>  
> +	if (ocfs2_writes_unwritten_extents(OCFS2_RAW_SB(fs->fs_super)))
> +		flag = OCFS2_EXT_UNWRITTEN;

Uh-oh... Doesn't this function get called for expanding dirs and creating
journals?

It's illegal (and pointless anyway) for a directory or a journal to have an
unwritten extent.

I like your thinking though - make use of the feature if we can. Right now
though, perhaps it's best to just let software which wants unwritten extents
explicitely call ocfs2_allocate_unwritten_extents().
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh at oracle.com