[Ocfs2-tools-devel] [RFC] PATCH: verify slot number in __ocfs2_read_slot_map()

Sunil Mushran sunil.mushran at oracle.com
Sat Mar 7 09:21:52 PST 2009


On Sat, Mar 07, 2009 at 11:06:18PM +0800, Coly Li wrote:
> These days I am testing ocfs2 with the user space cluster stack (pcmk). Right now
> there is a deadlock when stat()ing the ocfs2 volume mount point. See
> https://bugzilla.novell.com/show_bug.cgi?id=482752
> 
> My point is that when this deadlock happens, running mounted.ocfs2
> segfaults. I traced the core dump: the data read by
> __ocfs2_read_slot_map() may be (partially?) invalid, so in ocfs2_print_nodes():
>  66                 node_num = map->md_slots[i].sd_node_num;
>  67                 if (names && names[node_num] && *(names[node_num]))
> node_num on line 66 can be a very large number (due to the invalid data from
> __ocfs2_read_slot_map()), and names[node_num] then references an illegal memory
> region.
> 
> I checked the code for a way to verify whether the slot map read is valid
> while the deadlock is happening, but have found nothing so far.
> 
> A fallback solution is to verify the node numbers in __ocfs2_read_slot_map(). I
> attach the patch here.
> 
> I still have no idea how this deadlock happens and am still tracing the code.
> Sorry that I cannot provide more information on the deadlock.
> 
> Is the fallback solution acceptable?
> Or is there a way to check whether the I/O in __ocfs2_read_slot_map() is valid?

mounted.ocfs2 does dirty reads. So we cannot trust the read.
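
Since the read cannot be trusted, the consumer could also refuse to index
names[] with an unchecked value. A minimal sketch of that guard follows; the
helper name safe_node_name(), the bound ASSUMED_MAX_NODES and the assumption
that names[] has exactly that many entries are all hypothetical, not taken
from ocfs2-tools.

```c
#include <stddef.h>

/* Assumption: upper bound on node numbers. The real macro name and
 * value in ocfs2-tools may differ. */
#define ASSUMED_MAX_NODES 255

/*
 * Hypothetical helper: return the name recorded for node_num, or NULL
 * when the number is out of range or no non-empty name exists.
 * Assumes names[] holds ASSUMED_MAX_NODES entries.
 */
static const char *safe_node_name(char **names, unsigned int node_num)
{
	/* Reject garbage node numbers before touching the array. */
	if (!names || node_num >= ASSUMED_MAX_NODES)
		return NULL;
	if (!names[node_num] || !*names[node_num])
		return NULL;
	return names[node_num];
}
```

A caller such as ocfs2_print_nodes() could then fall back to printing the raw
number whenever the helper returns NULL.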

> 
> Thanks.
> 
> Signed-off-by: Coly Li <coly.li at suse.de>
> ---
>  libocfs2/slot_map.c |   27 ++++++++++++++++++++++++++-
>  1 files changed, 26 insertions(+), 1 deletions(-)
> 
> diff --git a/libocfs2/slot_map.c b/libocfs2/slot_map.c
> index c33f458..870112a 100644
> --- a/libocfs2/slot_map.c
> +++ b/libocfs2/slot_map.c
> @@ -54,6 +54,29 @@ void ocfs2_swap_slot_map_extended(struct ocfs2_slot_map_extended *se,
>  			bswap_32(se->se_slots[i].es_node_num);
>  }
> 
> +/* es_node_num should be swapped to local cpu endian */
> +static errcode_t __ocfs2_verify_node_num(struct ocfs2_slot_map *sm,
> +					int num_slots)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_slots; i++)
> +		if (sm->sm_slots[i] > num_slots)
> +			return OCFS2_ET_INTERNAL_FAILURE;

This does not look right. num_slots should be changed to
OCFS2_MAX_NUM_NODES or whatever that macro is called. The slot contains
the node number. The slot number is implicit (it is the array index).
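
Roughly, the check would compare each slot's value against the node limit
rather than the slot count. The sketch below is only an illustration:
ASSUMED_MAX_NODES, ASSUMED_EMPTY_SLOT and verify_slot_node_nums() are
placeholder names, and the values chosen for the limit and for the
unused-slot marker are assumptions that need checking against the headers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: upper bound on node numbers; real macro may differ. */
#define ASSUMED_MAX_NODES  255
/* Assumption: value stored in a slot that no node occupies. */
#define ASSUMED_EMPTY_SLOT 0xFFFFu

/*
 * Hypothetical check: return 0 if every in-use slot holds a plausible
 * node number, -1 otherwise. The slot number is just the index i; the
 * value stored in the slot is the node number.
 */
static int verify_slot_node_nums(const uint16_t *slots, int num_slots)
{
	int i;

	for (i = 0; i < num_slots; i++) {
		if (slots[i] == ASSUMED_EMPTY_SLOT)
			continue;	/* unused slot, nothing to verify */
		if (slots[i] >= ASSUMED_MAX_NODES)
			return -1;	/* node number out of range */
	}
	return 0;
}
```

With this bound, a 4-slot volume mounted by node 7 passes, whereas comparing
against num_slots would reject it.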

> +	return 0;
> +}
> +
> +/* es_node_num should be swapped to local cpu endian */
> +static errcode_t __ocfs2_verify_node_num_extended(struct ocfs2_slot_map_extended *se,
> +						int num_slots)
> +{
> +	int i;
> +	for (i = 0; i < num_slots; i++)
> +		if (se->se_slots[i].es_node_num > num_slots)
> +			return OCFS2_ET_INTERNAL_FAILURE;

Same as above.

> +	return 0;
> +}
> +
>  static errcode_t __ocfs2_read_slot_map(ocfs2_filesys *fs,
>  				       int num_slots,
>  				       union ocfs2_slot_map_wrapper *wrap)
> @@ -90,13 +113,15 @@ static errcode_t __ocfs2_read_slot_map(ocfs2_filesys *fs,
>  		se = (struct ocfs2_slot_map_extended *)slot_map_buf;
>  		ocfs2_swap_slot_map_extended(se, num_slots);
>  		wrap->mw_map_extended = se;
> +		ret = __ocfs2_verify_node_num_extended(se, num_slots);
>  	} else {
>  		sm = (struct ocfs2_slot_map *)slot_map_buf;
>  		ocfs2_swap_slot_map(sm, num_slots);
>  		wrap->mw_map = sm;
> +		ret = __ocfs2_verify_node_num(sm, num_slots);
>  	}
> 
> -	return 0;
> +	return ret;
>  }
> 
>  errcode_t ocfs2_read_slot_map(ocfs2_filesys *fs,
> 
> 
> 
> -- 
> Coly Li
> SuSE Labs
> 


