[Ocfs2-tools-devel] [PATCH 09/11] ocfs2-tools: add xattr support in fsck.ocfs2

Tao Ma tao.ma at oracle.com
Tue Mar 3 17:14:16 PST 2009



Joel Becker wrote:
> On Fri, Feb 27, 2009 at 03:54:49PM +0800, Tiger Yang wrote:
>> +static uint16_t detect_xattr_count_in_bucket(struct ocfs2_xattr_header *xh,
>> +					     struct xattr_info *xi)
>> +{
>> +	uint16_t count = 0;
>> +	struct ocfs2_xattr_entry *entry = xh->xh_entries;
>> +	struct ocfs2_xattr_entry *pre_xe = entry;
>> +	uint16_t max_count = (xi->max_offset - HEADER_SIZE) / ENTRY_SIZE;
>> +
>> +	if (xh->xh_count == 0 && xh->xh_free_start == OCFS2_XATTR_BUCKET_SIZE)
>> +		return 0;
>> +
>> +	while (!IS_LAST_ENTRY(entry)) {
>> +		/*
>> +		 * If we could not get the reasonable count,
>> +		 * we will remove them by setting count to zero
>> +		 */
>> +		if (count >= max_count)
>> +			return 0;
>> +		/* xattr entries in bucket had sorted by name_hash */
>> +		if (entry->xe_name_hash < pre_xe->xe_name_hash)
>> +			return count;
>> +		count++;
>> +		pre_xe = entry;
>> +		entry++;
>> +	}
>> +
>> +	return count;
>> +}
> 
> 	Boy, I really want this to use the pre_xe->xe_name_offset to
> verify that the entry hasn't walked past the free_start.  There is no
> guarantee that the start of a name value won't be greater than the
> name_hash of the last valid entry.  This would give a false extra entry.
> 	This is why I wish we had a consistent free_start in all types
> of xattr structures, but it's too late for that.  I realize you didn't
> use the xe_name_offset fields here because you haven't validated them
> yet.  So on to that part.
I just went through the patch. Tiger doesn't use free_start and
name_value_len here because he checks them after this function. So even
with a consistent free_start and name_value_len, he couldn't use them at
this point.
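For illustration, Joel's suggested extra check could look roughly like the sketch below. The struct, the sizes, and the IS_LAST_ENTRY stand-in are simplified assumptions (the real definitions live in the ocfs2 headers), not the actual fsck code: the idea is that since names/values are packed downward from the end of the bucket, the smallest xe_name_offset seen so far bounds how far the entry array may grow, which catches the "false extra entry" case even when the name_hash ordering happens to hold.

```c
#include <stdint.h>

/* Simplified stand-ins; real definitions are in the ocfs2 headers. */
#define BUCKET_SIZE   4096
#define HEADER_SIZE   64	/* assumed sizeof(struct ocfs2_xattr_header) */
#define ENTRY_SIZE    16	/* assumed sizeof(struct ocfs2_xattr_entry) */

struct xattr_entry {
	uint32_t xe_name_hash;
	uint16_t xe_name_offset;
};

/*
 * Count plausible entries in a bucket.  Stop not only when the
 * name_hash ordering breaks (as in the patch), but also when the entry
 * array would walk into the name/value region: the lowest name offset
 * of the entries already accepted is an upper bound for the array.
 */
static uint16_t detect_count(struct xattr_entry *entries, uint16_t max_count)
{
	uint16_t count = 0;
	uint32_t prev_hash = 0;
	uint16_t low_offset = BUCKET_SIZE;

	while (count < max_count) {
		struct xattr_entry *xe = &entries[count];

		if (xe->xe_name_hash == 0)	/* stand-in for IS_LAST_ENTRY() */
			break;
		/* entries must stay sorted by name_hash */
		if (xe->xe_name_hash < prev_hash)
			break;
		/* this entry slot must not cross into name/value space */
		if (HEADER_SIZE + (uint32_t)(count + 1) * ENTRY_SIZE > low_offset)
			break;
		if (xe->xe_name_offset < low_offset)
			low_offset = xe->xe_name_offset;
		prev_hash = xe->xe_name_hash;
		count++;
	}

	return count;
}
```

The hash-ordering check alone would accept the second entry in a bucket whose name region already starts at offset 70; the offset bound stops the walk there instead.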

>> +static errcode_t check_xattr_entry(o2fsck_state *ost,
>> +				   struct ocfs2_dinode *di,
>> +				   struct ocfs2_xattr_header *xh,
>> +				   int *changed,
>> +				   struct xattr_info *xi)
>> +{
>> +	int i;
>> +
>> +	for (i = 0 ; i < xh->xh_count; i++) {
>> +		struct ocfs2_xattr_entry *xe = &xh->xh_entries[i];
>> +		uint32_t hash;
>> +
>> +		/* check and fix name_offset */
>> +		if (xe->xe_name_offset >= xi->max_offset) {
>> +			if (prompt(ost, PY, PR_XATTR_OFFSET_INVALID,
>> +				   "Extended attribute entry in %s #%"PRIu64
>> +				   " refers to an invalid name offset %u,"
>> +				   " clear this entry?",
>> +				   xattr_object[xi->location], xi->blkno,
>> +				   xe->xe_name_offset)) {
>> +				/*
>> +				 * remove the entry and left the name/value,
>> +				 * because we don't know the correct offset
>> +				 * of them.
>> +				 */
>> +				memmove((void *)xe, (void *)(xe + 1),
>> +					(xh->xh_count - i - 1) * ENTRY_SIZE);
>> +				memset(&xh->xh_entries[xh->xh_count - 1], 0,
>> +					ENTRY_SIZE);
>> +				xh->xh_count -= 1;
>> +				i -= 1;
>> +				*changed = 1;
>> +				continue;
>> +			} else
>> +				return -1;
>> +		}
> 
> 	When you find invalid entries, you leave them in place.  This
> means that name_value_len and free_start get out of sync, because
> name_value_len only covers the good values, but free_start covers all
> values.  I think the kernel handles this ok because it never compares
> free_start and name_value_len, but what happens if someone doesn't
> realize this and changes it?
> 	I suppose we all just watch out for that in future patches.
I just saw that Tiger adjusts these two values in check_xattr, after
check_xattr_value.
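The adjustment Tao refers to amounts to recomputing the pair from the surviving entries so the two fields cannot drift apart. A minimal sketch of that idea, with stand-in fields and an assumed 4-byte rounding (the real code decodes xe_name_len, handles non-local values, and uses the on-disk rounding macros):

```c
#include <stdint.h>

/* Simplified model; names/values are packed at the high end of the
 * bucket, so free_start is the lowest name offset still in use and
 * name_value_len is the total bytes those pairs occupy.  The 4-byte
 * rounding mirrors what the real OCFS2_XATTR_SIZE() macro does. */
#define BUCKET_SIZE 4096
#define XATTR_ROUND(s) (((s) + 3) & ~3u)

struct entry {
	uint16_t xe_name_offset;
	uint16_t name_len;	/* stand-in; real code decodes xe_name_len */
	uint16_t value_len;	/* stand-in; assumes a local (inline) value */
};

/* Recompute both fields after invalid entries have been removed,
 * keeping free_start and name_value_len in sync. */
static void recompute(struct entry *entries, int count,
		      uint16_t *free_start, uint32_t *name_value_len)
{
	int i;

	*free_start = BUCKET_SIZE;
	*name_value_len = 0;
	for (i = 0; i < count; i++) {
		uint32_t used = XATTR_ROUND(entries[i].name_len) +
				XATTR_ROUND(entries[i].value_len);

		*name_value_len += used;
		if (entries[i].xe_name_offset < *free_start)
			*free_start = entries[i].xe_name_offset;
	}
}
```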

>> +static errcode_t ocfs2_check_xattr_buckets(o2fsck_state *ost,
>> +					   struct ocfs2_dinode *di,
>> +					   uint64_t blkno,
>> +					   uint32_t clusters)
>> +{
>> +	int i;
>> +	errcode_t ret = 0;
>> +	char *bucket = NULL;
>> +	char *bucket_buf = NULL;
>> +	struct ocfs2_xattr_header *xh;
>> +	int blk_per_bucket = ocfs2_blocks_per_xattr_bucket(ost->ost_fs);
>> +	uint32_t bpc = ocfs2_xattr_buckets_per_cluster(ost->ost_fs);
>> +	uint32_t max_buckets = clusters * bpc;
>> +	uint32_t max_blocks = max_buckets * blk_per_bucket;
>> +	uint32_t num_buckets = 0;
>> +	uint64_t blk = 0;
>> +
>> +	/* malloc space for all buckets */
>> +	ret = ocfs2_malloc_blocks(ost->ost_fs->fs_io, max_blocks, &bucket);
>> +	if (ret) {
>> +		com_err(whoami, ret, "while allocating room to read"
>> +			" extended attributes bucket");
>> +		goto out;
>> +	}
>> +
>> +	/* read all buckets for detect (some of them may not be used) */
>> +	bucket_buf = bucket;
>> +	blk = blkno;
>> +	for (i = 0; i < max_buckets; i++) {
>> +		ret = ocfs2_read_xattr_bucket(ost->ost_fs, blk, bucket_buf);
>> +		if (ret) {
>> +			com_err(whoami, ret, "while reading bucket of"
>> +				" extended attributes ");
>> +			goto out;
>> +		}
>> +		blk += blk_per_bucket;
>> +		bucket_buf += OCFS2_XATTR_BUCKET_SIZE;
>> +	}
> 
> 	This code reads all the possible buckets in a cluster, but the
> actual number of used buckets might be less.  If that's the case,
> ocfs2_read_xattr_bucket() is going to return an error for the invalid
> bucket (at least in the metaecc case).  So you can't just exit out.
> Instead, why not just change max_buckets = i and continue on?  You'll
> just be validating the buckets you could read, which should, in fact,
> match the buckets you expected.
Yes, you are right.
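In loop form, Joel's fix is small: on the first failed read, clamp max_buckets to the buckets that were actually read and carry on validating only those. A toy simulation of that control flow (the fake reader stands in for ocfs2_read_xattr_bucket() failing on unused tail buckets, e.g. under metaecc):

```c
#include <stdint.h>

/* Toy stand-in for ocfs2_read_xattr_bucket(): only the first `valid`
 * buckets pass validation, as when a cluster is allocated but not all
 * of its buckets are in use yet. */
static int read_bucket(uint32_t idx, uint32_t valid)
{
	return idx < valid ? 0 : -1;
}

/* Instead of bailing out of the whole check on a read error, shrink
 * max_buckets to what was readable and validate only those buckets. */
static uint32_t read_buckets(uint32_t max_buckets, uint32_t valid)
{
	uint32_t i;

	for (i = 0; i < max_buckets; i++) {
		if (read_bucket(i, valid)) {
			max_buckets = i;	/* validate only what we read */
			break;
		}
	}

	return max_buckets;
}
```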

Regards,
Tao
