[Ocfs2-tools-devel] [PATCH 2/3] fsck.ocfs2: Add extent list check for discontig bg.

Joel Becker Joel.Becker at oracle.com
Wed Jul 21 19:28:14 PDT 2010


On Wed, Jul 21, 2010 at 03:31:37PM +0800, Tao Ma wrote:
> diff --git a/fsck.ocfs2/fsck.ocfs2.checks.8.in b/fsck.ocfs2/fsck.ocfs2.checks.8.in
> index 5bea432..3855230 100644
> --- a/fsck.ocfs2/fsck.ocfs2.checks.8.in
> +++ b/fsck.ocfs2/fsck.ocfs2.checks.8.in
> @@ -295,6 +295,65 @@ what was found by totalling up the group descriptors.
>  Answering yes updates the c_total and c_free members of the header to reflect
>  what was found in the group descriptors in the chain.
>  
> +.SS "DISCONTIG_BG_DEPTH"
> +A discontiguous block group has a extent list which records all the clusters
> +allocated to it and we can't only support a tree depth with 0 while we find
> +one discontiguous block group which has tree depth greater than 0.

	... allocated to it.  Discontiguous block groups only support
extent lists with a tree depth of 0.  A block group claims to have a
tree depth greater than zero.

> +Answering yes updates the l_tree_depth of the extent list to 0.

Answering yes will set the tree depth of the extent list to 0.

> +.SS "DISCONTIG_BG_COUNT"
> +A discontiguous block group has a extent list which records all the clusters
> +allocated to it. The total record number is stored in bg_list.l_count.
> +It is found that the count has exceeded the limit of the group.

... allocated to it.  A block group claims to have more records than
can actually fit.

> +Answering yes updates the l_count to the maximum extent record number for the
> +extent list.

Answering yes will set the record count to the maximum possible.

> +.SS "DISCONTIG_BG_REC_RANGE"
> +A extent record in a discontig block group was found which claims to reference
> +a region of clusters which partially extends beyond the number of clusters
> +in the volume.

Block groups set aside clusters to be used for metadata.  A
discontiguous block group claims to contain clusters beyond the end of
the volume.

> +Answering yes will remove the whole discontig block group.

Answering yes will remove the block group.

> +.SS "DISCONTIG_BG_CORRUPT_LEAFS"

	DISCONTIG_BG_CORRUPT_LEAVES

> +More than one extent records in a discontig block group were found which have
> +broken clusters allocated.

A discontiguous block group has a extent list which records all the clusters
allocated to it.  A group has more than one extent claiming to have an
impossible number of clusters.

> +Answering yes will remove the whole discontig block group.

Answering yes will remove the block group.

> +.SS "DISCONTIG_BG_CLUSTERS"
> +Extent records in a discontig block group were found having more clusters
> +allocated then a block group can handle.
> +
> +Answering yes will remove the whole discontig block group.

	See the code for how I think this check can be changed.

> +.SS "DISCONTIG_BG_NEXT_FREE_REC"
> +The l_next_free_rec field of a discontig block group was found broken and
> +didn't indicate all the extent records in the group.

A discontiguous block group has a extent list which records all the clusters
allocated to it.  A group was found with fewer filled in extents than it
claims to have.  The filled in extents describe a complete and correct
group.

> +Answering yes will update it with the real extent record number we have.

Answering yes will set the used extent count to the number of filled
extents.

> +.SS "DISCONTIG_BG_LIST_CORRUPT"
> +We use extent list to store allocated clusters in a discontig block group.
> +Both the used record count and the and the total clusters in the extent list were found
> +broken and we can't fix them on site.

A discontiguous block group has a extent list which records all the clusters
allocated to it.  The group claims to have more extents than is
possible, and the existing extents contain errors.

> +Answering yes will remove the whole discontig block group.

Answering yes will remove the block group.

> +
> +.SS "DISCONTIG_BG_REMOVE_REC"
> +One extent record in a discontig block group was found broken while
> +others look sane.

A discontiguous block group has a extent list which records all the clusters
allocated to it.  A group was found with one broken extent, but the
remaining extents describe a complete and correct group.

> +Answering yes will remove the corrupted extent record.
> +
> +.SS "DISCONTIG_BG_LEAF_CLUSTERS"
> +One extent record in a discontig block group was found having more clusters
> +allocated then a block group can handle.

A discontiguous block group has a extent list which records all the clusters
allocated to it.  A group was found with one extent claiming too many
clusters, but the remaining extents are correct.

> +Answering yes will update the value calculated by other extent records.

Answering yes will set the number of the clusters on the broken extent
to the difference between the total clusters a group must have and the
sum of the remaining extents.

(I'm not totally happy with the wording here.)

>  \" pass1.c
>  
>  .SS "INODE_ALLOC_REPAIR"
> diff --git a/fsck.ocfs2/pass0.c b/fsck.ocfs2/pass0.c
> index aac0561..81fbf72 100644
> --- a/fsck.ocfs2/pass0.c
> +++ b/fsck.ocfs2/pass0.c
> @@ -155,6 +155,173 @@ out:
>  	return ret;
>  }
>  
> +static void check_discontig_bg(o2fsck_state *ost, int cpg,
> +			       struct ocfs2_group_desc *bg,
> +			       int *changed, int *clear_ref)
> +{
> +	uint64_t blkno = bg->bg_blkno;
> +	int next_free, i, total_clusters = 0, cpy;
> +	int fix_leaf_clusters = 0, fix_pos = -1;
> +	struct ocfs2_extent_rec *rec;
> +
> +	if (bg->bg_list.l_tree_depth &&
> +	    prompt(ost, PY, PR_DISCONTIG_BG_DEPTH,
> +		   "Discontig Group descriptor at block %"PRIu64" has "

		    ^^^ "Discontiguous".  There's no reason to shorten
"discontiguous" to "discontig" in actual user-visible help text.

> +		   "a tree depth %u which is greathen than 0. "

						^^^ "greater"

> +		   "Change it to 0?", blkno, bg->bg_list.l_tree_depth)) {
> +		bg->bg_list.l_tree_depth = 0;
> +		*changed = 1;
> +	}
> +
> +	if ((bg->bg_list.l_count >
> +	     ocfs2_extent_recs_per_gd(ost->ost_fs->fs_blocksize)) &&
> +	    prompt(ost, PY, PR_DISCONTIG_BG_COUNT,
> +		   "Discontig Group descriptor at block %"PRIu64" has "
> +		   "a extent count %u which is greathen than %u. "
> +		   "Change it?", blkno, bg->bg_list.l_count,

		   "Discontigous group descriptor at block %"PRIu64" has "
		   "an extent count of %u, but discontiguous groups can "
		   "only hold %u records.  Set it to %u?", blkno,
		   bg->bg_list.l_count, 
		   ocfs2_extent_recs_per_gd(ost->ost_fs->fs_blocksize),
		   ocfs2_extent_recs_per_gd(ost->ost_fs->fs_blocksize))) {

> +	if (bg->bg_list.l_next_free_rec > bg->bg_list.l_count)
> +		next_free = bg->bg_list.l_count;
> +	else
> +		next_free = bg->bg_list.l_next_free_rec;
> +
> +	for (i = 0; i < next_free; i++) {
> +		rec = &bg->bg_list.l_recs[i];
> +
> +		/*
> +		 * We treat e_blkno = 0 and e_leaf_cluster = 0 as the
> +		 * end of the extent list so that we can find the proper
> +		 * l_next_free_rec.
> +		 */
> +		if (!rec->e_blkno && !rec->e_leaf_clusters)
> +			break;
> +
> +		if (ocfs2_block_out_of_range(ost->ost_fs, rec->e_blkno)) {

	Shouldn't we be checking the last block of the extent, not the
first one?

> +			if (prompt(ost, PY, PR_DISCONTIG_BG_REC_RANGE,
> +				   "Discontig block group %"PRIu64" in chain "
> +				   "%d at inode %"PRIu64" contains a reference "
> +				   "block %"PRIu64" which is out "
> +				   "of range. Drop this group?",

				   "Discontiguous block group %"PRIu64" in "
				   "chain %d of inode %"PRIu64" claims "
				   "clusters beyond the end of the volume. "
				   "Drop this group?"

> +				   blkno, bg->bg_chain,
> +				   (uint64_t)bg->bg_parent_dinode,
> +				   (uint64_t)rec->e_blkno))  {
> +				*clear_ref = 1;
> +			}
> +			goto out;
> +		}
> +
> +		if (rec->e_leaf_clusters > cpg) {
> +			if (fix_leaf_clusters) {
> +				if (prompt(ost, PY,
> +					   PR_DISCONTIG_BG_CORRUPT_LEAFS,

					   PR_DISCONTIG_BG_CORRUPT_LEAVES,

> +					   "Discontig block group %"PRIu64" "
> +					   "in chain %d at inode %"PRIu64" "
> +					   "have too many errors in "
> +					   "e_leaf_clusters. "
> +					   "Drop this group?",
> +					   blkno, bg->bg_chain,
> +					   (uint64_t)bg->bg_parent_dinode))

					   "Discontiguous block group %"PRIu64
					   " in chain %d of inode %"PRIu64" "
					   "has errors in more than one extent "
					   "record.  Record %d claims %u "
					   "clusters and record %d claims %u "
					   "clusters, but a group does not "
					   "contain more than %u clusters. "
					   "Drop this group?", blkno,
					   bg->bg_chain,
					   (uint64_t)bg->bg_parent_dinode,
					   fix_pos, 
					   bg->bg_list.l_recs[fix_pos].e_leaf_clusters;
					   i, rec->e_leaf_clusters, cpg))

> +					*clear_ref = 1;
> +				goto out;
> +			}
> +			fix_pos = i;
> +			fix_leaf_clusters = 1;
> +		} else
> +			total_clusters += rec->e_leaf_clusters;

		} else if ((total_clusters + rec->e_leaf_clusters) > cpg) {
			if (total_clusters == cpg) {
				Ask to truncate l_next_free_rec at i.
			} else {
				Ask to truncate rec->e_leaf_clusters at
				cpg -. total_clusters.  Or perhaps drop
				the group if we don't think that is safe
				to do.
			}
		} else
			total_clusters += rec->e_leaf_clusters;

> +	}
> +
> +	if (total_clusters > cpg) {
> +		if (prompt(ost, PY, PR_DISCONTIG_BG_CLUSTERS,
> +		    "Discontig Group descriptor at block %"PRIu64" has "
> +		    "a extent list which has %u clusters allocated, which "
> +		    "is greater than %u. Drop the group?",
> +		    blkno, total_clusters, cpg))
> +			*clear_ref = 1;
> +		goto out;
> +	}

	If you check total_clusters as I describe above, you can drop
this check, right?

> +	if (i != next_free) {
> +		/*
> +		 * l_next_free_rec is broken since we find empty extent rec
> +		 * before we reach it.
> +		 */
> +		if (total_clusters == cpg) {
> +			/*
> +			 * OK, the whole extent list is sane,
> +			 * so change l_next_free_rec accordling.
> +			 */
> +			if (prompt(ost, PY, PR_DISCONTIG_BG_NEXT_FREE_REC,
> +				   "Discontig Group descriptor at block "
> +				   "%"PRIu64" has a free extent rec %u "
> +				   "while the list only have %u. Change it?",
> +				    blkno, bg->bg_list.l_next_free_rec,
> +				    i)) {

				   "Discontiguous group descriptor at block "
				   "%"PRIu64" claims to use %u extents but "
				   "only has %u filled in.  The filled in "
				   "records appear to be correct.  Set the "
				   "used extent count to the number filled in?",
				   blkno, bg->bg_list.l_next_free_rec, i)) {

> +				bg->bg_list.l_next_free_rec = i;
> +				*changed = 1;
> +			}
> +		} else {
> +			/*
> +			 * We have to drop the group now since both
> +			 * l_next_free_rec and the extent list have errors.
> +			 */
> +			if (prompt(ost, PY, PR_DISCONTIG_BG_LIST_CORRUPT,
> +				   "Discontig block group %"PRIu64" in chain "
> +				   "%u at inode %"PRIu64" have errors in "
> +				   "l_next_free_rec and the extent list. "
> +				   "Drop this group?",
> +				   blkno, bg->bg_chain,
> +				   (uint64_t)bg->bg_parent_dinode))

				   "Discontiguous group descriptor at block "
				   "%"PRIu64" claims to use %u extents but "
				   "only has %u filled in.  The filled in "
				   "records contain errors.  Drop this group?",
				   blkno, bg->bg_list.l_next_free_rec, i))

> +					*clear_ref = 1;
> +			goto out;
> +		}
> +	}
> +
> +	if (fix_pos >= 0 && fix_pos < bg->bg_list.l_count) {
> +		rec = &bg->bg_list.l_recs[fix_pos];
> +
> +		if (total_clusters == cpg) {
> +			/*
> +			 * We find a corrupted rec while the other recs
> +			 * looks fine.
> +			 */
> +			if (prompt(ost, PY, PR_DISCONTIG_BG_REMOVE_REC,
> +				   "Discontig Group descriptor at block "
> +				   "%"PRIu64" has a extent rec %d which has "
> +				   "leaf clusters %u while other extent recs "
> +				   "look sane. Remove it?",
> +				    blkno, fix_pos, rec->e_leaf_clusters)) {
> +
> +				cpy = (next_free - fix_pos - 1) * sizeof(*rec);
> +				if (cpy != 0) {
> +					memcpy(rec, rec + 1, cpy);
> +					memset(&bg->bg_list.l_recs[next_free - 1],
> +					       0, sizeof(*rec));
> +				}
> +				bg->bg_list.l_next_free_rec--;
> +				*changed = 1;
> +			}
> +		} else {
> +			if (prompt(ost, PY, PR_DISCONTIG_BG_LEAF_CLUSTERS,
> +				   "Discontig Group descriptor at block "
> +				   "%"PRIu64" has a extent rec %d which has "
> +				   "leaf clusters %u while it should has %u. "
> +				   "Change it?",
> +				    blkno, fix_pos, rec->e_leaf_clusters,
> +				    cpg - total_clusters)) {

				   "Extent record %d of discontiguous group "
				   "descriptor %"PRIu64" claims %u clusters, "
				   "but it should only have %u.  Correct it?",
				   fix_pos, blkno, rec->e_leaf_clusters,
				   cpg - total_clusters)) {

> +				rec->e_leaf_clusters = cpg - total_clusters;
> +				*changed = 1;
> +			}
> +		}
> +	}
> +out:
> +	return;
> +}
> +
>  static errcode_t repair_group_desc(o2fsck_state *ost,
>  				   struct ocfs2_dinode *di,
>  				   struct chain_state *cs,
> @@ -251,6 +418,11 @@ static errcode_t repair_group_desc(o2fsck_state *ost,
>  		changed = 1;
>  	}
>  
> +	if (ocfs2_gd_is_discontig(bg))
> +		check_discontig_bg(ost, cs->cs_cpg, bg, &changed, clear_ref);
> +	if (*clear_ref)
> +		goto out;
> +
>  	/* XXX check bg_bits vs cpg/bpc. */
>  
>  	if (changed) {
> -- 
> 1.7.1.GIT
> 

Joel

-- 

"Not being known doesn't stop the truth from being true."
        - Richard Bach

Joel Becker
Consulting Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127



More information about the Ocfs2-tools-devel mailing list