[Ocfs2-tools-devel] libocfs2 allocation of clusters and fsck

Wed Oct 27 19:11:58 CDT 2004

	While thinking about how fsck needs to synchronize its idea of
used clusters with the global bitmap, I realized there was some issue
with how the library does cluster allocation.  Bear with me.
	In libe2fs, they call a function named ext2fs_read_bitmaps().
This function reads all the block group bitmaps into memory, in one
large bitmap buffer.  So, even though the block group bitmaps live at
intervals on disk, in memory they are treated as one contiguous map.  They
can then call ext2fs_new_block() to allocate a block from that map.  It
sets the bit in the map, returning the block number.  Finally,
ext2fs_write_bitmaps() stores the actual bitmap blocks back at their
appropriate location.
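	To make that read/allocate/write cycle concrete, here is a
minimal toy model of it.  This is not libe2fs code: every toy_* name,
the sizes, and the "disk" struct are made up for illustration, and the
allocator simply takes the first clear bit at or after the goal.

```c
#include <assert.h>
#include <string.h>

#define TOY_GROUPS           4   /* hypothetical number of block groups */
#define TOY_BLOCKS_PER_GROUP 64  /* hypothetical blocks tracked per group */
#define TOY_GROUP_BYTES      (TOY_BLOCKS_PER_GROUP / 8)
#define TOY_TOTAL_BLOCKS     (TOY_GROUPS * TOY_BLOCKS_PER_GROUP)

/* Stand-in for the disk: each group's bitmap is stored separately. */
struct toy_disk {
	unsigned char group_bitmap[TOY_GROUPS][TOY_GROUP_BYTES];
};

/* One contiguous in-memory map covering every group. */
struct toy_flat_map {
	unsigned char bits[TOY_TOTAL_BLOCKS / 8];
};

/* Gather the scattered per-group bitmaps into the flat map. */
static void toy_read_bitmaps(const struct toy_disk *d, struct toy_flat_map *m)
{
	int g;

	assert(d != NULL && m != NULL);
	for (g = 0; g < TOY_GROUPS; g++)
		memcpy(m->bits + g * TOY_GROUP_BYTES, d->group_bitmap[g],
		       TOY_GROUP_BYTES);
}

/*
 * Find the first clear bit at or after goal, set it, and return the
 * block number through ret.  Returns 0 on success, -1 if nothing is free.
 */
static int toy_new_block(struct toy_flat_map *m, unsigned goal, unsigned *ret)
{
	unsigned i;

	assert(m != NULL && ret != NULL);
	for (i = goal; i < TOY_TOTAL_BLOCKS; i++) {
		if (!((m->bits[i / 8] >> (i % 8)) & 1)) {
			m->bits[i / 8] |= (unsigned char)(1u << (i % 8));
			*ret = i;
			return 0;
		}
	}
	return -1;
}

/* Scatter the flat map back to each group's own bitmap location. */
static void toy_write_bitmaps(struct toy_disk *d, const struct toy_flat_map *m)
{
	int g;

	assert(d != NULL && m != NULL);
	for (g = 0; g < TOY_GROUPS; g++)
		memcpy(d->group_bitmap[g], m->bits + g * TOY_GROUP_BYTES,
		       TOY_GROUP_BYTES);
}
```

	The point of the sketch is only that the caller sees one map;
where each piece lives on disk matters only in the read and write steps.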
	There are two main places e2fsck needs to allocate a block.
First, in pass 1D, they need to clone duplicate blocks.  Second, in
pass3.c, they might need to expand the lost+found directory to handle
more entries.
In both cases, they get the allocation from the fsck-generated block
map, not from the on-disk block map.  In the first case (dup cloning),
they also mark the on-disk map, which they've read in.  In the second
case (lost+found expansion) they have the on-disk map read in memory,
but they don't seem to mark it.  Either way, the fsck-generated map gets
reconciled with the on-disk one in pass5.c.
	How do they handle the fact that a normal libe2fs user would
want to use the on-disk map for block allocation, while e2fsck wants to
use its own map?  They make it explicit.  ext2fs_new_block() takes the
map you want to use as an argument.  That's pretty chummy of them.  I
don't think I like it.
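	For illustration, the "pass the map in" style looks roughly
like this.  Again a toy, not the libe2fs function: the toy_* names and
sizes are invented, and the NULL fallback to the filesystem's own map is
an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NBLOCKS 32  /* hypothetical filesystem size in blocks */

struct toy_map {
	unsigned char bits[TOY_NBLOCKS / 8];
};

/* The "filesystem" owns the map that was read from disk. */
struct toy_fs {
	struct toy_map on_disk_map;
};

/*
 * Allocate from whichever map the caller hands in.  A normal user
 * passes NULL and gets the on-disk map; an fsck passes its own
 * generated map.  Returns 0 on success, -1 if the map is full.
 */
static int toy_new_block(struct toy_fs *fs, struct toy_map *map, unsigned *ret)
{
	unsigned i;

	assert(fs != NULL && ret != NULL);
	if (map == NULL)
		map = &fs->on_disk_map;

	for (i = 0; i < TOY_NBLOCKS; i++) {
		if (!((map->bits[i / 8] >> (i % 8)) & 1)) {
			map->bits[i / 8] |= (unsigned char)(1u << (i % 8));
			*ret = i;
			return 0;
		}
	}
	return -1;
}
```

	Note that the two maps can hand out different answers for the
same request, which is exactly why the caller has to say which one it
means.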
	In fsck.ocfs2, Zach wants to use the library functions for
allocation as well.  He's got the same cases.  However, libocfs2 doesn't
have a cluster allocation API yet.  Here's why I'm mailing.
	He proposed to do the reconciliation of the fsck-generated map
and the on-disk map right at the end of pass1.  That way, all the later
passes could just trust the on-disk map.  The API would not be like
libe2fs.  You wouldn't pass it a map; it would know how to look things up.
	I've been pondering how to handle such an API for a while.  With
the new chain allocator, I was figuring on having the library do exactly
what the mounted filesystem does: follow chains.  That is, only read
what is needed to do the alloc.  The alternative is to read every chain
into memory and generate a flat bitmap like ext2fs does.  Obviously, the
'flat' bitmap would actually be an ocfs2_bitmap, storing information
about where the groups are.
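	A rough sketch of the "follow chains" option, to show what
"only read what is needed" buys.  The structures here are hypothetical
stand-ins, not the real on-disk layout: each chain keeps a free count,
each group keeps its own free count and bitmap, and a counter stands in
for disk reads.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_BITS_PER_GROUP 16  /* hypothetical clusters per group */

struct toy_group {
	unsigned first_cluster;  /* cluster number of bit 0 */
	unsigned free_bits;
	unsigned char bitmap[TOY_BITS_PER_GROUP / 8];
	struct toy_group *next;  /* next group in the same chain */
};

struct toy_chain {
	unsigned free_bits;      /* summary kept with the chain head */
	struct toy_group *head;
};

/* Counts how many groups were visited; stands in for disk reads. */
static unsigned toy_groups_read;

/*
 * Walk the chains, skipping any chain whose summary says it is full,
 * and stop at the first group with a clear bit.
 * Returns 0 on success, -1 if no cluster is free.
 */
static int toy_chain_alloc(struct toy_chain *chains, size_t nchains,
			   unsigned *cluster)
{
	struct toy_group *g;
	size_t c;
	unsigned i;

	assert(chains != NULL && cluster != NULL);
	for (c = 0; c < nchains; c++) {
		if (chains[c].free_bits == 0)
			continue;  /* no group in this chain is read */

		for (g = chains[c].head; g != NULL; g = g->next) {
			toy_groups_read++;
			if (g->free_bits == 0)
				continue;
			for (i = 0; i < TOY_BITS_PER_GROUP; i++) {
				if ((g->bitmap[i / 8] >> (i % 8)) & 1)
					continue;
				g->bitmap[i / 8] |= (unsigned char)(1u << (i % 8));
				g->free_bits--;
				chains[c].free_bits--;
				*cluster = g->first_cluster + i;
				return 0;
			}
		}
	}
	return -1;
}
```

	The flat-bitmap alternative would visit every group up front
and then allocate the way ext2fs does; the trade is memory and startup
reads against a simpler allocation path.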
	Ok, I think just writing about it has given me some concrete
ideas.  I'm going to have to create an ocfs2_bitmap type to handle chain
allocators.  That would be, in essence, an entirely in-memory map.  But
that's fine, the global bitmap can't get _that_ big.  Just 512M or so.
If I then had an ocfs2_bitmap_find_diff(map1, map2) function, so that
you could do:

	bitoff = ocfs2_bitmap_find_diff(fsck_generated_map,
					on_disk_map);
	/* FIX: repair the mismatch found at bitoff */
	bitoff = ocfs2_bitmap_find_diff_next(fsck_generated_map,
					     on_disk_map, bitoff);

would that work?  Then, all allocation would be from the on_disk_map,
which would really be fs->cluster_bitmap.  Thoughts?
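	Spelled out as a loop, the idea might look like the sketch
below.  Everything here is an assumption: the toy_* functions only
imitate the proposed ocfs2_bitmap_find_diff() and
ocfs2_bitmap_find_diff_next(), the bitmap is a plain byte array rather
than an ocfs2_bitmap, the "next" call searches from an inclusive start
offset, and the "fix" is simply to make the on-disk bit match fsck's.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NO_DIFF (-1L)  /* hypothetical "maps agree" return value */

struct toy_bitmap {
	size_t nbits;
	unsigned char *bits;
};

static int toy_test(const struct toy_bitmap *m, size_t n)
{
	return (m->bits[n / 8] >> (n % 8)) & 1;
}

static void toy_assign(struct toy_bitmap *m, size_t n, int value)
{
	if (value)
		m->bits[n / 8] |= (unsigned char)(1u << (n % 8));
	else
		m->bits[n / 8] &= (unsigned char)~(1u << (n % 8));
}

/* First offset >= start where the maps differ, or TOY_NO_DIFF. */
static long toy_find_diff_next(const struct toy_bitmap *a,
			       const struct toy_bitmap *b, long start)
{
	size_t i;

	assert(a->nbits == b->nbits && start >= 0);
	for (i = (size_t)start; i < a->nbits; i++)
		if (toy_test(a, i) != toy_test(b, i))
			return (long)i;
	return TOY_NO_DIFF;
}

static long toy_find_diff(const struct toy_bitmap *a,
			  const struct toy_bitmap *b)
{
	return toy_find_diff_next(a, b, 0);
}

/*
 * Walk every mismatch and make the on-disk map agree with the
 * fsck-generated one.  Returns the number of bits changed.
 */
static unsigned toy_reconcile(const struct toy_bitmap *fsck_map,
			      struct toy_bitmap *on_disk_map)
{
	unsigned fixed = 0;
	long bitoff = toy_find_diff(fsck_map, on_disk_map);

	while (bitoff != TOY_NO_DIFF) {
		toy_assign(on_disk_map, (size_t)bitoff,
			   toy_test(fsck_map, (size_t)bitoff));
		fixed++;
		bitoff = toy_find_diff_next(fsck_map, on_disk_map, bitoff + 1);
	}
	return fixed;
}
```

	Once the loop finishes, the two maps agree, so later passes
could allocate from the on-disk map alone.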

Joel

-- 

Joel Becker
Senior Member of Technical Staff
Oracle Corporation
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127