[Ocfs2-devel] [RFC][PATCH 0/4] ocfs2: Directory indexing support
Joel Becker
Joel.Becker at oracle.com
Wed Nov 12 19:59:02 PST 2008
On Wed, Nov 12, 2008 at 06:24:04PM -0800, Mark Fasheh wrote:
> The following patches implement indexed directory support in Ocfs2, mostly
> according to the design doc I wrote up a while ago:
<snip>
> Very basic ocfs2-tools patches will also follow in the next couple days or
> so. This will mostly be mkfs and debugfs support. I think it might be best
> to build the libocfs2 support on top of whatever patches we have for
> extended attributes as the tree code will have to change for that.
I agree. I hope we see those soon. We have a lot of stuff now
that isn't fully tools supported.
> Open questions:
>
> Should we just drop the signature in ocfs2_dir_block_trailer? I can't help
> but feel that it might have limited usefulness as it's not at the front of
> the block (like the rest of our signatures) and that the nature of a dirent
> block might be that we can't trust the existence of the signature to
> actually mean there's a valid ocfs2_dir_block_trailer there. The answer is
> probably still to keep the signature, but I thought I'd throw this out
> there.
I like having it, because it sticks right out in bvi/hexdump.
With any of our metadata structures, we generally have to figure out if
they are "really" a such-and-such by hand after validating the
signature. But if we start from the knowledge "block N is or is not
supposed to be of type X", the signature is a quick way to see if
something is wrong.
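
To make that concrete, here is a minimal sketch of the "quick check" in Python. The trailer length (64 bytes), the signature bytes, and the helper name are assumptions made up for illustration; none of them come from the patches. The only thing it shows is that the check happens at a fixed offset from the end of the block rather than at the front.

```python
# Hypothetical layout, for illustration only: neither value is taken from
# the patches under discussion.
TRAILER_SIZE = 64              # assumed length of the trailer in bytes
TRAILER_SIG = b"DIRTRL1\0"     # assumed 8-byte signature


def has_trailer_signature(block: bytes) -> bool:
    """Report whether the signature sits where a trailer would start.

    This is only the cheap first test. A match does not prove the rest of
    the trailer is valid, and a miss is only meaningful if the caller
    already expects this block to carry a trailer.
    """
    start = len(block) - TRAILER_SIZE
    return block[start:start + len(TRAILER_SIG)] == TRAILER_SIG
```

A caller that knows block N is supposed to have a trailer would treat a False result as a sign of corruption and only then inspect the remaining fields by hand.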
> Is it worth storing index (ocfs2_dx_entry) records inline inside of
> ocfs2_dx_root_block and only growing out to a tree when we exhaust the
> available space? Running the math, we could store between 18 (512 byte
> blocks) and 242 (4k blocksize) records in the space occupied by the extent
> list.
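
As a sanity check on those two figures, here is a small Python sketch. It assumes both counts are exact fits (no leftover bytes), which the thread does not state; the record size and header overhead it derives are therefore inferences, not values read from the structures.

```python
def solve_layout(small=(512, 18), large=(4096, 242)):
    """Infer record size and fixed overhead from two (blocksize, count) pairs.

    Assumes count * record_size + overhead == blocksize exactly for both
    pairs. That exact-fit assumption is ours, not the thread's.
    """
    (b1, n1), (b2, n2) = small, large
    record_size, remainder = divmod(b2 - b1, n2 - n1)
    if remainder:
        raise ValueError("figures are not consistent with an exact fit")
    overhead = b1 - n1 * record_size
    return record_size, overhead


record_size, overhead = solve_layout()


def inline_records(blocksize):
    # Number of whole records that fit after the fixed overhead.
    return (blocksize - overhead) // record_size
```

Under that assumption the numbers work out to 16-byte records and 224 bytes of fixed overhead, which would put the intermediate block sizes at 50 records (1k) and 114 records (2k).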
I'd say we should do _something_. Many (most?) directories have
fewer than 242 entries, and inlining saves us a sync read on any cold-cache
lookup. What about a way we could readahead the first index leaf
instead? I suppose we could store "first leaf" on the inode right next
to the dx root, and then fire off readahead for the first leaf right
before we sync-read the dx_root. If the directory fits in one index
leaf, that first leaf is already in our cache. If not, we just ignore
it. For 4k/4k, this is a single block. Then the dx_root doesn't have
to have special logic for inline-entries.
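
A toy model of the readahead idea, in Python, just to count blocking reads. Everything here is hypothetical: the BlockCache class, the dict-based "blocks", and the field names "dx_root", "first_leaf", and "leaves" are stand-ins, not the real on-disk format or kernel interfaces.

```python
import bisect


class BlockCache:
    """Toy block cache that counts reads the caller would have to wait on."""

    def __init__(self, disk):
        self.disk = disk          # block number -> block contents
        self.cached = {}
        self.sync_reads = 0

    def readahead(self, blkno):
        # Non-blocking hint. In this model the block simply appears in the
        # cache; on real hardware the I/O would overlap with the next read.
        self.cached.setdefault(blkno, self.disk[blkno])

    def read(self, blkno):
        if blkno not in self.cached:
            self.sync_reads += 1  # cache miss: the caller blocks on I/O
            self.cached[blkno] = self.disk[blkno]
        return self.cached[blkno]


def pick_leaf(root, name_hash):
    # root["leaves"] is a sorted list of (first_hash, leaf_block) pairs.
    starts = [first for first, _ in root["leaves"]]
    return root["leaves"][bisect.bisect_right(starts, name_hash) - 1][1]


def lookup(cache, inode, name_hash, use_readahead=True):
    if use_readahead:
        cache.readahead(inode["first_leaf"])   # fire and forget
    root = cache.read(inode["dx_root"])        # the unavoidable sync read
    leaf = cache.read(pick_leaf(root, name_hash))
    return leaf.get(name_hash)


def cold_lookup_cost(disk, inode, name_hash, use_readahead):
    """Run one lookup against an empty cache; return (result, sync reads)."""
    cache = BlockCache(disk)
    result = lookup(cache, inode, name_hash, use_readahead)
    return result, cache.sync_reads


# A directory whose whole index fits in one leaf (block 11).
small_disk = {10: {"leaves": [(0, 11)]}, 11: {0x1234: "dirent block 42"}}
small_inode = {"dx_root": 10, "first_leaf": 11}
```

In this model a cold lookup in the single-leaf directory blocks once with readahead and twice without. When the hash lands in some other leaf, the prefetched block is simply unused and the lookup costs the same two blocking reads as before.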
> In order to keep the code simple, I've gone with a single linked-list for
> the free dirent block search. There might be situations though, where this
> performs poorly. My plan is to version the free dirent block list so that we
> can 'upgrade' it (maybe to multiple lists) at a later point. Old versions
> would fall back to the less optimized unindexed leaf search. That way the
> upgrade would be seamless to the user.
I always liked this. Wouldn't this mean that old versions might
leave 'full' dirblocks on the free list?
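
To illustrate the concern, a small Python model. The block capacity, the Dir class, and both insert functions are hypothetical; the recheck in insert_indexed is just one possible way a newer version could cope, not something proposed in the patches.

```python
BLOCK_CAPACITY = 4  # hypothetical number of dirents per block


class Dir:
    def __init__(self, nblocks):
        self.used = [0] * nblocks
        self.free_list = list(range(nblocks))  # blocks believed to have room


def insert_unindexed(d):
    """Old-version writer: linear leaf search, never touches the free list."""
    for blk, count in enumerate(d.used):
        if count < BLOCK_CAPACITY:
            d.used[blk] += 1
            return blk
    raise RuntimeError("directory is full")


def stale_entries(d):
    """Blocks that are still on the free list even though they are full."""
    return [blk for blk in d.free_list if d.used[blk] >= BLOCK_CAPACITY]


def insert_indexed(d):
    """New-version writer that rechecks each free-list entry before use."""
    while d.free_list:
        blk = d.free_list[0]
        if d.used[blk] < BLOCK_CAPACITY:
            d.used[blk] += 1
            if d.used[blk] == BLOCK_CAPACITY:
                d.free_list.pop(0)   # block just became full
            return blk
        d.free_list.pop(0)           # stale entry left by an old writer
    raise RuntimeError("no block with free space on the list")
```

After four old-version inserts into a two-block directory, block 0 is full but still listed as free; the rechecking writer drops it and lands in block 1.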
Joel
--
"There are some experiences in life which should not be demanded
twice from any man, and one of them is listening to the Brahms Requiem."
- George Bernard Shaw
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127