lk: teach crfs_readdir about dirents with multiple items

Zach Brown zach.brown at oracle.com
Tue May 13 13:56:31 PDT 2008


> Only increment f_pos after successfully calling filldir() for
> each dirent within an item. That way, we won't lose track of
> cases where an item has multiple dirents, but ran out of room
> before returning them all.

But, but, then we can return duplicate directory entries.

The fundamental problem here is that our directory entries are
addressed by 64-bit objectids and 32-bit item offsets.  f_pos, when using
ancient NFS, requires that we specify our position in a directory
with a 31-bit value.  64 + 32 > 31 :P.

Rather than getting lost implementing a perfect solution at this point,
I propose that we put in a simple compromise.  We can return to this
nasty stuff later.

First, observe that struct crfs_dir_item by itself is 25 bytes.  No
given entry will ever be smaller than that, so the starts of two entries
in an item are always more than 16 bytes apart.  I argue that this means
we don't need the low 4 bits of the byte offset into the item to uniquely
specify the next directory entry which will be copied to userspace.
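
To make that argument concrete, here is a small sketch that checks it by
brute force.  The 25-byte minimum is from the paragraph above; the
function and macro names are made up for illustration and are not from
the CRFS tree.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_DROP_BITS 4u  /* hypothetical name: low offset bits we discard */

/* Byte offset of a dirent within its item, with the low 4 bits dropped. */
static uint32_t offset_bucket(uint32_t offset)
{
    return offset >> OFFSET_DROP_BITS;
}

/*
 * Returns 1 if, for every possible dirent start in an item of item_size
 * bytes, the closest possible following dirent (min_dirent bytes later)
 * lands in a different 16-byte bucket.  Buckets only grow with the
 * offset, so checking the closest neighbour is enough.
 */
static int buckets_are_unique(uint32_t item_size, uint32_t min_dirent)
{
    assert(min_dirent > 0);
    for (uint32_t a = 0; a + min_dirent < item_size; a++) {
        uint32_t b = a + min_dirent;
        if (offset_bucket(a) == offset_bucket(b))
            return 0;
    }
    return 1;
}
```

With min_dirent = 25 the check passes for a 1k item; with a hypothetical
15-byte minimum it fails, because offsets 0 and 15 share a bucket.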

Then we'll be jerks and limit the size of items to, I don't know, 1k or
2k?  This would limit the number of files which could hash to the same
value in a directory and it would limit the number of hard links which
can point to one inode.  I don't really care about either right now.

Say we choose 1k.  Offsets into an item then fit in 10 bits, and
10 - 4 = 6 bits are enough to specify the item offset.

Now let's say that we cap the number of inodes which we'll allocate in
the file system.  That will limit the size of the objectid that an f_pos
could point at.  31 - 6 = 25 bits are left for the objectid, limiting a
CRFS file system to 2^25, or ~ 32 million, files.
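
Spelled out as constants, the bit budget might look like the sketch
below.  The names are hypothetical; only the numbers (31, 1k, 4) come
from this mail.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; the values follow the discussion above. */
enum {
    FPOS_BITS        = 31, /* what old NFS lets us use */
    ITEM_SIZE_BITS   = 10, /* items capped at 1k */
    OFFSET_DROP_BITS = 4,  /* low offset bits we discard */
    OFFSET_BITS      = ITEM_SIZE_BITS - OFFSET_DROP_BITS, /* 6 */
    OBJECTID_BITS    = FPOS_BITS - OFFSET_BITS            /* 25 */
};

/* The two fields must exactly fill the usable part of f_pos. */
static_assert(OBJECTID_BITS + OFFSET_BITS == FPOS_BITS,
              "f_pos fields do not add up");

/* Number of distinct objectids an f_pos can name under this layout. */
static uint64_t max_objectids(void)
{
    return (uint64_t)1 << OBJECTID_BITS;
}
```

Picking 2k instead would make ITEM_SIZE_BITS 11 and leave 24 bits for
the objectid, halving the file count.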

The implementation outline is, then:

1) Return an error if the dir objectid needs more than 25 bits.

2) Start returning directory entries from the objectid found by f_pos >> 6.

3) Return an error if the size of the item is > 1k, or whatever.

4) Start returning entries from the item once their starting offset is
greater than or equal to (f_pos & ((1 << 6) - 1)) << 4.

5) As we return each dirent, set f_pos to the objectid and starting
offset of the next dirent with the low 4 bits truncated off.  f_pos =
(objectid << 6) | (offset >> 4);
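
The packing and unpacking in the steps above could be sketched roughly
as follows.  This is not code from the CRFS tree: every helper name is
invented for illustration, and it assumes the 1k item cap.  Two choices
here are my reading rather than anything settled: the mask is
parenthesized before the shift (in C, << binds tighter than &), and the
resume test is >= rather than >, since a dirent whose offset is a
multiple of 16 (including offset 0) would otherwise be skipped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for a 1k item cap. */
#define OFFSET_DROP_BITS 4
#define OFFSET_BITS      6
#define OBJECTID_BITS    25
#define OFFSET_MASK      ((1u << OFFSET_BITS) - 1)

/*
 * Step 5: pack the location of the next dirent into f_pos.
 * Returns -1 when it cannot be represented (steps 1 and 3).
 */
static int64_t fpos_encode(uint64_t objectid, uint32_t offset)
{
    if (objectid >> OBJECTID_BITS)                  /* objectid too big */
        return -1;
    if ((offset >> OFFSET_DROP_BITS) > OFFSET_MASK) /* past the 1k cap */
        return -1;
    return (int64_t)((objectid << OFFSET_BITS) |
                     (offset >> OFFSET_DROP_BITS));
}

/* Step 2: the objectid to start reading from. */
static uint64_t fpos_objectid(int64_t f_pos)
{
    assert(f_pos >= 0);
    return (uint64_t)f_pos >> OFFSET_BITS;
}

/* Step 4: byte offset to resume at, rounded down to a multiple of 16. */
static uint32_t fpos_offset(int64_t f_pos)
{
    return ((uint32_t)f_pos & OFFSET_MASK) << OFFSET_DROP_BITS;
}

/* Step 4: should the dirent starting at dirent_offset be returned? */
static int dirent_is_wanted(int64_t f_pos, uint32_t dirent_offset)
{
    return dirent_offset >= fpos_offset(f_pos);
}
```

One case this sketch only rejects rather than handles: after the last
dirent of a completely full 1k item the "next" offset is 1024, which
does not fit in 6 bits.  A real readdir would presumably move f_pos on
to the next objectid at offset 0 instead.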

It's lame, but it's simple and obvious, and lets us move on to other
harder problems before returning to this grunt work.

- z



