[Ocfs2-tools-devel] [RFC 2/8] dx_dirs v4: indexed dirs core code in libocfs2

Mon Jan 25 08:26:17 PST 2010

On 2010年01月22日 16:21, Tao Ma Wrote:
> 
> 
> Coly Li wrote:
>> This patch contains the core part of current indexed dirs support in
>> libocfs2, includes,
>> - Indexed tree truncate.
>> - Build indexed tree.
>> - Insert dx record into indexed tree.
>> - Expand inlined dx_root to an extent tree.
>> - Rebalance indexed tree.
>>
>> I worry about whether I use the extent tree code correctly, though I
>> tried my best to understand the code, IMHO it's
>> still a gray area to me so far. Any review/comment is helpful !
>>
>> Signed-off-by: Coly Li <coly.li at suse.de>
>> Cc: Tao Ma <tao.ma at oracle.com>
>> ---
[snip]
>> --- a/libocfs2/inode.c
>> +++ b/libocfs2/inode.c
>> @@ -139,6 +139,10 @@ static void ocfs2_swap_inode_second(struct
>> ocfs2_dinode *di)
>>          sb->s_uuid_hash           = bswap_32(sb->s_uuid_hash);
>>          sb->s_first_cluster_group = bswap_64(sb->s_first_cluster_group);
>>          sb->s_xattr_inline_size   = bswap_16(sb->s_xattr_inline_size);
>> +        sb->s_dx_seed[0]          = bswap_32(sb->s_dx_seed[0]);
>> +        sb->s_dx_seed[1]          = bswap_32(sb->s_dx_seed[1]);
>> +        sb->s_dx_seed[2]          = bswap_32(sb->s_dx_seed[2]);
>> +        sb->s_dx_seed[3]          = bswap_32(sb->s_dx_seed[3]);
> I just noticed that in ocfs2_super_block we define
>     __le32 s_dx_seed[3];
> So swap s_dx_seed[3] is wrong?

Why do you think it's wrong ?

>> +int ocfs2_dx_dir_truncate(ocfs2_filesys *fs,
>> +            uint64_t dir)
>> +{
>> +    struct ocfs2_dx_root_block *dx_root;
>> +    char *dx_root_buf = NULL, *di_buf = NULL;
>> +    struct ocfs2_dinode *di;
>> +    int ret = 0;
>> +
>> +    ret = ocfs2_malloc_block(fs->fs_io, &di_buf);
>> +    if (ret)
>> +        goto out;
>> +    ret = ocfs2_read_inode(fs, dir, di_buf);
>> +    if (ret)
>> +        goto out;
>> +    di = (struct ocfs2_dinode *)di_buf;
>> +
>> +    /* we have to trust i_dyn_features */
>> +    if ((!ocfs2_dir_indexed(di)) ||
>> +        (di->i_dyn_features & OCFS2_INLINE_DATA_FL))
>> +        goto out;
> Do we need to check S_ISDIR or only dir inode can call this function?

So far, the caller of ocfs2_dx_dir_truncate() makes sure only dir inode is passed into the function.
Once I add the indexed dirs support in tunefs.ocfs2, I will think whether it's necessary to check inode type.

Thanks for remind me :-)

>> +static int ocfs2_find_max_rec_len(ocfs2_filesys *fs,
>> +                char *buf)
>> +{
>> +    int size, this_hole, largest_hole = 0;
>> +    char *de_buf, *limit, *start;
>> +    struct ocfs2_dir_block_trailer *trailer;
>> +    struct ocfs2_dir_entry *de;
>> +
>> +    start = buf;
>> +    trailer = ocfs2_dir_trailer_from_block(fs, buf);
>> +    size = ocfs2_dir_trailer_blk_off(fs);
>> +    limit = start + size;
>> +    de_buf = start;
>> +    de = (struct ocfs2_dir_entry *)de_buf;
>> +    do {
>> +        if (de_buf != (char *)trailer) {
> does this check redundant? if de_buf== trailer, we should already exit
> since de_buf = limit? btw, "start" is set to "buf" and never changed, I
> guess we can remove it.

Good catching. I agree with your comment, fix it in next version.

>> +static int dir_trailer_func(ocfs2_filesys *fs,
>> +                uint64_t blkno,
>> +                uint64_t bcount,
>> +                uint16_t ext_flags,
>> +                void *priv_data)
>> +{
>> +    struct trailer_ctxt *ctxt = (struct trailer_ctxt *)priv_data;
>> +    struct ocfs2_dinode *di = ctxt->di;
>> +    struct ocfs2_dx_root_block *dx_root = ctxt->dx_root;
>> +    struct ocfs2_dir_block_trailer *trailer;
>> +    int max_rec_len = 0;
>> +    errcode_t ret = 0;
>> +    char *blk = NULL;
>> +
>> +    ret = ocfs2_malloc_block(fs->fs_io, &blk);
>> +    if (ret)
>> +        goto out;
>> +
>> +    /* here we don't trust trailer, cannot use
>> +     * ocfs2_read_dir_block() */
>> +    ret = ocfs2_read_blocks(fs, blkno, 1, blk);
>> +    if (ret)
>> +        goto out;
>> +
>> +    max_rec_len = ocfs2_find_max_rec_len(fs, blk);
>> +    trailer = ocfs2_dir_trailer_from_block(fs, blk);
>> +    trailer->db_free_rec_len = max_rec_len;
>> +    ocfs2_init_dir_trailer(fs, di, blkno, blk);
> sorry for a silly qs. How could you make sure the dir_block has enough
> space for adding a trailer? What if the block is already full?

Currently, since fsck.ocfs2 only rebuild the indexed tree, which means means the dir block already have a trailer slot.

But latter when I add indexed dir support to tunefs.ocfs2, this is a problem. Thanks for point out this, I will find a
solution latter :-)

>> +
>> +    if (max_rec_len) {
>> +        trailer->db_free_next = dx_root->dr_free_blk;
>> +        dx_root->dr_free_blk = blkno;
>> +    }
>> +
>> +    /* comput trailer->db_check here, after writes out,
>> +     * trailer is trustable */
>> +    ret = ocfs2_write_dir_block(fs, di, blkno, blk);
>> +out:
>> +    if (blk)
>> +        ocfs2_free(&blk);
>> +    return ret;
>> +}
>> +
>> +static errcode_t ocfs2_init_dir_trailers(ocfs2_filesys *fs,
>> +                struct ocfs2_dinode *di,
>> +                struct ocfs2_dx_root_block *dx_root)
>> +{
>> +    errcode_t ret = 0;
>> +    struct trailer_ctxt ctxt;
>> +
>> +    if (di->i_dyn_features & OCFS2_INLINE_DATA_FL) {
>> +        ret = OCFS2_ET_INODE_NOT_VALID;
>> +        goto out;
> why not return 0 here since actually there is no error?

IMHO, a inline directory should not have trailer. That's why I return an error here.

>> +inline static int ocfs2_inline_dx_has_space(struct
>> ocfs2_dx_root_block *dx_root)
>> +{
>> +    struct ocfs2_dx_entry_list *entry_list;
>> +
>> +    entry_list = &dx_root->dr_entries;
>> +
>> +    if (entry_list->de_num_used >= entry_list->de_count)
>> +        return OCFS2_ET_DIR_NO_SPACE;
> hey, the below you check ocfs2_inline_dx_has_space with 0, here neither
> of the 2 returns have the value 0.

Yeah, I found out this too when I reviewed the code. The fix will be in next post.
Good catching ;-)

>> +static struct ocfs2_dx_leaf **ocfs2_dx_dir_alloc_leaves(ocfs2_filesys
>> *fs,
>> +                    int *ret_num_leaves)
>> +{
>> +    int ret, num_dx_leaves = ocfs2_clusters_to_blocks(fs, 1);
>> +    char *dx_leaves_buf = NULL;
>> +
>> +
>> +    ret = ocfs2_malloc0(num_dx_leaves * sizeof(void *), 
>> &dx_leaves_buf);
>> +    if (dx_leaves_buf && ret_num_leaves)
>> +        *ret_num_leaves = num_dx_leaves;
>> +
>> +    return (struct ocfs2_dx_leaf **)dx_leaves_buf;
> &dx_leaves_buf?

Oops, this is a big fault. Thanks for point out this.

> 
> I just wonder whether you have written a test case for your patch? These
> errors in the patch should be pretty easy to find out.

Hi Tao,

Up to this version, all the code is untested. In order to get feedback earlier, I posted out the patches when they
passed compiling.

These days, I tested the code, added all comments/fixes from you and Joel, and fixed some bugs found in the testing. The
next post, will be a testable version.

>From your feedback, it seems no concern from the extent related code (and it works in my testing). Thanks for your
comments, the code looks better now :-)

-- 
Coly Li
SuSE Labs