[Ocfs2-tools-devel] [PATCH 07/15] dx_dirs v11: more library support for directory indexing

Sun Apr 25 11:03:16 PDT 2010

On 04/23/2010 05:18 PM, Tao Ma Wrote:
> <snip>
>> +errcode_t ocfs2_new_dx_root(ocfs2_filesys *fs,
>> +                struct ocfs2_dinode *di,
>> +                uint64_t *dr_blkno)
>> +{
>> +    errcode_t ret;
>> +    char *buf = NULL;
>> +    uint64_t gd_blkno;
>> +    struct ocfs2_dx_root_block *dx_root;
>> +    int slot;
>> +
>> +    ret = ocfs2_malloc_block(fs->fs_io, &buf);
>> +    if (ret)
>> +        goto out;
>> +
>> +    slot = di->i_suballoc_slot;
>> +    ret = ocfs2_load_allocator(fs, EXTENT_ALLOC_SYSTEM_INODE,
>> +                slot, &fs->fs_eb_allocs[slot]);
> why you want to use the suballoc_slot of the inode? the i_suballoc_slot
> means the slot for inode_alloc while you load the allocators from
> extent_alloc. I pointed it out because here is a bug. For "/", it is
> allocated from global_alloc, so i_suballoc_slot is INVALID_SLOT, so
> there is no corresponding extent alloc, and &fs->fs_eb_allocs[slot] will
> overflow.

The reason is, I assum/guess, most of time, the node (machine) operating on dx_root is probably the node (machine)
operating on inode. Using the same alloc slot, may have less performance cost for future accessing if inode and dx_root
belongs to same alloc slot.

> <snip>
>> +static int dir_trailer_func(ocfs2_filesys *fs,
>> +                uint64_t blkno,
>> +                uint64_t bcount,
>> +                uint16_t ext_flags,
>> +                void *priv_data)
>> +{
>> +    struct trailer_ctxt *ctxt = (struct trailer_ctxt *)priv_data;
>> +    struct ocfs2_dinode *di = ctxt->di;
>> +    struct ocfs2_dx_root_block *dx_root = ctxt->dx_root;
>> +    struct ocfs2_dir_block_trailer *trailer;
>> +    int max_rec_len = 0;
>> +    errcode_t ret = 0;
>> +    char *blk = NULL;
>> +
>> +    ret = ocfs2_malloc_block(fs->fs_io, &blk);
>> +    if (ret)
>> +        goto out;
>> +
>> +    /* here we don't trust trailer, cannot use
>> +     * ocfs2_read_dir_block() */
>> +    ret = ocfs2_read_blocks(fs, blkno, 1, blk);
>> +    if (ret)
>> +        goto out;
>> +    ret = ocfs2_check_dir_trailer_space(fs, di, blkno, blk);
>> +    if (ret)
>> +        goto out;
>> +    ocfs2_init_dir_trailer(fs, di, blkno, blk);
>> +    max_rec_len = ocfs2_find_max_rec_len(fs, blk);
>> +    trailer = ocfs2_dir_trailer_from_block(fs, blk);
>> +    trailer->db_free_rec_len = max_rec_len;
>> +
>> +    if (max_rec_len) {
>> +        trailer->db_free_next = dx_root->dr_free_blk;
>> +        dx_root->dr_free_blk = blkno;
>> +    }
>> +
>> +    /* comput trailer->db_check here, after writes out,
>> +     * trailer is trustable */
>> +    ret = ocfs2_write_dir_block(fs, di, blkno, blk);
> I am just curious about one thing. We have added the trailer to the end
> of the dir block. Do we need to update the rec_len of the last record of
> the dir block? Or am I missing something here?

The rec_len is not updated here.
So far, the code assumes: if kernel or user space tool creates a new directory, and trailer is supported, kernel or the
user space tool should make sure the rec_len is correct.

In user space tools, only tunefs.ocfs2 operates on trailer stuffs when enable or disable trailer related features
(metaecc, indexed-dirs). tunefs_install_dir_trailer() will make sure, all valid dir entries will not overlapped with the
trailer area. If the overlap happens, tunefs_install_dir_trailer()->run_dirblocks()->fixup_dirblock() finally updates
the rec_len,

static errcode_t fixup_dirblock(ocfs2_filesys *fs,
[snip]
        /*
         * Now that we've moved any dirents out of the way, we need to
         * fix up db_last and install the trailer.
         */
        offset = ((char *)db->db_last) - db->db_buf;
[snip]

Therefore, I trust rec_len is correct.

>> +static errcode_t ocfs2_read_dx_leaves(ocfs2_filesys *fs,
>> +                uint64_t start,
>> +                int num,
>> +                struct ocfs2_dx_leaf **dx_leaves)
>> +{
>> +    errcode_t ret;
>> +    int i;
>> +    struct ocfs2_dx_leaf *dx_leaf;
>> +    for (i = 0; i < num; i++) {
>> +        assert(!dx_leaves[i]);
>> +        ret = ocfs2_malloc_block(fs->fs_io, (char **)&dx_leaf);
>> +        if (ret)
>> +            goto bail;
>> +        ret = ocfs2_read_dx_leaf(fs, start + i, (char *)dx_leaf);
>> +        if (ret)
>> +            goto bail;
>> +        dx_leaves[i] = dx_leaf;
>> +    }
>> +    goto out;
>> +
>> +bail:
>> +    for (; i >= 0; i--) {
>> +        if (dx_leaves[i])
>> +            ocfs2_free(&dx_leaves[i]);
> We need to set dx_leaves[i] to NULL here. Otherwise the callers of
> ocfs2_read_dx_leaves will double free them. At least I see the caller
> ocfs2_dx_dir_rebalance call ocfs2_dx_dir_free_leaves which free
> dx_leaves[i] again.

ocfs2_free() frees dx_leaves[i] and sets dx_leaves[i] to NULL as well. So we don't need to worry here.

Tao,

Thanks for your review, for this big one :-)

-- 
Coly Li
SuSE Labs