[Ocfs2-devel] [PATCH 1/1] Ocfs2: Optimize punching-hole codes v3.

Tao Ma tao.ma at oracle.com
Tue Mar 30 01:05:07 PDT 2010


Hi Tristan,

tristan wrote:
> Thanks for your so quick review;)
> 
> 
> Tao Ma wrote:
>> Hi Tristan,
>> here comes the review.
>>
>> Tristan Ye wrote:
>>> Changes from v2 to v3:
>>>
>> <snip>
>>> + /*
>>> + * Simply jump to previous record if the pos is
>>> + * the start of a record.
>>> + */
>>> + if (le32_to_cpu(rec->e_cpos) == *pos) {
>>> + i--;
>>> + if (i < 0) {
>>> + *rec_got = NULL;
>>> + return;
>>> + }
>>> +
>>> + rec = &el->l_recs[i];
>>> + range = le32_to_cpu(rec->e_cpos) +
>>> + ocfs2_rec_clusters(el, rec);
>>> + /*
>>> + * Skip a hole.
>>> + */
>>> + if (range < *pos)
>>> + *pos = range;
>> do we still need to check range < *pos? As you have already check
>> le32_to_cpu(rec->e_cpos) == *pos.
> 
> I'm afraid we have to,
> 
> See we decreased the 'i' in the if clause;) so the rec here is not the 
> same as previous one.
yes, we have decreased. so we should have range < old_rec->e_cpos(range 
now is the previous rec's end). And *pos == old_rec->e_cpos. That is of 
course range < *pos.
> 
>>> +
>>> + break;
>>> + }
>>> + }
>>> +
>>> + *rec_got = &el->l_recs[i];
>>> +}
>>> +
>>> +/*
>>> + * Helper to calculate the punching pos and length in one run, we 
>>> handle the
>>> + * following three cases in order:
>>> + *
>>> + * - remove the entire record
>>> + * - remove a partial record
>>> + * - no record needs to be removed (hole-punching completed)
>>> +*/
>>> +static void ocfs2_calc_trunc_pos(struct inode *inode,
>>> + struct ocfs2_extent_list *el,
>>> + struct ocfs2_extent_rec *rec,
>>> + u32 trunc_start, u32 *trunc_cpos,
>>> + u32 *trunc_len, u32 *trunc_end,
>>> + u64 *blkno, int *done)
>>> +{
>>> + int ret = 0;
>>> + u32 coff, range;
>>> +
>>> + range = le32_to_cpu(rec->e_cpos) + ocfs2_rec_clusters(el, rec);
>>> +
>>> + if (le32_to_cpu(rec->e_cpos) >= trunc_start) {
>>> + *trunc_cpos = le32_to_cpu(rec->e_cpos);
>>> + *trunc_len = *trunc_end - le32_to_cpu(rec->e_cpos);
>>> + *blkno = le64_to_cpu(rec->e_blkno);
>>> + *trunc_end = le32_to_cpu(rec->e_cpos);
>>> + } else if (range > trunc_start) {
>>> + *trunc_cpos = trunc_start;
>>> + *trunc_len = range - trunc_start;
>>> + coff = trunc_start - le32_to_cpu(rec->e_cpos);
>>> + *blkno = le64_to_cpu(rec->e_blkno) +
>>> + ocfs2_clusters_to_blocks(inode->i_sb, coff);
>>> + *trunc_end = trunc_start;
>>> + } else {
>>> + /*
>>> + * It may have two following possibilities:
>>> + *
>>> + * - last record has been removed
>> what do you mean last record here? Sorry, I don't understand this case.
> 
> Here we reached the 'trunc_start', means we've got nothing to truncate;)
> 
> It's the case when 'range == trunc_start'
> 
>>> + * - trunc_start was within a hole
> 
> So here is the case when 'range < trunc_start'
oh, so do please change 'last record has been removed' somehow, I can't 
understand what it means actually.
>>> @@ -1482,16 +1588,13 @@ static int ocfs2_remove_inode_range(struct 
>>> inode *inode,
>>> }
>>>
>>> trunc_start = ocfs2_clusters_for_bytes(osb->sb, byte_start);
>>> - trunc_len = (byte_start + byte_len) >> osb->s_clustersize_bits;
>>> - if (trunc_len >= trunc_start)
>>> - trunc_len -= trunc_start;
>>> - else
>>> - trunc_len = 0;
>>> + trunc_end = (byte_start + byte_len) >> osb->s_clustersize_bits;
>>> + cluster_within_list = trunc_end;
>> You said in the comments below that "we want to find a path which 
>> contains (trunc_end - 1)". So we also need to set trunc_end -1 here? 
>> Otherwise you may enter the wrong extent block in some case?
> 
> No, I'm afraid we'd rather do like this, since 'trunc_end - 1' is used 
> for finding the next extent block when crossing the blocks, trunc_end is 
> for contiguous truncating..
why? you use cluster_with_list to search the path. So consider trunc_end 
is just the cpos for a extent block(not the leaf extent rec). You go to 
the wrong extent block actually in the first ocfs2_find_path. But you 
will find that the first rec->e_cpos == trunc_end and go to the previous 
extent block. So an extra loop here.
>>> + /*
>>> + * Unlike truncate codes, here we want to find a path which
>>> + * contains (trunc_end - 1) cpos, and then trunc_end will be
>>> + * decreased after each removal of a record range.
>>> + *
>>> + * Why not using trunc_end to search the path?
>>> + * The reason is simple, think about the situation of crossing
>>> + * the extent block, we need to find the adjacent block by
>>> + * decreasing one cluster, otherwise, it will run into a loop.
>>> + */
>>> + ret = ocfs2_find_path(INODE_CACHE(inode), path,
>>> + cluster_within_list);
>>> if (ret) {
>>> mlog_errno(ret);
>>> goto out;
>>> }
>>>
>>> - if (alloc_size > trunc_len)
>>> - alloc_size = trunc_len;
>>> + el = path_leaf_el(path);
>>>
>>> - /* Only do work for non-holes */
>>> - if (phys_cpos != 0) {
>>> - ret = ocfs2_remove_btree_range(inode, &et, cpos,
>>> - phys_cpos, alloc_size,
>>> - &dealloc, refcount_loc,
>>> - flags);
>>> - if (ret) {
>>> - mlog_errno(ret);
>>> - goto out;
>>> - }
>>> + ocfs2_find_rec_with_pos(el, &rec, &trunc_end);
>>> + /*
>>> + * Go to next extent block directly when last rec was truncated.
>>> + */
>>> + if (!rec) {
>>> + cluster_within_list = trunc_end - 1;
>>> + ocfs2_reinit_path(path, 1);
>>> + break;
>> ocfs2_find_rec_with_pos will find a rec contains trunc_end and if 
>> there is no such rec and trunc_end < el->l_recs[0].e_cpos, rec will be 
>> set NULL, right?
> 
> Correct.
> 
>> So what if the el is in the left most extent block? Your function will 
>> iterate again and again with every time trunc_end--?
> 
> Cool, you've found a bug here:
> 
> OCFS2: ERROR (device sda8): ocfs2_remove_extent: Owner 66050 has an 
> extent at cpos 1245184 which can no longer be found.
> 
> File system is now read-only due to the potential of on-disk corruption. 
> Please run fsck.ocfs2 once the file system is unmounted.
> (5975,0):ocfs2_remove_btree_range:5799 ERROR: status = -30
> (5975,0):ocfs2_remove_inode_range:1654 ERROR: status = -30
> (5975,0):__ocfs2_change_file_space:1778 ERROR: status = -30
> 
> 
> Thanks for pointing this out.
> 
> This happens when we've alreay have a hole from 0 to the first extent 
> block.
> 
> 
>> another problem is that your comments is "Go to next extent block 
>> directly when last rec was truncated", I guess you mean 'previous'? 
>> And you 'break'
> 
> I hate being a moron to make such a mistake, it definitely should be 
> 'previous' extent block.
And we need to change 'break' to 'contiue' also, right?

Regards,
Tao



More information about the Ocfs2-devel mailing list