[Ocfs2-tools-devel] [PATCH 2/2] looped chain - Break loop in fsck

Wed Mar 3 18:45:02 PST 2010

Goldwyn Rodrigues wrote:
> Hi Tristan,
>
>
> On Tue, Mar 2, 2010 at 9:13 PM, tristan <tristan.ye at oracle.com> wrote:
>   
>> Hi Goldwyn,
>>
>> Comments are inlined:
>>
>> Firstly, please check the patch by /kernel-src/script/checkpatch.pl
>> your-patch to detect the coding issues at the very beginning, since I found
>> some coding style problems when apply your patch:
>>
>> Applying looped chain - Break loop in fsck
>> .dotest/patch:57: trailing whitespace.
>> ret = ocfs2_write_group_desc(ost->ost_fs,
>> .dotest/patch:90: trailing whitespace.
>> ". Break the loop?",cs->cs_chain_no,
>> .dotest/patch:94: trailing whitespace.
>> }
>> .dotest/patch:115: trailing whitespace.
>> verbosef("cl cpg %u bpc %u count %u next %u gds %u max_ch_len %u\n",
>> warning: 4 lines add whitespace errors.
>>
>>
>> As for the coding style of ocfs2 src, please refer to:
>>
>> http://www.kernel.org/doc/Documentation/CodingStyle
>>     
>
>
> I understand. will take care in the future.
>
>   
>> Goldwyn Rodrigues wrote:
>>     
>>> Detects a loop by checking hops against the theoretical limit of
>>> number of chains in a chain_rec. If a loop is found, it breaks it by
>>> storing the block numbers, and comparing with existing block numbers.
>>>
>>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn at suse.de>
>>> Acked-by: Mark Fasheh <mfasheh at suse.com>
>>>
>>> --- diff --git a/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>> b/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>> index 05561ae..798a1e4 100644
>>> --- a/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>> +++ b/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>> @@ -202,6 +202,15 @@ valid in its bitmap.
>>>  Answering yes decreases the number of recorded free bits so that it
>>> equals
>>>  the total number of bits in the group descriptor's bitmap.
>>>  +.SS "CHAIN_LOOP"
>>> +A chain may loop if the next field of the group descriptor points to one
>>> of
>>> +the previous group descriptors in the chain. This causes the ocfs2 code,
>>> both
>>> +user space and kernel module to loop forever.
>>> +
>>> +Answering yes breaks the loop at an optimum location so that all the
>>> existing
>>> +group descriptors are in the chain. However, it cannot re-connect stray
>>> group
>>> +descriptors and must rely on the rest of the fsck code to fix it.
>>> +
>>>  .SS "CHAIN_COUNT"
>>>  The chain list embedded in an inode is limited by the block size and the
>>>  number of bytes consumed by the rest of the inode.  A chain list header
>>> was
>>> diff --git a/fsck.ocfs2/pass0.c b/fsck.ocfs2/pass0.c
>>> index a32bd18..1b59cc1 100644
>>> --- a/fsck.ocfs2/pass0.c
>>> +++ b/fsck.ocfs2/pass0.c
>>> @@ -455,6 +455,46 @@ out:
>>>        return ret;
>>>  }
>>>  +static errcode_t break_loop(o2fsck_state *ost, struct ocfs2_chain_rec
>>> *chain,
>>> +               unsigned int max_depth)
>>>
>>>       
>> I guess it will make more sense if your rename the func name as
>> break_chain_loop() :)
>>     
>>> +{
>>> +       uint64_t *list;
>>> +       int i;
>>> +       unsigned int depth = 0;
>>> +       uint64_t blkno = chain->c_blkno;
>>>
>>>       
>> blkno = le64_to_cpu(chain->c_blkno);
>>
>>     
>>> +       char *buf;
>>> +       struct ocfs2_group_desc *gd;
>>> +       errcode_t ret = ocfs2_malloc0(sizeof(uint64_t) * max_depth,
>>> &list);
>>>
>>>       
>> I recognized all what you do here is to locate the index where loop happens,
>> and break the loop by terminating the 'next' to NULL. I loved the way your
>> treating the loop chain, but I don't like the method you locate a the
>> position where loops happening.
>>
>>
>> You see, you used an extra array to record 'depth' of all previous nodes,
>> what's more, you also have to come back for checking each previous nodes
>> each step, that means your space complexity is O(N), and time complexity
>> here is O(N2), that's not so pleasant to me:)
>>
>>
>>     
>>> +       if (ret)
>>> +               goto out;
>>> +       ret =  ocfs2_malloc_block(ost->ost_fs->fs_io, &buf);
>>> +       if (ret)
>>> +               goto out;
>>> +       gd = (struct ocfs2_group_desc *)buf;
>>> +
>>> +       while (blkno && (depth<=max_depth)) {
>>> +               list[depth++] = blkno;
>>> +               ret = ocfs2_read_group_desc(ost->ost_fs, blkno, buf);
>>> +               if (ret)
>>> +                       goto out;
>>> +               blkno = gd->bg_next_group;
>>> +               for (i=0; i<depth; i++)
>>> +                       if (list[i]==blkno) {
>>> +                               gd->bg_next_group = 0;
>>> +                               verbosef("Breaking gd loop %"PRIu64"\n",
>>> blkno);
>>> +                               ret = ocfs2_write_group_desc(ost->ost_fs,
>>> +                                             blkno, buf);
>>> +                               goto out;
>>> +                       }
>>> +       }
>>>
>>>       
>> To detect and locate a loop in a list, you may have a better solution, use 2
>> pointers, one is 'slow', and the other is 'fast', 'slow' pointer proceed the
>> list by one hop, and 'fast' one will go two hops each step, so the essence
>> of this alogrithm is, the fast pointer will meet slow pointer somehow if
>> there is a loop exists, otherwise, fast and slow never meet, and end up with
>> NULLs respectively. the code skeleton will be as follows:
>>
>> static unit64_t get_group_desc_next(o2fsck_state *ost, uint64_t blkno, char
>> *buf)
>> {
>> uint64_t next = 0;
>> struct ocfs2_group_desc *gd;
>>
>> ret = ocfs2_read_group_desc(ost->ost_fs, blkno, buf);
>> if (ret)
>> return next;
>>
>> gd = (struct ocfs2_group_desc *)buf;
>> next = gd->gb_next_group;
>>
>> return next;
>> }
>>
>>
>> in function break_chain_loop():
>>
>> uint64_t loop_blkno, blkno, slow, fast;
>>
>> blkno = slow = fast = le64_to_cpu(chain->c_blkno);
>>
>> while (fast && get_group_desc_next(ost, fast, buf)) {
>>
>>        /*
>>         * fast ptr meet the slow means a loop happening there
>>         */
>>        if ((fast == slow) || (get_group_desc_next(ost, fast, buf) == slow))
>> {
>>                /*
>>                 * Loop happened at the node before fast and slow
>>                 */
>>                if (fast == slow)
>>                        loop_blkno = blkno;
>>
>>                /*
>>                 * Loop happend at the node of fast
>>                 */
>>                if (get_group_desc_next(ost, fast, buf) == slow)
>>                        loop_blkno = fast;
>>
>>                ret = ocfs2_read_group_desc(ost->ost_fs, loop_blkno, buf);
>>                if (ret)
>>                        goto bail;
>>
>>                gd = (struct ocfs2_group_desc *)buf;
>>
>>                gd->bg_next_group = 0;
>>
>>                verbose("Breaking gd loop %"PRIu64"\n", loop_blkno);
>>
>>                ret = ocfs2_write_group_desc(ost->ost_fs, loop_blkno, buf);
>>                if (ret)
>>                        goto bail;
>>
>>                break;
>>        } else {
>>                blkno = slow;
>>
>>                /*
>>                 * Slow ptr goes one by one hop each time.
>>                 */
>>                slow = get_group_desc_next(ost, slow, buf);
>>
>>                /*
>>                 * Fast ptr proceeds by 2 hops each step.
>>                 */
>>                fast = get_group_desc_next(ost, fast, buf);
>>                fast = get_group_desc_next(ost, fast, buf);
>>        }
>>
>>
>> }
>>
>> ...
>>
>>
>> According to above algorithm, we easily found that the space complexity is
>> 0, and time complexity was declined to O(N).
>>
>>
>>
>>     
>
>
> While this algorithm is efficient in diagnosing the loop, I think it
> fails in  detecting the optimum place to break the loop.
>
> Could you run this algorithm with the following scenario:
>
>
> cr->c_blkno = 1
>
> gd(1)->next=2
> gd(2)->next=3
> gd(3)->next=4
> gd(4)->next=2
>
> Where would the breakage be in this case?
>   

Oh, you're right, the current algorithm is not capable of detecting the 
first position where a pointing-back happens, it needs to be extented:

int looped = 0;
uint64_t blkno, slow, fast;

slow = fast = chain->c_blkno;

while (fast && get_group_desc_next(ost, fast, buf)) {
	/*
	 * fast ptr meet the slow means a loop happening there
	 */
	if (fast == slow) {
		looped = 1;
		break;
	} else {

		/*
		 * Slow ptr goes one by one hop each time.
		 */
		slow = get_group_desc_next(ost, slow, buf);

		/*
		 * Fast ptr proceeds by 2 hops each step.
		 */
		fast = get_group_desc_next(ost, fast, buf);
		fast = get_group_desc_next(ost, fast, buf);
	}
}

if (looped) {
    blkno = chain->c_blkno;
    /*
     * Find out the optimum to break the loop
     */
    while (blkno != slow) {
       blkno = get_group_desc_next(ost, blkno, buf);
       slow = get_group_desc_next(ost, blkno, buf);
    }

    ret = ocfs2_read_group_desc(ost->ost_fs, blkno, buf);
    if (ret)
       goto bail;

    gd = (struct ocfs2_group_desc *)buf;
    gd->bg_next_group = 0;
    verbose("Breaking gd loop %"PRIu64"\n", blkno);
    ret = ocfs2_write_group_desc(ost->ost_fs, blkno, buf);
    if (ret)
       goto bail;
}

Though the above algorithm looks a little bit complicated and obscure, I 
still love it, especially if you're operating with a HUGE volume(16TB+), 
and even worse, the loop happens at the end of group chain, think about 
this case. you do benefit from the space and time saved from above method.

Tristan.

Regards,
Tristan.
>
>