[Ocfs2-tools-devel] [PATCH 2/2] looped chain - Break loop in fsck

Wed Mar 3 22:39:21 PST 2010

tristan wrote:
> Goldwyn Rodrigues wrote:
>   
>> Hi Tristan,
>>
>>
>> On Tue, Mar 2, 2010 at 9:13 PM, tristan <tristan.ye at oracle.com> wrote:
>>   
>>     
>>> Hi Goldwyn,
>>>
>>> Comments are inlined:
>>>
>>> Firstly, please check the patch by /kernel-src/script/checkpatch.pl
>>> your-patch to detect the coding issues at the very beginning, since I found
>>> some coding style problems when apply your patch:
>>>
>>> Applying looped chain - Break loop in fsck
>>> .dotest/patch:57: trailing whitespace.
>>> ret = ocfs2_write_group_desc(ost->ost_fs,
>>> .dotest/patch:90: trailing whitespace.
>>> ". Break the loop?",cs->cs_chain_no,
>>> .dotest/patch:94: trailing whitespace.
>>> }
>>> .dotest/patch:115: trailing whitespace.
>>> verbosef("cl cpg %u bpc %u count %u next %u gds %u max_ch_len %u\n",
>>> warning: 4 lines add whitespace errors.
>>>
>>>
>>> As for the coding style of ocfs2 src, please refer to:
>>>
>>> http://www.kernel.org/doc/Documentation/CodingStyle
>>>     
>>>       
>> I understand. will take care in the future.
>>
>>   
>>     
>>> Goldwyn Rodrigues wrote:
>>>     
>>>       
>>>> Detects a loop by checking hops against the theoretical limit of
>>>> number of chains in a chain_rec. If a loop is found, it breaks it by
>>>> storing the block numbers, and comparing with existing block numbers.
>>>>
>>>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn at suse.de>
>>>> Acked-by: Mark Fasheh <mfasheh at suse.com>
>>>>
>>>> --- diff --git a/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>>> b/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>>> index 05561ae..798a1e4 100644
>>>> --- a/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>>> +++ b/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>>> @@ -202,6 +202,15 @@ valid in its bitmap.
>>>>  Answering yes decreases the number of recorded free bits so that it
>>>> equals
>>>>  the total number of bits in the group descriptor's bitmap.
>>>>  +.SS "CHAIN_LOOP"
>>>> +A chain may loop if the next field of the group descriptor points to one
>>>> of
>>>> +the previous group descriptors in the chain. This causes the ocfs2 code,
>>>> both
>>>> +user space and kernel module to loop forever.
>>>> +
>>>> +Answering yes breaks the loop at an optimum location so that all the
>>>> existing
>>>> +group descriptors are in the chain. However, it cannot re-connect stray
>>>> group
>>>> +descriptors and must rely on the rest of the fsck code to fix it.
>>>> +
>>>>  .SS "CHAIN_COUNT"
>>>>  The chain list embedded in an inode is limited by the block size and the
>>>>  number of bytes consumed by the rest of the inode.  A chain list header
>>>> was
>>>> diff --git a/fsck.ocfs2/pass0.c b/fsck.ocfs2/pass0.c
>>>> index a32bd18..1b59cc1 100644
>>>> --- a/fsck.ocfs2/pass0.c
>>>> +++ b/fsck.ocfs2/pass0.c
>>>> @@ -455,6 +455,46 @@ out:
>>>>        return ret;
>>>>  }
>>>>  +static errcode_t break_loop(o2fsck_state *ost, struct ocfs2_chain_rec
>>>> *chain,
>>>> +               unsigned int max_depth)
>>>>
>>>>       
>>>>         
>>> I guess it will make more sense if your rename the func name as
>>> break_chain_loop() :)
>>>     
>>>       
>>>> +{
>>>> +       uint64_t *list;
>>>> +       int i;
>>>> +       unsigned int depth = 0;
>>>> +       uint64_t blkno = chain->c_blkno;
>>>>
>>>>       
>>>>         
>>> blkno = le64_to_cpu(chain->c_blkno);
>>>
>>>     
>>>       
>>>> +       char *buf;
>>>> +       struct ocfs2_group_desc *gd;
>>>> +       errcode_t ret = ocfs2_malloc0(sizeof(uint64_t) * max_depth,
>>>> &list);
>>>>
>>>>       
>>>>         
>>> I recognized all what you do here is to locate the index where loop happens,
>>> and break the loop by terminating the 'next' to NULL. I loved the way your
>>> treating the loop chain, but I don't like the method you locate a the
>>> position where loops happening.
>>>
>>>
>>> You see, you used an extra array to record 'depth' of all previous nodes,
>>> what's more, you also have to come back for checking each previous nodes
>>> each step, that means your space complexity is O(N), and time complexity
>>> here is O(N2), that's not so pleasant to me:)
>>>
>>>
>>>     
>>>       
>>>> +       if (ret)
>>>> +               goto out;
>>>> +       ret =  ocfs2_malloc_block(ost->ost_fs->fs_io, &buf);
>>>> +       if (ret)
>>>> +               goto out;
>>>> +       gd = (struct ocfs2_group_desc *)buf;
>>>> +
>>>> +       while (blkno && (depth<=max_depth)) {
>>>> +               list[depth++] = blkno;
>>>> +               ret = ocfs2_read_group_desc(ost->ost_fs, blkno, buf);
>>>> +               if (ret)
>>>> +                       goto out;
>>>> +               blkno = gd->bg_next_group;
>>>> +               for (i=0; i<depth; i++)
>>>> +                       if (list[i]==blkno) {
>>>> +                               gd->bg_next_group = 0;
>>>> +                               verbosef("Breaking gd loop %"PRIu64"\n",
>>>> blkno);
>>>> +                               ret = ocfs2_write_group_desc(ost->ost_fs,
>>>> +                                             blkno, buf);
>>>> +                               goto out;
>>>> +                       }
>>>> +       }
>>>>
>>>>       
>>>>         
>>> To detect and locate a loop in a list, you may have a better solution, use 2
>>> pointers, one is 'slow', and the other is 'fast', 'slow' pointer proceed the
>>> list by one hop, and 'fast' one will go two hops each step, so the essence
>>> of this alogrithm is, the fast pointer will meet slow pointer somehow if
>>> there is a loop exists, otherwise, fast and slow never meet, and end up with
>>> NULLs respectively. the code skeleton will be as follows:
>>>
>>> static unit64_t get_group_desc_next(o2fsck_state *ost, uint64_t blkno, char
>>> *buf)
>>> {
>>> uint64_t next = 0;
>>> struct ocfs2_group_desc *gd;
>>>
>>> ret = ocfs2_read_group_desc(ost->ost_fs, blkno, buf);
>>> if (ret)
>>> return next;
>>>
>>> gd = (struct ocfs2_group_desc *)buf;
>>> next = gd->gb_next_group;
>>>
>>> return next;
>>> }
>>>
>>>
>>> in function break_chain_loop():
>>>
>>> uint64_t loop_blkno, blkno, slow, fast;
>>>
>>> blkno = slow = fast = le64_to_cpu(chain->c_blkno);
>>>
>>> while (fast && get_group_desc_next(ost, fast, buf)) {
>>>
>>>        /*
>>>         * fast ptr meet the slow means a loop happening there
>>>         */
>>>        if ((fast == slow) || (get_group_desc_next(ost, fast, buf) == slow))
>>> {
>>>                /*
>>>                 * Loop happened at the node before fast and slow
>>>                 */
>>>                if (fast == slow)
>>>                        loop_blkno = blkno;
>>>
>>>                /*
>>>                 * Loop happend at the node of fast
>>>                 */
>>>                if (get_group_desc_next(ost, fast, buf) == slow)
>>>                        loop_blkno = fast;
>>>
>>>                ret = ocfs2_read_group_desc(ost->ost_fs, loop_blkno, buf);
>>>                if (ret)
>>>                        goto bail;
>>>
>>>                gd = (struct ocfs2_group_desc *)buf;
>>>
>>>                gd->bg_next_group = 0;
>>>
>>>                verbose("Breaking gd loop %"PRIu64"\n", loop_blkno);
>>>
>>>                ret = ocfs2_write_group_desc(ost->ost_fs, loop_blkno, buf);
>>>                if (ret)
>>>                        goto bail;
>>>
>>>                break;
>>>        } else {
>>>                blkno = slow;
>>>
>>>                /*
>>>                 * Slow ptr goes one by one hop each time.
>>>                 */
>>>                slow = get_group_desc_next(ost, slow, buf);
>>>
>>>                /*
>>>                 * Fast ptr proceeds by 2 hops each step.
>>>                 */
>>>                fast = get_group_desc_next(ost, fast, buf);
>>>                fast = get_group_desc_next(ost, fast, buf);
>>>        }
>>>
>>>
>>> }
>>>
>>> ...
>>>
>>>
>>> According to above algorithm, we easily found that the space complexity is
>>> 0, and time complexity was declined to O(N).
>>>
>>>
>>>
>>>     
>>>       
>> While this algorithm is efficient in diagnosing the loop, I think it
>> fails in  detecting the optimum place to break the loop.
>>
>> Could you run this algorithm with the following scenario:
>>
>>
>> cr->c_blkno = 1
>>
>> gd(1)->next=2
>> gd(2)->next=3
>> gd(3)->next=4
>> gd(4)->next=2
>>
>> Where would the breakage be in this case?
>>   
>>     
>
> Oh, you're right, the current algorithm is not capable of detecting the 
> first position where a pointing-back happens, it needs to be extented:
>
>
> int looped = 0;
> uint64_t blkno, slow, fast;
>
> slow = fast = chain->c_blkno;
>
> while (fast && get_group_desc_next(ost, fast, buf)) {
> 	/*
> 	 * fast ptr meet the slow means a loop happening there
> 	 */
> 	if (fast == slow) {
> 		looped = 1;
> 		break;
> 	} else {
> 		
> 		/*
> 		 * Slow ptr goes one by one hop each time.
> 		 */
> 		slow = get_group_desc_next(ost, slow, buf);
>
> 		/*
> 		 * Fast ptr proceeds by 2 hops each step.
> 		 */
> 		fast = get_group_desc_next(ost, fast, buf);
> 		fast = get_group_desc_next(ost, fast, buf);
> 	}
> }
>
> if (looped) {
>     blkno = chain->c_blkno;
>     /*
>      * Find out the optimum to break the loop
>      */
>     while (blkno != slow) {
>        blkno = get_group_desc_next(ost, blkno, buf);
>        slow = get_group_desc_next(ost, blkno, buf);
>     }
>
>     ret = ocfs2_read_group_desc(ost->ost_fs, blkno, buf);
>     if (ret)
>        goto bail;
>
>     gd = (struct ocfs2_group_desc *)buf;
>     gd->bg_next_group = 0;
>     verbose("Breaking gd loop %"PRIu64"\n", blkno);
>     ret = ocfs2_write_group_desc(ost->ost_fs, blkno, buf);
>     if (ret)
>        goto bail;
> }
>
>
> Though the above algorithm looks a little bit complicated and obscure, I 
> still love it, especially if you're operating with a HUGE volume(16TB+), 
> and even worse, the loop happens at the end of group chain, think about 
> this case. you do benefit from the space and time saved from above method.
>   

As you seen, using two iterators(slow and fast) to detect loop in a 
chain, is of much efficiency.

I'm thinking if we could add a method like ocfs2_detect_chain_loop() in 
libocfs2 to let debugfs.ocfs2
and tunefs.ocfs2 reused it, then ask user to fix the issue if it's the case.

Tristan.

>
> Tristan.
>
>
>
> Regards,
> Tristan.
>   
>>   
>>     
>
>
> _______________________________________________
> Ocfs2-tools-devel mailing list
> Ocfs2-tools-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-tools-devel
>