[Ocfs2-tools-devel] [PATCH 2/2] looped chain - Break loop in fsck

Fri Mar 5 06:44:26 PST 2010

Goldwyn Rodrigues wrote:
> Hi Tristan,
>
>
> On Thu, Mar 4, 2010 at 11:19 PM, tristan <tristan.ye at oracle.com> wrote:
>   
>> Goldwyn Rodrigues wrote:
>>     
>>> Hi Tristan
>>>
>>> On Thu, Mar 4, 2010 at 7:28 PM, tristan <tristan.ye at oracle.com> wrote:
>>>
>>>       
>>>> Goldwyn Rodrigues wrote:
>>>>
>>>>         
>>>>> Hi Tristan,
>>>>>
>>>>> On Wed, Mar 3, 2010 at 8:45 PM, tristan <tristan.ye at oracle.com> wrote:
>>>>>
>>>>>
>>>>>           
>>>>>> Goldwyn Rodrigues wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> Hi Tristan,
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Mar 2, 2010 at 9:13 PM, tristan <tristan.ye at oracle.com> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> Hi Goldwyn,
>>>>>>>>
>>>>>>>> Comments are inlined:
>>>>>>>>
>>>>>>>> Firstly, please check the patch by /kernel-src/script/checkpatch.pl
>>>>>>>> your-patch to detect the coding issues at the very beginning, since I
>>>>>>>> found
>>>>>>>> some coding style problems when apply your patch:
>>>>>>>>
>>>>>>>> Applying looped chain - Break loop in fsck
>>>>>>>> .dotest/patch:57: trailing whitespace.
>>>>>>>> ret = ocfs2_write_group_desc(ost->ost_fs,
>>>>>>>> .dotest/patch:90: trailing whitespace.
>>>>>>>> ". Break the loop?",cs->cs_chain_no,
>>>>>>>> .dotest/patch:94: trailing whitespace.
>>>>>>>> }
>>>>>>>> .dotest/patch:115: trailing whitespace.
>>>>>>>> verbosef("cl cpg %u bpc %u count %u next %u gds %u max_ch_len %u\n",
>>>>>>>> warning: 4 lines add whitespace errors.
>>>>>>>>
>>>>>>>>
>>>>>>>> As for the coding style of ocfs2 src, please refer to:
>>>>>>>>
>>>>>>>> http://www.kernel.org/doc/Documentation/CodingStyle
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>> I understand. will take care in the future.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> Goldwyn Rodrigues wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> Detects a loop by checking hops against the theoretical limit of
>>>>>>>>> number of chains in a chain_rec. If a loop is found, it breaks it by
>>>>>>>>> storing the block numbers, and comparing with existing block
>>>>>>>>> numbers.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn at suse.de>
>>>>>>>>> Acked-by: Mark Fasheh <mfasheh at suse.com>
>>>>>>>>>
>>>>>>>>> --- diff --git a/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>>>>>>>> b/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>>>>>>>> index 05561ae..798a1e4 100644
>>>>>>>>> --- a/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>>>>>>>> +++ b/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>>>>>>>> @@ -202,6 +202,15 @@ valid in its bitmap.
>>>>>>>>>  Answering yes decreases the number of recorded free bits so that it
>>>>>>>>> equals
>>>>>>>>>  the total number of bits in the group descriptor's bitmap.
>>>>>>>>>  +.SS "CHAIN_LOOP"
>>>>>>>>> +A chain may loop if the next field of the group descriptor points
>>>>>>>>> to
>>>>>>>>> one
>>>>>>>>> of
>>>>>>>>> +the previous group descriptors in the chain. This causes the ocfs2
>>>>>>>>> code,
>>>>>>>>> both
>>>>>>>>> +user space and kernel module to loop forever.
>>>>>>>>> +
>>>>>>>>> +Answering yes breaks the loop at an optimum location so that all
>>>>>>>>> the
>>>>>>>>> existing
>>>>>>>>> +group descriptors are in the chain. However, it cannot re-connect
>>>>>>>>> stray
>>>>>>>>> group
>>>>>>>>> +descriptors and must rely on the rest of the fsck code to fix it.
>>>>>>>>> +
>>>>>>>>>  .SS "CHAIN_COUNT"
>>>>>>>>>  The chain list embedded in an inode is limited by the block size
>>>>>>>>> and
>>>>>>>>> the
>>>>>>>>>  number of bytes consumed by the rest of the inode.  A chain list
>>>>>>>>> header
>>>>>>>>> was
>>>>>>>>> diff --git a/fsck.ocfs2/pass0.c b/fsck.ocfs2/pass0.c
>>>>>>>>> index a32bd18..1b59cc1 100644
>>>>>>>>> --- a/fsck.ocfs2/pass0.c
>>>>>>>>> +++ b/fsck.ocfs2/pass0.c
>>>>>>>>> @@ -455,6 +455,46 @@ out:
>>>>>>>>>     return ret;
>>>>>>>>>  }
>>>>>>>>>  +static errcode_t break_loop(o2fsck_state *ost, struct
>>>>>>>>> ocfs2_chain_rec
>>>>>>>>> *chain,
>>>>>>>>> +               unsigned int max_depth)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> I guess it will make more sense if your rename the func name as
>>>>>>>> break_chain_loop() :)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> +{
>>>>>>>>> +       uint64_t *list;
>>>>>>>>> +       int i;
>>>>>>>>> +       unsigned int depth = 0;
>>>>>>>>> +       uint64_t blkno = chain->c_blkno;
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> blkno = le64_to_cpu(chain->c_blkno);
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> +       char *buf;
>>>>>>>>> +       struct ocfs2_group_desc *gd;
>>>>>>>>> +       errcode_t ret = ocfs2_malloc0(sizeof(uint64_t) * max_depth,
>>>>>>>>> &list);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> I recognized all what you do here is to locate the index where loop
>>>>>>>> happens,
>>>>>>>> and break the loop by terminating the 'next' to NULL. I loved the way
>>>>>>>> your
>>>>>>>> treating the loop chain, but I don't like the method you locate a the
>>>>>>>> position where loops happening.
>>>>>>>>
>>>>>>>>
>>>>>>>> You see, you used an extra array to record 'depth' of all previous
>>>>>>>> nodes,
>>>>>>>> what's more, you also have to come back for checking each previous
>>>>>>>> nodes
>>>>>>>> each step, that means your space complexity is O(N), and time
>>>>>>>> complexity
>>>>>>>> here is O(N2), that's not so pleasant to me:)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> +       if (ret)
>>>>>>>>> +               goto out;
>>>>>>>>> +       ret =  ocfs2_malloc_block(ost->ost_fs->fs_io, &buf);
>>>>>>>>> +       if (ret)
>>>>>>>>> +               goto out;
>>>>>>>>> +       gd = (struct ocfs2_group_desc *)buf;
>>>>>>>>> +
>>>>>>>>> +       while (blkno && (depth<=max_depth)) {
>>>>>>>>> +               list[depth++] = blkno;
>>>>>>>>> +               ret = ocfs2_read_group_desc(ost->ost_fs, blkno,
>>>>>>>>> buf);
>>>>>>>>> +               if (ret)
>>>>>>>>> +                       goto out;
>>>>>>>>> +               blkno = gd->bg_next_group;
>>>>>>>>> +               for (i=0; i<depth; i++)
>>>>>>>>> +                       if (list[i]==blkno) {
>>>>>>>>> +                               gd->bg_next_group = 0;
>>>>>>>>> +                               verbosef("Breaking gd loop
>>>>>>>>> %"PRIu64"\n",
>>>>>>>>> blkno);
>>>>>>>>> +                               ret =
>>>>>>>>> ocfs2_write_group_desc(ost->ost_fs,
>>>>>>>>> +                                             blkno, buf);
>>>>>>>>> +                               goto out;
>>>>>>>>> +                       }
>>>>>>>>> +       }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> To detect and locate a loop in a list, you may have a better
>>>>>>>> solution,
>>>>>>>> use 2
>>>>>>>> pointers, one is 'slow', and the other is 'fast', 'slow' pointer
>>>>>>>> proceed
>>>>>>>> the
>>>>>>>> list by one hop, and 'fast' one will go two hops each step, so the
>>>>>>>> essence
>>>>>>>> of this alogrithm is, the fast pointer will meet slow pointer somehow
>>>>>>>> if
>>>>>>>> there is a loop exists, otherwise, fast and slow never meet, and end
>>>>>>>> up
>>>>>>>> with
>>>>>>>> NULLs respectively. the code skeleton will be as follows:
>>>>>>>>
>>>>>>>> static unit64_t get_group_desc_next(o2fsck_state *ost, uint64_t
>>>>>>>> blkno,
>>>>>>>> char
>>>>>>>> *buf)
>>>>>>>> {
>>>>>>>> uint64_t next = 0;
>>>>>>>> struct ocfs2_group_desc *gd;
>>>>>>>>
>>>>>>>> ret = ocfs2_read_group_desc(ost->ost_fs, blkno, buf);
>>>>>>>> if (ret)
>>>>>>>> return next;
>>>>>>>>
>>>>>>>> gd = (struct ocfs2_group_desc *)buf;
>>>>>>>> next = gd->gb_next_group;
>>>>>>>>
>>>>>>>> return next;
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> in function break_chain_loop():
>>>>>>>>
>>>>>>>> uint64_t loop_blkno, blkno, slow, fast;
>>>>>>>>
>>>>>>>> blkno = slow = fast = le64_to_cpu(chain->c_blkno);
>>>>>>>>
>>>>>>>> while (fast && get_group_desc_next(ost, fast, buf)) {
>>>>>>>>
>>>>>>>>     /*
>>>>>>>>      * fast ptr meet the slow means a loop happening there
>>>>>>>>      */
>>>>>>>>     if ((fast == slow) || (get_group_desc_next(ost, fast, buf) ==
>>>>>>>> slow))
>>>>>>>> {
>>>>>>>>             /*
>>>>>>>>              * Loop happened at the node before fast and slow
>>>>>>>>              */
>>>>>>>>             if (fast == slow)
>>>>>>>>                     loop_blkno = blkno;
>>>>>>>>
>>>>>>>>             /*
>>>>>>>>              * Loop happend at the node of fast
>>>>>>>>              */
>>>>>>>>             if (get_group_desc_next(ost, fast, buf) == slow)
>>>>>>>>                     loop_blkno = fast;
>>>>>>>>
>>>>>>>>             ret = ocfs2_read_group_desc(ost->ost_fs, loop_blkno,
>>>>>>>> buf);
>>>>>>>>             if (ret)
>>>>>>>>                     goto bail;
>>>>>>>>
>>>>>>>>             gd = (struct ocfs2_group_desc *)buf;
>>>>>>>>
>>>>>>>>             gd->bg_next_group = 0;
>>>>>>>>
>>>>>>>>             verbose("Breaking gd loop %"PRIu64"\n", loop_blkno);
>>>>>>>>
>>>>>>>>             ret = ocfs2_write_group_desc(ost->ost_fs, loop_blkno,
>>>>>>>> buf);
>>>>>>>>             if (ret)
>>>>>>>>                     goto bail;
>>>>>>>>
>>>>>>>>             break;
>>>>>>>>     } else {
>>>>>>>>             blkno = slow;
>>>>>>>>
>>>>>>>>             /*
>>>>>>>>              * Slow ptr goes one by one hop each time.
>>>>>>>>              */
>>>>>>>>             slow = get_group_desc_next(ost, slow, buf);
>>>>>>>>
>>>>>>>>             /*
>>>>>>>>              * Fast ptr proceeds by 2 hops each step.
>>>>>>>>              */
>>>>>>>>             fast = get_group_desc_next(ost, fast, buf);
>>>>>>>>             fast = get_group_desc_next(ost, fast, buf);
>>>>>>>>     }
>>>>>>>>
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>
>>>>>>>> According to above algorithm, we easily found that the space
>>>>>>>> complexity
>>>>>>>> is
>>>>>>>> 0, and time complexity was declined to O(N).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>> While this algorithm is efficient in diagnosing the loop, I think it
>>>>>>> fails in  detecting the optimum place to break the loop.
>>>>>>>
>>>>>>> Could you run this algorithm with the following scenario:
>>>>>>>
>>>>>>>
>>>>>>> cr->c_blkno = 1
>>>>>>>
>>>>>>> gd(1)->next=2
>>>>>>> gd(2)->next=3
>>>>>>> gd(3)->next=4
>>>>>>> gd(4)->next=2
>>>>>>>
>>>>>>> Where would the breakage be in this case?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> Oh, you're right, the current algorithm is not capable of detecting the
>>>>>> first position where a pointing-back happens, it needs to be extented:
>>>>>>
>>>>>>
>>>>>> int looped = 0;
>>>>>> uint64_t blkno, slow, fast;
>>>>>>
>>>>>> slow = fast = chain->c_blkno;
>>>>>>
>>>>>> while (fast && get_group_desc_next(ost, fast, buf)) {
>>>>>>      /*
>>>>>>       * fast ptr meet the slow means a loop happening there
>>>>>>       */
>>>>>>      if (fast == slow) {
>>>>>>              looped = 1;
>>>>>>              break;
>>>>>>      } else {
>>>>>>
>>>>>>              /*
>>>>>>               * Slow ptr goes one by one hop each time.
>>>>>>               */
>>>>>>              slow = get_group_desc_next(ost, slow, buf);
>>>>>>
>>>>>>              /*
>>>>>>               * Fast ptr proceeds by 2 hops each step.
>>>>>>               */
>>>>>>              fast = get_group_desc_next(ost, fast, buf);
>>>>>>              fast = get_group_desc_next(ost, fast, buf);
>>>>>>      }
>>>>>> }
>>>>>>
>>>>>> if (looped) {
>>>>>>  blkno = chain->c_blkno;
>>>>>>  /*
>>>>>>  * Find out the optimum to break the loop
>>>>>>  */
>>>>>>  while (blkno != slow) {
>>>>>>    blkno = get_group_desc_next(ost, blkno, buf);
>>>>>>    slow = get_group_desc_next(ost, blkno, buf);
>>>>>>  }
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>> You are creating one infinite loop to break another ;)
>>>>> If blkno!=slow before the loop, this loop will never finish, except
>>>>> the case of a 1-node loop.
>>>>> Did you run it through the case I presented yesterday?
>>>>>
>>>>>
>>>>>           
>>>> Oh, I'm afraid it would work, that's a well-known(for its space and time
>>>> efficiency) algorithm to solve the loop in a singly linked list, it does
>>>> terminate the 'while' when 'blkno == slow', and that's also the optimal
>>>> node
>>>> where the breakage was expected. for its mathematics principle, you can
>>>> Google for help;)
>>>>
>>>>
>>>>         
>>> I am not doubting the space or time efficiency, but the correctness of
>>> the algorithm.
>>> The slow pointer will always walk *exactly one step* ahead of blkno,
>>> and hence in a loop > 2 nodes, it will never be equal to blkno, and
>>> hence will never end.
>>> This is because you are calculating slow with respect to blkno.
>>>
>>> Let me know if you want me to run you through an example.
>>>
>>>       
>> Hi Goldwyn,
>>
>> I've run your case mentioned yesterday, even with more corner cases, It
>> works well by now:)
>>
>> I assumed that you have no doubt about the method to detect if a loop exists
>> in a list by using fast and slow iterators(they will eventually meet
>> sometime if loop exists). as for the algorithm to find out a optimal node
>> for breakage, we may need some mathematical proof, take the following case
>> for example:
>>
>> 0->1->2->3->4->5->6->7->8->(3), now we want to find out the first entry
>> point of the loop, it's 3 in that case! here we call the desired position as
>> A, assume the distance between start to A is x,
>>
>> Assume that the length of loop is y.
>>
>> According to fast and slow solution, we're sure that 2 pointers will finally
>> meet in the loop somewhere, at that right moment, we call the meeting
>> position as B, assume the distance between A(first entry of loop) to
>> B(meeting place) is z.
>>
>> Obviously, slow pointer has walked (x+z) steps as they firstly meet, while
>> fast pointer has walked 2*(x+z),  and it's also easy to get 2*(x+z) - (x+z)
>> = K * y, here K is an integer, means fast pointer walked k*loop_length more
>> steps than slow pointer.
>>
>> Given the fact that fast and slow have already meet there, Now we continue
>> to let slow pointer proceed by one step each time, hopefully, it will get A
>> after continuously walking x steps, that's really tricky:), since (x+z) is
>> multiple of y. In the meantime, if we reset the fast pointer to the
>> beginning of list, and let it walk also one step each time, it will also get
>> A after walking x steps.
>>
>> As a result, above mathematical analysis indicated that, the fast and slow
>> will meet again if we reset the fast pointer to the start of list and tune
>> its pace as one step as slow pointer, and fortunately, the second meeting
>> point is the right node we're looking for as a optimal node to break.
>>
>> It will greatly help if you draw all these in the paper for better
>> understanding:)
>>     
>
>
> Now you are trying to cover up your mistake ;)
>   

Yes, the previous codes has not been tested well, it's nothing realistic 
but a  concept I guess. while the latest one I pasted last time is 
workable under testing, is there still any mistakes?

> I fully understand the algorithm and the algorithm posted on the
> various internet sites are correct. For that matter, if you know the
> loop is large enough, using relatively prime number of hops will help
> you achieve the "meeting point" faster.
>
> It is *your implementation* of the algorithm I am not willing to agree with.
>   

Hi Goldwyn,

Thanks for your patient follow-up on this issue:)

I admit that previous code skeleton pasted by has mistakes, it's just a 
rough solution to illustrate my general idea against the algorithm. and 
I've not tested it in a real env. However, the last code I pasted was 
feasible under testing by your cases. Do you think it's still a 
unrealistic one?

Let me know if you have some other suggestions on it.
>
>   
>> The code skeleton works for me now as follows:
>>
>> int detect_loop(struct link_node *head, int breakage)
>> {
>>       struct link_node *fast, *slow, *prev;
>>       int looped = 0;
>>
>>       prev = fast = slow = head;
>>
>>       while (1) {
>>               if (fast && fast->next)
>>                       fast = fast->next->next;
>>               else
>>                       return looped;
>>
>>               prev = slow;
>>               slow = slow->next;
>>
>>               if (fast == slow) {
>>                       looped = 1;
>>                       break;
>>               }
>>       }
>>
>>       if (looped && breakage) {
>>
>>               fast = head;
>>               while (fast != slow) {
>>                       prev = slow;
>>                       slow = slow->next;
>>                       fast = fast->next;
>>               }
>>     
>
> You have changed (corrected) this! In analogy  to your previous
> posting, fast=slow->next, which is causing the infinite loop.
>
> For your convenince, I will paste it again:
>
>   
>>>>>>  while (blkno != slow) {
>>>>>>    blkno = get_group_desc_next(ost, blkno, buf);
>>>>>>    slow = get_group_desc_next(ost, blkno, buf);
>>>>>>  }
>>>>>>             
>
>
> I would appreciate if you would post the correct implementation for
> the good audit.
> I understand what you are trying to do and I agree the algorithm is
> much better with respect to time and space efficiency. However, if it
> does not run correctly, it is not acceptable.
>
> Please post the final patch with respect to ocfs2-tools.
>   

I can post the final patch regarding ocfs2-tools if you have no 
objection against my above algorithm which is feasible in a experimental 
env.

>
>