[Ocfs2-tools-devel] [PATCH 2/2] looped chain - Break loop in fsck
Goldwyn Rodrigues
rgoldwyn at gmail.com
Thu Mar 4 06:25:55 PST 2010
Hi Tristan,
On Wed, Mar 3, 2010 at 8:45 PM, tristan <tristan.ye at oracle.com> wrote:
> Goldwyn Rodrigues wrote:
>>
>> Hi Tristan,
>>
>>
>> On Tue, Mar 2, 2010 at 9:13 PM, tristan <tristan.ye at oracle.com> wrote:
>>
>>>
>>> Hi Goldwyn,
>>>
>>> Comments are inlined:
>>>
>>> Firstly, please check the patch by /kernel-src/script/checkpatch.pl
>>> your-patch to detect the coding issues at the very beginning, since I
>>> found
>>> some coding style problems when apply your patch:
>>>
>>> Applying looped chain - Break loop in fsck
>>> .dotest/patch:57: trailing whitespace.
>>> ret = ocfs2_write_group_desc(ost->ost_fs,
>>> .dotest/patch:90: trailing whitespace.
>>> ". Break the loop?",cs->cs_chain_no,
>>> .dotest/patch:94: trailing whitespace.
>>> }
>>> .dotest/patch:115: trailing whitespace.
>>> verbosef("cl cpg %u bpc %u count %u next %u gds %u max_ch_len %u\n",
>>> warning: 4 lines add whitespace errors.
>>>
>>>
>>> As for the coding style of ocfs2 src, please refer to:
>>>
>>> http://www.kernel.org/doc/Documentation/CodingStyle
>>>
>>
>>
>> I understand. will take care in the future.
>>
>>
>>>
>>> Goldwyn Rodrigues wrote:
>>>
>>>>
>>>> Detects a loop by checking hops against the theoretical limit of
>>>> number of chains in a chain_rec. If a loop is found, it breaks it by
>>>> storing the block numbers, and comparing with existing block numbers.
>>>>
>>>> Signed-off-by: Goldwyn Rodrigues <rgoldwyn at suse.de>
>>>> Acked-by: Mark Fasheh <mfasheh at suse.com>
>>>>
>>>> --- diff --git a/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>>> b/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>>> index 05561ae..798a1e4 100644
>>>> --- a/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>>> +++ b/fsck.ocfs2/fsck.ocfs2.checks.8.in
>>>> @@ -202,6 +202,15 @@ valid in its bitmap.
>>>> Answering yes decreases the number of recorded free bits so that it
>>>> equals
>>>> the total number of bits in the group descriptor's bitmap.
>>>> +.SS "CHAIN_LOOP"
>>>> +A chain may loop if the next field of the group descriptor points to
>>>> one
>>>> of
>>>> +the previous group descriptors in the chain. This causes the ocfs2
>>>> code,
>>>> both
>>>> +user space and kernel module to loop forever.
>>>> +
>>>> +Answering yes breaks the loop at an optimum location so that all the
>>>> existing
>>>> +group descriptors are in the chain. However, it cannot re-connect stray
>>>> group
>>>> +descriptors and must rely on the rest of the fsck code to fix it.
>>>> +
>>>> .SS "CHAIN_COUNT"
>>>> The chain list embedded in an inode is limited by the block size and
>>>> the
>>>> number of bytes consumed by the rest of the inode. A chain list header
>>>> was
>>>> diff --git a/fsck.ocfs2/pass0.c b/fsck.ocfs2/pass0.c
>>>> index a32bd18..1b59cc1 100644
>>>> --- a/fsck.ocfs2/pass0.c
>>>> +++ b/fsck.ocfs2/pass0.c
>>>> @@ -455,6 +455,46 @@ out:
>>>> return ret;
>>>> }
>>>> +static errcode_t break_loop(o2fsck_state *ost, struct ocfs2_chain_rec
>>>> *chain,
>>>> + unsigned int max_depth)
>>>>
>>>>
>>>
>>> I guess it will make more sense if your rename the func name as
>>> break_chain_loop() :)
>>>
>>>>
>>>> +{
>>>> + uint64_t *list;
>>>> + int i;
>>>> + unsigned int depth = 0;
>>>> + uint64_t blkno = chain->c_blkno;
>>>>
>>>>
>>>
>>> blkno = le64_to_cpu(chain->c_blkno);
>>>
>>>
>>>>
>>>> + char *buf;
>>>> + struct ocfs2_group_desc *gd;
>>>> + errcode_t ret = ocfs2_malloc0(sizeof(uint64_t) * max_depth,
>>>> &list);
>>>>
>>>>
>>>
>>> I recognized all what you do here is to locate the index where loop
>>> happens,
>>> and break the loop by terminating the 'next' to NULL. I loved the way
>>> your
>>> treating the loop chain, but I don't like the method you locate a the
>>> position where loops happening.
>>>
>>>
>>> You see, you used an extra array to record 'depth' of all previous nodes,
>>> what's more, you also have to come back for checking each previous nodes
>>> each step, that means your space complexity is O(N), and time complexity
>>> here is O(N2), that's not so pleasant to me:)
>>>
>>>
>>>
>>>>
>>>> + if (ret)
>>>> + goto out;
>>>> + ret = ocfs2_malloc_block(ost->ost_fs->fs_io, &buf);
>>>> + if (ret)
>>>> + goto out;
>>>> + gd = (struct ocfs2_group_desc *)buf;
>>>> +
>>>> + while (blkno && (depth<=max_depth)) {
>>>> + list[depth++] = blkno;
>>>> + ret = ocfs2_read_group_desc(ost->ost_fs, blkno, buf);
>>>> + if (ret)
>>>> + goto out;
>>>> + blkno = gd->bg_next_group;
>>>> + for (i=0; i<depth; i++)
>>>> + if (list[i]==blkno) {
>>>> + gd->bg_next_group = 0;
>>>> + verbosef("Breaking gd loop %"PRIu64"\n",
>>>> blkno);
>>>> + ret =
>>>> ocfs2_write_group_desc(ost->ost_fs,
>>>> + blkno, buf);
>>>> + goto out;
>>>> + }
>>>> + }
>>>>
>>>>
>>>
>>> To detect and locate a loop in a list, you may have a better solution,
>>> use 2
>>> pointers, one is 'slow', and the other is 'fast', 'slow' pointer proceed
>>> the
>>> list by one hop, and 'fast' one will go two hops each step, so the
>>> essence
>>> of this alogrithm is, the fast pointer will meet slow pointer somehow if
>>> there is a loop exists, otherwise, fast and slow never meet, and end up
>>> with
>>> NULLs respectively. the code skeleton will be as follows:
>>>
>>> static unit64_t get_group_desc_next(o2fsck_state *ost, uint64_t blkno,
>>> char
>>> *buf)
>>> {
>>> uint64_t next = 0;
>>> struct ocfs2_group_desc *gd;
>>>
>>> ret = ocfs2_read_group_desc(ost->ost_fs, blkno, buf);
>>> if (ret)
>>> return next;
>>>
>>> gd = (struct ocfs2_group_desc *)buf;
>>> next = gd->gb_next_group;
>>>
>>> return next;
>>> }
>>>
>>>
>>> in function break_chain_loop():
>>>
>>> uint64_t loop_blkno, blkno, slow, fast;
>>>
>>> blkno = slow = fast = le64_to_cpu(chain->c_blkno);
>>>
>>> while (fast && get_group_desc_next(ost, fast, buf)) {
>>>
>>> /*
>>> * fast ptr meet the slow means a loop happening there
>>> */
>>> if ((fast == slow) || (get_group_desc_next(ost, fast, buf) ==
>>> slow))
>>> {
>>> /*
>>> * Loop happened at the node before fast and slow
>>> */
>>> if (fast == slow)
>>> loop_blkno = blkno;
>>>
>>> /*
>>> * Loop happend at the node of fast
>>> */
>>> if (get_group_desc_next(ost, fast, buf) == slow)
>>> loop_blkno = fast;
>>>
>>> ret = ocfs2_read_group_desc(ost->ost_fs, loop_blkno, buf);
>>> if (ret)
>>> goto bail;
>>>
>>> gd = (struct ocfs2_group_desc *)buf;
>>>
>>> gd->bg_next_group = 0;
>>>
>>> verbose("Breaking gd loop %"PRIu64"\n", loop_blkno);
>>>
>>> ret = ocfs2_write_group_desc(ost->ost_fs, loop_blkno, buf);
>>> if (ret)
>>> goto bail;
>>>
>>> break;
>>> } else {
>>> blkno = slow;
>>>
>>> /*
>>> * Slow ptr goes one by one hop each time.
>>> */
>>> slow = get_group_desc_next(ost, slow, buf);
>>>
>>> /*
>>> * Fast ptr proceeds by 2 hops each step.
>>> */
>>> fast = get_group_desc_next(ost, fast, buf);
>>> fast = get_group_desc_next(ost, fast, buf);
>>> }
>>>
>>>
>>> }
>>>
>>> ...
>>>
>>>
>>> According to above algorithm, we easily found that the space complexity
>>> is
>>> 0, and time complexity was declined to O(N).
>>>
>>>
>>>
>>>
>>
>>
>> While this algorithm is efficient in diagnosing the loop, I think it
>> fails in detecting the optimum place to break the loop.
>>
>> Could you run this algorithm with the following scenario:
>>
>>
>> cr->c_blkno = 1
>>
>> gd(1)->next=2
>> gd(2)->next=3
>> gd(3)->next=4
>> gd(4)->next=2
>>
>> Where would the breakage be in this case?
>>
>
> Oh, you're right, the current algorithm is not capable of detecting the
> first position where a pointing-back happens, it needs to be extented:
>
>
> int looped = 0;
> uint64_t blkno, slow, fast;
>
> slow = fast = chain->c_blkno;
>
> while (fast && get_group_desc_next(ost, fast, buf)) {
> /*
> * fast ptr meet the slow means a loop happening there
> */
> if (fast == slow) {
> looped = 1;
> break;
> } else {
>
> /*
> * Slow ptr goes one by one hop each time.
> */
> slow = get_group_desc_next(ost, slow, buf);
>
> /*
> * Fast ptr proceeds by 2 hops each step.
> */
> fast = get_group_desc_next(ost, fast, buf);
> fast = get_group_desc_next(ost, fast, buf);
> }
> }
>
> if (looped) {
> blkno = chain->c_blkno;
> /*
> * Find out the optimum to break the loop
> */
> while (blkno != slow) {
> blkno = get_group_desc_next(ost, blkno, buf);
> slow = get_group_desc_next(ost, blkno, buf);
> }
>
You are creating one infinite loop to break another ;)
If blkno!=slow before the loop, this loop will never finish, except
the case of a 1-node loop.
Did you run it through the case I presented yesterday?
> ret = ocfs2_read_group_desc(ost->ost_fs, blkno, buf);
> if (ret)
> goto bail;
>
> gd = (struct ocfs2_group_desc *)buf;
> gd->bg_next_group = 0;
> verbose("Breaking gd loop %"PRIu64"\n", blkno);
> ret = ocfs2_write_group_desc(ost->ost_fs, blkno, buf);
> if (ret)
> goto bail;
> }
>
>
> Though the above algorithm looks a little bit complicated and obscure, I
> still love it, especially if you're operating with a HUGE volume(16TB+), and
> even worse, the loop happens at the end of group chain, think about this
> case. you do benefit from the space and time saved from above method.
>
Regards,
--
Goldwyn
More information about the Ocfs2-tools-devel
mailing list