[Ocfs2-users] fsck.ocfs2 loops + hangs but does not check

Michael Ulbrich mul at rentapacs.de
Thu Mar 24 03:38:40 PDT 2016


Hi Joseph,

ok, got it! Here's the loop in chain 73:

Group Chain: 73   Parent Inode: 13  Generation: 1172963971
CRC32: 00000000   ECC: 0000
##   Block#            Total    Used     Free     Contig   Size
0    4280773632        15872    11487    4385     1774     1984
1    2583263232        15872    5341     10531    5153     1984
2    4543613952        15872    5329     10543    5119     1984
3    4532662272        15872    10753    5119     5119     1984
4    4539963392        15872    3223     12649    7530     1984
5    4536312832        15872    5219     10653    5534     1984
6    4529011712        15872    6047     9825     3359     1984
7    4525361152        15872    4475     11397    5809     1984
8    4521710592        15872    3182     12690    5844     1984
9    4518060032        15872    5881     9991     5131     1984
10   4236966912        15872    10753    5119     5119     1984
11   4098245632        15872    10756    5116     3388     1984
12   4514409472        15872    8826     7046     5119     1984
13   3441144832        15872    15       15857    9680     1984
14   4404892672        15872    7563     8309     5119     1984
15   4233316352        15872    9398     6474     5114     1984
16   4488855552        15872    6358     9514     5119     1984
17   3901115392        15872    9932     5940     3757     1984
18   4507108352        15872    6557     9315     6166     1984
19   4083643392        15872    571      15301    4914     1984 <--
20   4510758912        15872    4834     11038    6601     1984
21   4492506112        15872    6532     9340     5119     1984
22   4496156672        15872    10753    5119     5119     1984
23   4503457792        15872    10718    5154     5119     1984
...
154   4083643392        15872    571      15301    4914     1984 <--
155   4510758912        15872    4834     11038    6601     1984
156   4492506112        15872    6532     9340     5119     1984
157   4496156672        15872    10753    5119     5119     1984
158   4503457792        15872    10718    5154     5119     1984
...
289   4083643392        15872    571      15301    4914     1984 <--
290   4510758912        15872    4834     11038    6601     1984
291   4492506112        15872    6532     9340     5119     1984
292   4496156672        15872    10753    5119     5119     1984
293   4503457792        15872    10718    5154     5119     1984

etc.

So the first repeat shows up at record #154 (the same group block as
record #19, 4083643392), and the loop has a period of 135 records, right?
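(Side note: the repeat can also be confirmed mechanically rather than by eyeballing the listing. Below is a minimal Python sketch of walking one group chain and reporting the first revisited descriptor block; find_chain_cycle and next_block are hypothetical names, with next_block standing in for reading the bg_next_group field of each group descriptor, not a real ocfs2-tools API.)

```python
def find_chain_cycle(start_block, next_block):
    """Walk a group chain and detect a loop.

    start_block: block number of the first group descriptor in the chain.
    next_block:  callable mapping a descriptor block number to the next
                 one in the chain (0 meaning end of chain).

    Returns (record_index_of_first_repeat, cycle_length), or None if
    the chain terminates normally.
    """
    seen = {}  # descriptor block number -> record index where first seen
    blk, idx = start_block, 0
    while blk != 0:
        if blk in seen:
            return idx, idx - seen[blk]
        seen[blk] = idx
        blk = next_block(blk)
        idx += 1
    return None
```

On a chain shaped like the one above (records 0-153 unique, record #154 pointing back at record #19's group) this reports a first repeat at index 154 with a cycle length of 135.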

Will back up the fs metadata with o2image as soon as I have some external
storage at hand.
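(A quick sanity check with my own arithmetic, using the global_bitmap figures from the debugfs output quoted below: the cluster count, clusters per group, and chain count bound how many records a healthy chain can hold, so a chain printing millions of records must be looping.)

```python
total_clusters = 2778641591    # "Clusters:" from the global_bitmap inode
clusters_per_group = 15872     # "Clusters per Group:"
chains = 115                   # "Count:" -- number of chain records

# Ceiling division: total number of cluster groups on the filesystem.
groups = -(-total_clusters // clusters_per_group)

# Groups are distributed round-robin over the chains.
base, extra = divmod(groups, chains)
print(groups, base, extra)
```

This gives 175,066 groups in total, i.e. 1,522 or 1,523 records per chain, which matches the chain 0 listing running from record 0 to 1522. The 2,000,000+ records debugfs printed for chain 73 therefore cannot be real.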

Thanks a lot so far ... Michael

On 03/24/2016 10:41 AM, Joseph Qi wrote:
> Hi Michael,
> It seems that a dead loop happens in chain 73. You have formatted using a
> 2K block and 4K cluster, so each chain should have 1522 or 1521 records.
> But at first glance, I cannot figure out which block goes wrong, because
> the output you pasted indicates that all blocks are different. So I suggest
> you investigate all the blocks which belong to chain 73 and try to find
> out if there is a loop there.
> BTW, have you backed up the metadata using o2image?
> 
> Thanks,
> Joseph
> 
> On 2016/3/24 16:40, Michael Ulbrich wrote:
>> Hi Joseph,
>>
>> thanks a lot for your help. It is very much appreciated!
>>
>> I ran debugfs.ocfs2 from ocfs2-tools 1.6.4 on the mounted file system:
>>
>> root at s1a:~# debugfs.ocfs2 -R 'stat //global_bitmap' /dev/drbd1 >
>> debugfs_drbd1.log 2>&1
>>
>> Inode: 13   Mode: 0644   Generation: 1172963971 (0x45ea0283)
>> FS Generation: 1172963971 (0x45ea0283)
>> CRC32: 00000000   ECC: 0000
>> Type: Regular   Attr: 0x0   Flags: Valid System Allocbitmap Chain
>> Dynamic Features: (0x0)
>> User: 0 (root)   Group: 0 (root)   Size: 11381315956736
>> Links: 1   Clusters: 2778641591
>> ctime: 0x54010183 -- Sat Aug 30 00:41:07 2014
>> atime: 0x54010183 -- Sat Aug 30 00:41:07 2014
>> mtime: 0x54010183 -- Sat Aug 30 00:41:07 2014
>> dtime: 0x0 -- Thu Jan  1 01:00:00 1970
>> ctime_nsec: 0x00000000 -- 0
>> atime_nsec: 0x00000000 -- 0
>> mtime_nsec: 0x00000000 -- 0
>> Refcount Block: 0
>> Last Extblk: 0   Orphan Slot: 0
>> Sub Alloc Slot: Global   Sub Alloc Bit: 7
>> Bitmap Total: 2778641591   Used: 1083108631   Free: 1695532960
>> Clusters per Group: 15872   Bits per Cluster: 1
>> Count: 115   Next Free Rec: 115
>> ##   Total        Used         Free         Block#
>> 0    24173056     9429318      14743738     4533995520
>> 1    24173056     9421663      14751393     4548629504
>> 2    24173056     9432421      14740635     4588817408
>> 3    24173056     9427533      14745523     4548692992
>> 4    24173056     9433978      14739078     4508568576
>> 5    24173056     9436974      14736082     4636369920
>> 6    24173056     9428411      14744645     4563390464
>> 7    24173056     9426950      14746106     4479459328
>> 8    24173056     9428099      14744957     4548851712
>> 9    24173056     9431794      14741262     4585389056
>> ...
>> 105   24157184     9414241      14742943     4690652160
>> 106   24157184     9419715      14737469     4467999744
>> 107   24157184     9411479      14745705     4431525888
>> 108   24157184     9413235      14743949     4559327232
>> 109   24157184     9417948      14739236     4500950016
>> 110   24157184     9411013      14746171     4566691840
>> 111   24157184     9421252      14735932     4522916864
>> 112   24157184     9416726      14740458     4537550848
>> 113   24157184     9415358      14741826     4676303872
>> 114   24157184     9420448      14736736     4526662656
>>
>> Group Chain: 0   Parent Inode: 13  Generation: 1172963971
>> CRC32: 00000000   ECC: 0000
>> ##   Block#            Total    Used     Free     Contig   Size
>> 0    4533995520        15872    6339     9533     3987     1984
>> 1    4530344960        15872    10755    5117     5117     1984
>> 2    2997109760        15872    10753    5119     5119     1984
>> 3    4526694400        15872    10753    5119     5119     1984
>> 4    3022663680        15872    10753    5119     5119     1984
>> 5    4512092160        15872    9043     6829     2742     1984
>> 6    4523043840        15872    4948     10924    9612     1984
>> 7    4519393280        15872    6150     9722     5595     1984
>> 8    4515742720        15872    4323     11549    6603     1984
>> 9    3771028480        15872    10753    5119     5119     1984
>> ...
>> 1513   5523297280        15872    1        15871    15871    1984
>> 1514   5526947840        15872    1        15871    15871    1984
>> 1515   5530598400        15872    1        15871    15871    1984
>> 1516   5534248960        15872    1        15871    15871    1984
>> 1517   5537899520        15872    1        15871    15871    1984
>> 1518   5541550080        15872    1        15871    15871    1984
>> 1519   5545200640        15872    1        15871    15871    1984
>> 1520   5548851200        15872    1        15871    15871    1984
>> 1521   5552501760        15872    1        15871    15871    1984
>> 1522   5556152320        15872    1        15871    15871    1984
>>
>> Group Chain: 1   Parent Inode: 13  Generation: 1172963971
>> CRC32: 00000000   ECC: 0000
>> ##   Block#            Total    Used     Free     Contig   Size
>> 0    4548629504        15872    10755    5117     2496     1984
>> 1    2993490944        15872    59       15813    14451    1984
>> 2    2489713664        15872    10758    5114     3726     1984
>> 3    3117609984        15872    3958     11914    6165     1984
>> 4    2544472064        15872    10753    5119     5119     1984
>> 5    3040948224        15872    10753    5119     5119     1984
>> 6    2971587584        15872    10753    5119     5119     1984
>> 7    4493871104        15872    8664     7208     3705     1984
>> 8    4544978944        15872    8711     7161     2919     1984
>> 9    4417209344        15872    3253     12619    6447     1984
>> ...
>> 1513   5523329024        15872    1        15871    15871    1984
>> 1514   5526979584        15872    1        15871    15871    1984
>> 1515   5530630144        15872    1        15871    15871    1984
>> 1516   5534280704        15872    1        15871    15871    1984
>> 1517   5537931264        15872    1        15871    15871    1984
>> 1518   5541581824        15872    1        15871    15871    1984
>> 1519   5545232384        15872    1        15871    15871    1984
>> 1520   5548882944        15872    1        15871    15871    1984
>> 1521   5552533504        15872    1        15871    15871    1984
>> 1522   5556184064        15872    1        15871    15871    1984
>>
>> ... all the following group chains are similarly structured up to #73,
>> which looks as follows:
>>
>> Group Chain: 73   Parent Inode: 13  Generation: 1172963971
>> CRC32: 00000000   ECC: 0000
>> ##   Block#            Total    Used     Free     Contig   Size
>> 0    2583263232        15872    5341     10531    5153     1984
>> 1    4543613952        15872    5329     10543    5119     1984
>> 2    4532662272        15872    10753    5119     5119     1984
>> 3    4539963392        15872    3223     12649    7530     1984
>> 4    4536312832        15872    5219     10653    5534     1984
>> 5    4529011712        15872    6047     9825     3359     1984
>> 6    4525361152        15872    4475     11397    5809     1984
>> 7    4521710592        15872    3182     12690    5844     1984
>> 8    4518060032        15872    5881     9991     5131     1984
>> 9    4236966912        15872    10753    5119     5119     1984
>> ...
>> 2059651   4299026432        15872    4334     11538    4816     1984
>> 2059652   4087293952        15872    7003     8869     2166     1984
>> 2059653   4295375872        15872    6626     9246     5119     1984
>> 2059654   4288074752        15872    509      15363    9662     1984
>> 2059655   4291725312        15872    6151     9721     5119     1984
>> 2059656   4284424192        15872    10052    5820     5119     1984
>> 2059657   4277123072        15872    7383     8489     5120     1984
>> 2059658   4273472512        15872    14       15858    5655     1984
>> 2059659   4269821952        15872    2637     13235    7060     1984
>> 2059660   4266171392        15872    10758    5114     3674     1984
>> ...
>>
>> Assuming this would go on forever, I stopped debugfs.ocfs2.
>>
>> With debugfs.ocfs2 from ocfs2-tools 1.8.4 I get an identical result.
>>
>> Please let me know if I can provide any further information and help to
>> fix this issue.
>>
>> Thanks again + Best regards ... Michael
>>
>> On 03/24/2016 01:30 AM, Joseph Qi wrote:
>>> Hi Michael,
>>> Could you please use debugfs to check the output?
>>> # debugfs.ocfs2 -R 'stat //global_bitmap' <device>
>>>
>>> Thanks,
>>> Joseph
>>>
>>> On 2016/3/24 6:38, Michael Ulbrich wrote:
>>>> Hi ocfs2-users,
>>>>
>>>> my first post to this list from yesterday probably didn't get through.
>>>>
>>>> Anyway, I've made some progress in the meantime and may now ask more
>>>> specific questions ...
>>>>
>>>> I'm having issues with an 11 TB ocfs2 shared filesystem on Debian Wheezy:
>>>>
>>>> Linux s1a 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux
>>>>
>>>> the kernel modules are:
>>>>
>>>> modinfo ocfs2 -> version: 1.5.0
>>>>
>>>> using the stock ocfs2-tools 1.6.4-1+deb7u1 from the distribution.
>>>>
>>>> As an alternative, I cloned and built the latest ocfs2-tools from
>>>> markfasheh's repository on GitHub, which should be version 1.8.4.
>>>>
>>>> The filesystem runs on top of drbd, is roughly 40 % full, and has
>>>> suffered from read-only remounts and hanging clients since the last
>>>> reboot. These may be DLM problems, but I suspect they stem from some
>>>> corrupt on-disk structures. Before that it all ran stable for months.
>>>>
>>>> This situation made me want to run fsck.ocfs2 and now I wonder how to do
>>>> that. The filesystem is not mounted.
>>>>
>>>> With the stock ocfs2-tools 1.6.4:
>>>>
>>>> root at s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1
>>>> fsck.ocfs2 1.6.4
>>>> Checking OCFS2 filesystem in /dev/drbd1:
>>>>   Label:              ocfs2_ASSET
>>>>   UUID:               6A1A0189A3F94E32B6B9A526DF9060F3
>>>>   Number of blocks:   5557283182
>>>>   Block size:         2048
>>>>   Number of clusters: 2778641591
>>>>   Cluster size:       4096
>>>>   Number of slots:    16
>>>>
>>>> I'm checking fsck_drbd1.log and find that it is making progress in
>>>>
>>>> Pass 0a: Checking cluster allocation chains
>>>>
>>>> until it reaches "chain 73" and goes into an infinite loop filling the
>>>> logfile with breathtaking speed.
>>>>
>>>> With the newly built ocfs2-tools 1.8.4 I get:
>>>>
>>>> root at s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1
>>>> fsck.ocfs2 1.8.4
>>>> Checking OCFS2 filesystem in /dev/drbd1:
>>>>   Label:              ocfs2_ASSET
>>>>   UUID:               6A1A0189A3F94E32B6B9A526DF9060F3
>>>>   Number of blocks:   5557283182
>>>>   Block size:         2048
>>>>   Number of clusters: 2778641591
>>>>   Cluster size:       4096
>>>>   Number of slots:    16
>>>>
>>>> Again watching the verbose output in fsck_drbd1.log I find that this
>>>> time it proceeds up to
>>>>
>>>> Pass 0a: Checking cluster allocation chains
>>>> o2fsck_pass0:1360 | found inode alloc 13 at block 13
>>>>
>>>> and stays there without any further progress. I've terminated this
>>>> process after waiting for more than an hour.
>>>>
>>>> Now I'm lost somehow ... and would very much appreciate it if anybody
>>>> on this list could share their knowledge and give me a hint on what to
>>>> do next.
>>>>
>>>> What could be done to get this file system checked and repaired? Am I
>>>> missing something important or do I just have to wait a little bit
>>>> longer? Is there a version of ocfs2-tools / fsck.ocfs2 which will
>>>> perform as expected?
>>>>
>>>> I'm prepared to upgrade the kernel to 3.16.0-0.bpo.4-amd64 but shy away
>>>> from taking that risk without any clue of whether that might solve my
>>>> problem ...
>>>>
>>>> Thanks in advance ... Michael Ulbrich
>>>>
>>>> _______________________________________________
>>>> Ocfs2-users mailing list
>>>> Ocfs2-users at oss.oracle.com
>>>> https://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>
>>>>
>>>
>>>
>>>
>>
>>
> 
> 


