[Ocfs2-users] fsck hangs in Pass 0a

Josep Guerrero guerrero at ice.cat
Wed Aug 10 01:07:29 PDT 2011


Hello Matthias,

> I have a ~10TB ocfs2 filesystem in a 8-node cluster. This sits on a
> logical volume (I know lv is not cluster aware, but I make sure no one
> touches the lv, while the cluster is running). The LV consists of 5x2TB
> multipath devices.

> So I ran fsck.ocfs2 -f. But it hangs forever (>12h) with this output:
> 
> fsck.ocfs2 1.4.4
> Checking OCFS2 filesystem in /dev/mapper/lv0:
>   Label:              <NONE>
>   UUID:               F27D7B8F7127436981A2B5D1C93FB204
>   Number of blocks:   2684349440
>   Block size:         4096
>   Number of clusters: 2684349440
>   Cluster size:       4096
>   Number of slots:    16
> 
> /dev/mapper/lv0 was run with -f, check forced.
> Pass 0a: Checking cluster allocation chains

I wrote to the list in April about what was probably the same problem. You can 
read the thread here:

http://oss.oracle.com/pipermail/ocfs2-users/2011-April/005093.html

Sunil replied a few days later, explaining that a bug in fsck caused it to 
enter an infinite loop when the filesystem was larger than a certain size, and 
that it had been fixed in version 1.6.4. This is an excerpt from his message:

> Fixed in ocfs2-tools 1.6.4. The src tarball is on oss.oracle.com.
> 
> ==================================================================
> 
> $ git name-rev --tags 2d741da9367b33f559802dfabe62d96f6adc7777
> 2d741da9367b33f559802dfabe62d96f6adc7777 tags/ocfs2-tools-1.6.3~33
> 
> ==================================================================
> commit 2d741da9367b33f559802dfabe62d96f6adc7777
> Author: Goldwyn Rodrigues <rgoldwyn at gmail.com>
> Date:   Mon Jul 26 15:19:25 2010 -0500
> 
>     fsck.ocfs2: Change local variable datatype to avoid infinite loop
>     
>     fsck on large filesystems goes in an infinite loop.
>     The problem is in verify_bitmap_descs(). i, a local variable is
>     declared as uint16_t and is compared with
>     ocfs2_cluster_group_sizes.cgs_cluster_groups which is uint32_t.
>     When cgs_cluster_groups is greater than 65535, i overflows and wraps
>     creating an infinite loop of the following:
>             for (i = 0, blkno = ost->ost_fs->fs_first_cg_blkno;
>                  i < cgs.cgs_cluster_groups;
>                  i++, blkno = i * ocfs2_clusters_to_blocks(ost->ost_fs,
>                                                            cgs.cgs_cpg)) {
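
To make the wrap-around concrete, here is a minimal sketch of the same pattern 
outside of fsck. It is not the ocfs2-tools code: the function names and the 
max_steps safety cap are made up for illustration, so that the broken variant 
stops instead of spinning forever.

```c
#include <stdint.h>

/* Hypothetical stand-in for the loop in verify_bitmap_descs().
 * The counter is 16 bits wide, as in the buggy version. max_steps is an
 * illustrative safety cap that fsck does not have; the function returns
 * the number of iterations actually executed. */
static uint64_t walk_groups_u16(uint32_t groups, uint64_t max_steps)
{
    uint64_t steps = 0;
    uint16_t i;

    /* When groups > 65535, i wraps from 65535 back to 0 and the
     * condition i < groups can never become false. */
    for (i = 0; i < groups && steps < max_steps; i++)
        steps++;

    return steps;
}

/* Same loop with a 32-bit counter, matching the type of the value it is
 * compared against. It ends after exactly "groups" iterations. */
static uint64_t walk_groups_u32(uint32_t groups, uint64_t max_steps)
{
    uint64_t steps = 0;
    uint32_t i;

    for (i = 0; i < groups && steps < max_steps; i++)
        steps++;

    return steps;
}
```

With 70000 groups and a cap of one million steps, walk_groups_u32() stops 
after 70000 iterations, while walk_groups_u16() only stops because it hits the 
cap. With 65535 groups or fewer, both variants terminate normally, which 
matches the threshold given in the commit message.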

So I downloaded version 1.6.4, compiled it by hand, and ran fsck on my 16 TB 
filesystem. It took a long time (maybe 4 hours), but it finished and 
corrected all the errors. I hope this helps.

Regards,

Josep Guerrero
IEEC


