[Ocfs2-users] fsck hangs in Pass 0a
Josep Guerrero
guerrero at ice.cat
Wed Aug 10 01:07:29 PDT 2011
Hello Matthias,
> I have a ~10TB ocfs2 filesystem in a 8-node cluster. This sits on a
> logical volume (I know lv is not cluster aware, but I make sure no one
> touches the lv, while the cluster is running). The LV consists of 5x2TB
> multipath devices.
> So I ran fsck.ocfs2 -f. But it hangs forever (>12h) with this output:
>
> fsck.ocfs2 1.4.4
> Checking OCFS2 filesystem in /dev/mapper/lv0:
> Label: <NONE>
> UUID: F27D7B8F7127436981A2B5D1C93FB204
> Number of blocks: 2684349440
> Block size: 4096
> Number of clusters: 2684349440
> Cluster size: 4096
> Number of slots: 16
>
> /dev/mapper/lv0 was run with -f, check forced.
> Pass 0a: Checking cluster allocation chains
I wrote to the list in April about what was probably the same problem. You can
read the thread here:
http://oss.oracle.com/pipermail/ocfs2-users/2011-April/005093.html
Sunil wrote a few days later, explaining that a bug in fsck caused it to enter
an infinite loop when the filesystem was larger than a certain size, and that
it had been fixed in version 1.6.4. This is an excerpt from the message:
> Fixed in ocfs2-tools 1.6.4. The src tarball is on oss.oracle.com.
>
> ==================================================================
>
> $ git name-rev --tags 2d741da9367b33f559802dfabe62d96f6adc7777
> 2d741da9367b33f559802dfabe62d96f6adc7777 tags/ocfs2-tools-1.6.3~33
>
> ==================================================================
> commit 2d741da9367b33f559802dfabe62d96f6adc7777
> Author: Goldwyn Rodrigues <rgoldwyn at gmail.com>
> Date: Mon Jul 26 15:19:25 2010 -0500
>
> fsck.ocfs2: Change local variable datatype to avoid infinite loop
>
> fsck on large filesystems goes in an infinite loop.
> The problem is in verify_bitmap_descs(). i, a local variable is
> declared as uint16_t and is compared with
> ocfs2_cluster_group_sizes.cgs_cluster_groups which is uint32_t.
> When cgs_cluster_groups is greater than 65535, i overflows and wraps
> creating an infinite loop of the following:
>
> for (i = 0, blkno = ost->ost_fs->fs_first_cg_blkno;
>      i < cgs.cgs_cluster_groups;
>      i++, blkno = i * ocfs2_clusters_to_blocks(ost->ost_fs,
>                                                cgs.cgs_cpg)) {
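To make the wraparound concrete, here is a minimal standalone sketch. It is not the fsck.ocfs2 code: the function names are made up for illustration, and I am assuming the fixed counter is a uint32_t, since the excerpt does not show the new type. The first function detects the wrap instead of actually looping forever.

```c
#include <stdint.h>

/* Hypothetical helper, not part of ocfs2-tools: mimics the buggy loop shape
 * with a 16-bit counter. Returns 1 if the counter wraps back to 0 before
 * reaching `groups` (the real loop would then spin forever), else 0. */
int counter16_wraps(uint32_t groups)
{
    uint16_t i = 0;

    while (i < groups) {
        i++;            /* 65535 + 1 wraps to 0 in a uint16_t */
        if (i == 0)
            return 1;   /* wrapped: i can never reach groups */
    }
    return 0;
}

/* Same loop with a counter as wide as the limit (assumed uint32_t):
 * it always terminates and returns the number of iterations performed. */
uint32_t counter32_iterations(uint32_t groups)
{
    uint32_t i, n = 0;

    for (i = 0; i < groups; i++)
        n++;
    return n;
}
```

For example, counter16_wraps(65535) returns 0, while counter16_wraps(65536) returns 1, which matches the "greater than 65535" condition in the commit message.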
So I downloaded version 1.6.4, compiled it by hand, and ran fsck on my 16 TB
filesystem. It took a long time (maybe 4 hours), but it did finish and
corrected all the errors. Hope this helps.
Regards,
Josep Guerrero
IEEC