[Ocfs2-users] fsck doesn't fix "bad chain"

Sunil Mushran sunil.mushran at oracle.com
Sat Sep 17 08:25:06 PDT 2011


Can you save an o2image of the volume while it is in that state?
We'll need that for analysis.
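
A minimal sketch of capturing that image, assuming the "o2image <device> <image-file>" invocation form from the ocfs2-tools man page. The helper names and the output path are hypothetical; /dev/drbd5 is the device from the report quoted below.

```python
import subprocess


def o2image_command(device, image_path):
    """Build the argv for o2image, which copies the volume's metadata to a file.

    Assumes the 'o2image <device> <image-file>' form; check the man page
    shipped with your ocfs2-tools version before relying on it.
    """
    if not device or not image_path:
        raise ValueError("device and image_path are both required")
    return ["o2image", device, image_path]


def save_o2image(device, image_path):
    """Run o2image and raise CalledProcessError if it exits non-zero."""
    subprocess.run(o2image_command(device, image_path), check=True)


if __name__ == "__main__":
    # Hypothetical output path; run as root against the affected volume.
    print(" ".join(o2image_command("/dev/drbd5", "/tmp/drbd5.o2image")))
```

The image should be taken before fsck.ocfs2 is run with -y, so that it reflects the state being reported.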

On 09/16/2011 05:41 AM, Andre Nathan wrote:
> Hello
>
> For a while I had seen errors like this in the kernel logs:
>
>    OCFS2: ERROR (device drbd5): ocfs2_validate_gd_parent: Group
>    descriptor #69084874 has bad chain 126
>    File system is now read-only due to the potential of on-disk
>    corruption. Please run fsck.ocfs2 once the file system is unmounted.
>
> This always happened on the same device, and whenever it happened I ran
> fsck.ocfs2 -fy /dev/drbd5, which showed messages like these:
>
>    [GROUP_FREE_BITS] Group descriptor at block 201309696 claims to have
>    9893 free bits which is more than 9886 bits indicated by the bitmap.
>    Drop its free bit count down to the total? y
>    [CHAIN_BITS] Chain 166 in allocator inode 11 has 1264713 bits
>    marked free out of 1516032 total bits but the block groups in the
>    chain have 1264706 free out of 1516032 total.  Fix this by updating
>    the chain record? y
>    [CHAIN_GROUP_BITS] Allocator inode 11 has 79407510 bits marked used
>    out of 365955414 total bits but the chains have 79407911 used out of
>    365955414 total.  Fix this by updating the inode counts? y
>    [INODE_COUNT] Inode 69085510 has a link count of 0 on disk but
>    directory entry references come to 1. Update the count on disk to
>    match? y
>
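
The counts in the three bitmap-related messages above can be cross-checked with simple arithmetic. The sketch below is not part of fsck.ocfs2; it only pulls the numbers out of the quoted text and computes the differences. The regular expressions assume the exact wording shown in this output, and the function names are illustrative.

```python
import re

# The three allocator messages, copied from the fsck.ocfs2 output above.
FSCK_OUTPUT = """\
[GROUP_FREE_BITS] Group descriptor at block 201309696 claims to have
9893 free bits which is more than 9886 bits indicated by the bitmap.
Drop its free bit count down to the total? y
[CHAIN_BITS] Chain 166 in allocator inode 11 has 1264713 bits
marked free out of 1516032 total bits but the block groups in the
chain have 1264706 free out of 1516032 total.  Fix this by updating
the chain record? y
[CHAIN_GROUP_BITS] Allocator inode 11 has 79407510 bits marked used
out of 365955414 total bits but the chains have 79407911 used out of
365955414 total.  Fix this by updating the inode counts? y
"""


def split_messages(text):
    """Return {code: message}, joining each message's wrapped lines."""
    messages = {}
    for match in re.finditer(r"\[([A-Z_]+)\]([^\[]*)", text):
        messages[match.group(1)] = " ".join(match.group(2).split())
    return messages


def discrepancies(text):
    """Compute the difference between the two counts in each message."""
    msgs = split_messages(text)
    out = {}
    m = re.search(r"claims to have (\d+) free bits .* than (\d+) bits",
                  msgs["GROUP_FREE_BITS"])
    out["group_free"] = int(m.group(1)) - int(m.group(2))
    m = re.search(r"has (\d+) bits marked free .* have (\d+) free",
                  msgs["CHAIN_BITS"])
    out["chain_free"] = int(m.group(1)) - int(m.group(2))
    m = re.search(r"has (\d+) bits marked used .* have (\d+) used",
                  msgs["CHAIN_GROUP_BITS"])
    out["inode_used"] = int(m.group(2)) - int(m.group(1))
    return out


if __name__ == "__main__":
    print(discrepancies(FSCK_OUTPUT))
```

For this output the group descriptor and its chain record each overstate the free count by 7 bits, while the allocator inode's used count differs from the chain totals by 401 bits.
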
> Over time these errors became more frequent. The last time it happened
> I ran fsck twice in a row and was surprised to see the same messages in
> both runs, so it seems fsck was unable to fix the problem.
>
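
One way to confirm that a second pass reports the same problems is to save the output of each run and compare the bracketed problem codes. A minimal sketch, assuming each run's output was captured as text (for example with shell redirection); the function names are illustrative and not part of ocfs2-tools, and the sample strings are abbreviated stand-ins for real output.

```python
import re
from collections import Counter


def problem_codes(fsck_output):
    """Count bracketed problem codes such as [CHAIN_BITS] at line starts."""
    pattern = r"^\s*\[([A-Z0-9_]+)\]"
    return Counter(re.findall(pattern, fsck_output, flags=re.MULTILINE))


def persistent_problems(first_run, second_run):
    """Return the codes reported in both runs (smaller of the two counts)."""
    return problem_codes(first_run) & problem_codes(second_run)


if __name__ == "__main__":
    # Abbreviated sample text; real input would be the saved fsck output.
    run1 = "[GROUP_FREE_BITS] Group descriptor\n[CHAIN_BITS] Chain 166\n"
    run2 = "[CHAIN_BITS] Chain 166\n"
    print(sorted(persistent_problems(run1, run2)))
```

An empty result means no problem code repeated; a non-empty one lists the checks that the first pass did not clear.
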
> I identified the files corresponding to the inodes using debugfs.ocfs2
> and copied them to a new place, and then moved the copy over the
> original file, in order to recreate the inodes. Each time I did that for
> an inode, the error above happened and the filesystem became read-only,
> so I had to unmount and remount the volume before I could write to it
> again.
>
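
The copy-and-rename step described above can be sketched as follows. This only illustrates the idea with Python's standard library and is not the set of commands actually used; the temporary-name suffix and the function name are assumptions.

```python
import os
import shutil


def recreate_inode(path, suffix=".recreate"):
    """Copy `path` to a sibling file, then rename the copy over the original.

    The copy is a new file with a newly allocated inode; os.replace() then
    makes the original name point at it. Returns (old_inode, new_inode).
    """
    old_inode = os.stat(path).st_ino
    tmp_path = path + suffix          # hypothetical temporary name
    shutil.copy2(path, tmp_path)      # copies data plus timestamps and mode
    new_inode = os.stat(tmp_path).st_ino
    os.replace(tmp_path, path)        # rename within the same filesystem
    return old_inode, new_inode
```

Note that shutil.copy2 does not preserve ownership, and any other hard links to the old inode keep pointing at it, so this is a sketch of the mechanism rather than a drop-in repair tool.
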
> After doing this, I ran fsck.ocfs2 -fy again twice, and no errors were
> reported. Since then I haven't seen this problem again.
>
> I'm running kernel 2.6.35 and ocfs2-tools 1.6.4.
>
> Has anyone else seen an issue like that?
>
> Thanks
> Andre



