[Ocfs2-users] fsck doesn't fix "bad chain"

Andre Nathan andre at digirati.com.br
Fri Sep 16 05:41:29 PDT 2011


Hello

For a while I had seen errors like this in the kernel logs:

  OCFS2: ERROR (device drbd5): ocfs2_validate_gd_parent: Group 
  descriptor #69084874 has bad chain 126
  File system is now read-only due to the potential of on-disk 
  corruption. Please run fsck.ocfs2 once the file system is unmounted.

This always happened on the same device, and each time I ran
fsck.ocfs2 -fy /dev/drbd5, which showed messages like these:

  [GROUP_FREE_BITS] Group descriptor at block 201309696 claims to have 
  9893 free bits which is more than 9886 bits indicated by the bitmap. 
  Drop its free bit count down to the total? y
  [CHAIN_BITS] Chain 166 in allocator inode 11 has 1264713 bits 
  marked free out of 1516032 total bits but the block groups in the 
  chain have 1264706 free out of 1516032 total.  Fix this by updating 
  the chain record? y
  [CHAIN_GROUP_BITS] Allocator inode 11 has 79407510 bits marked used 
  out of 365955414 total bits but the chains have 79407911 used out of 
  365955414 total.  Fix this by updating the inode counts? y
  [INODE_COUNT] Inode 69085510 has a link count of 0 on disk but 
  directory entry references come to 1. Update the count on disk to 
  match? y
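
As a quick sanity check on the numbers above, the short Python sketch
below recomputes the discrepancy reported by the GROUP_FREE_BITS and
CHAIN_BITS messages. The message strings are copied from the output
quoted above (rejoined onto single lines); the function names and the
regular expressions are illustrative assumptions, not part of
ocfs2-tools.

```python
import re

# Messages copied from the fsck.ocfs2 output above, rejoined onto one line each.
GROUP_MSG = (
    "[GROUP_FREE_BITS] Group descriptor at block 201309696 claims to have "
    "9893 free bits which is more than 9886 bits indicated by the bitmap."
)
CHAIN_MSG = (
    "[CHAIN_BITS] Chain 166 in allocator inode 11 has 1264713 bits "
    "marked free out of 1516032 total bits but the block groups in the "
    "chain have 1264706 free out of 1516032 total."
)


def group_free_excess(msg):
    """Free bits claimed by the group descriptor minus free bits in its bitmap."""
    m = re.search(r"claims to have (\d+) free bits which is more than (\d+) bits", msg)
    claimed, in_bitmap = map(int, m.groups())
    return claimed - in_bitmap


def chain_free_excess(msg):
    """Free bits in the chain record minus free bits summed over its groups."""
    m = re.search(r"has (\d+) bits marked free .*? have (\d+) free", msg)
    in_record, in_groups = map(int, m.groups())
    return in_record - in_groups


if __name__ == "__main__":
    print("group descriptor over-count:", group_free_excess(GROUP_MSG))
    print("chain record over-count:   ", chain_free_excess(CHAIN_MSG))
```

Both differences come out to 7 bits, which would be consistent with
the chain-level mismatch coming from that one group descriptor,
although the output alone does not prove that the group belongs to
chain 166.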

Over time these errors became more frequent. The last time it
happened, I ran fsck twice in a row and was surprised to see the same
messages in both runs, so it seems fsck was unable to fix the problem.
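
For anyone who wants to check the same thing on their own volume, here
is a minimal Python sketch that compares the saved output of two fsck
runs and lists the problem codes that appear in both. It assumes each
problem is reported on a line starting with a bracketed code such as
[CHAIN_BITS], as in the messages above; the function names are made up
for illustration, and the sample strings stand in for real captured
output.

```python
import re
from collections import Counter

# A problem code is assumed to be an upper-case tag in brackets at line start.
CODE_RE = re.compile(r"^\s*\[([A-Z_]+)\]", re.MULTILINE)


def problem_codes(fsck_output):
    """Count how often each bracketed problem code occurs in one fsck run."""
    return Counter(CODE_RE.findall(fsck_output))


def persisting_codes(first_run, second_run):
    """Return the sorted problem codes reported by both runs."""
    return sorted(set(problem_codes(first_run)) & set(problem_codes(second_run)))


if __name__ == "__main__":
    # Placeholder text; in practice, read the saved output of each run.
    first = "[GROUP_FREE_BITS] ...\n[CHAIN_BITS] ...\n"
    second = "[CHAIN_BITS] ...\n"
    print(persisting_codes(first, second))
```

A code that shows up in the second run right after a first run with -y
was either not repaired or was reintroduced, which is the situation
described above.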

I identified the files corresponding to the inodes using debugfs.ocfs2,
copied each one to a new location, and then moved the copy over the
original file in order to recreate the inodes. Each time I did that for
one inode, the error above occurred and the filesystem became read-only,
so I had to unmount and remount the volume before I could write to it
again.
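
The copy-and-replace step can be sketched in Python roughly as follows.
This is only an illustration of the idea, not the exact commands I
used: the helper name recreate_inode is hypothetical, the copy is made
in the same directory so that the final rename stays on one filesystem,
and file ownership is not preserved.

```python
import os
import shutil
import tempfile


def recreate_inode(path):
    """Replace `path` with a fresh copy of itself so it gets a new inode."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary copy next to the original (same filesystem).
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".recreate-")
    os.close(fd)
    try:
        shutil.copy2(path, tmp)  # copies data, mode bits and timestamps
        os.replace(tmp, path)    # rename the copy over the original
    except BaseException:
        # Leave the original untouched and clean up the partial copy.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        recreate_inode(name)
        print(name, "now has inode", os.stat(name).st_ino)
```

Because the copy exists alongside the original before the rename, it
necessarily has a different inode number, and the old inode is released
once the rename completes. On the affected volume, that release is the
point where the error appeared.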

After doing this, I ran fsck.ocfs2 -fy again twice, and no errors were
reported. Since then I haven't seen this problem again.

I'm running kernel 2.6.35 and ocfs2-tools 1.6.4.

Has anyone else seen an issue like this?

Thanks
Andre



