[Ocfs2-users] fsck.ocfs2 loops + hangs but does not check

Wed Mar 23 15:38:38 PDT 2016

Hi ocfs2-users,

My first post to this list yesterday probably didn't get through.

Anyway, I've made some progress in the meantime and can now ask more
specific questions.

I'm having issues with an 11 TB ocfs2 shared filesystem on Debian Wheezy:

Linux s1a 3.2.0-4-amd64 #1 SMP Debian 3.2.54-2 x86_64 GNU/Linux

The kernel module is:

modinfo ocfs2 -> version: 1.5.0

I am using the stock ocfs2-tools 1.6.4-1+deb7u1 from the distribution.

As an alternative I cloned and built the latest ocfs2-tools from
markfasheh's ocfs2-tools repository on GitHub, which should be version 1.8.4.

The filesystem runs on top of drbd, is roughly 40 % full and has suffered
from read-only remounts and hanging clients since the last reboot. These
may be DLM problems, but I suspect they stem from corrupt on-disk
structures. Before that, everything ran stably for months.

This situation made me want to run fsck.ocfs2 and now I wonder how to do
that. The filesystem is not mounted.

With the stock ocfs2-tools 1.6.4:

root@s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1
fsck.ocfs2 1.6.4
Checking OCFS2 filesystem in /dev/drbd1:
  Label:              ocfs2_ASSET
  UUID:               6A1A0189A3F94E32B6B9A526DF9060F3
  Number of blocks:   5557283182
  Block size:         2048
  Number of clusters: 2778641591
  Cluster size:       4096
  Number of slots:    16
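
As a quick sanity check on the geometry reported above, the block count times the block size should equal the cluster count times the cluster size. The sketch below only redoes that arithmetic with the numbers from the fsck.ocfs2 header; the helper name fs_size_bytes is made up for illustration and it assumes a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: total bytes = number of units * size of one unit.
fs_size_bytes() {
    echo $(( $1 * $2 ))
}

# Values copied from the fsck.ocfs2 header above.
by_blocks=$(fs_size_bytes 5557283182 2048)
by_clusters=$(fs_size_bytes 2778641591 4096)

echo "by blocks:   $by_blocks bytes"
echo "by clusters: $by_clusters bytes"

if [ "$by_blocks" = "$by_clusters" ]; then
    echo "block and cluster counts agree"
else
    echo "block and cluster counts disagree"
fi
```

Both products come out at 11381315956736 bytes, which matches the 11 TB size mentioned earlier, so at least the superblock geometry looks self-consistent.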

Checking fsck_drbd1.log, I find that it makes progress in

Pass 0a: Checking cluster allocation chains

until it reaches "chain 73" and goes into an infinite loop, filling the
logfile at breathtaking speed.
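
To see which messages such a loop keeps repeating without reading the whole file, the tail of the log can be summarized by line frequency. This is only a sketch: top_repeated_lines is a hypothetical helper built from awk, sort and head, and the 100000-line window in the usage comment is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print the N most frequent
# ones as "count<TAB>line", most frequent first (N defaults to 5).
top_repeated_lines() {
    awk '{ count[$0]++ }
         END { for (line in count) printf "%d\t%s\n", count[line], line }' |
        sort -rn | head -n "${1:-5}"
}

# Example use on the log from this post (run manually):
#   tail -n 100000 fsck_drbd1.log | top_repeated_lines 5
```

If a handful of lines account for nearly all of the window, those lines are the ones worth quoting in a bug report.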

With the newly built ocfs2-tools 1.8.4 I get:

root@s1a:~# fsck.ocfs2 -v -f /dev/drbd1 > fsck_drbd1.log 2>&1
fsck.ocfs2 1.8.4
Checking OCFS2 filesystem in /dev/drbd1:
  Label:              ocfs2_ASSET
  UUID:               6A1A0189A3F94E32B6B9A526DF9060F3
  Number of blocks:   5557283182
  Block size:         2048
  Number of clusters: 2778641591
  Cluster size:       4096
  Number of slots:    16

Again watching the verbose output in fsck_drbd1.log I find that this
time it proceeds up to

Pass 0a: Checking cluster allocation chains
o2fsck_pass0:1360 | found inode alloc 13 at block 13

and stays there without any further progress. I terminated the
process after waiting for more than an hour.
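
One rough way to tell "no further progress" from "just slow" is to check whether the log file still grows over a given interval. The sketch below assumes that a growing log means progress; log_is_growing is a hypothetical helper, and a quiet log is only a hint, since output may be buffered or fsck may be busy without printing anything.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if FILE changed size within SECONDS, else non-zero.
log_is_growing() {
    before=$(wc -c < "$1")
    sleep "$2"
    after=$(wc -c < "$1")
    [ "$after" -ne "$before" ]
}

# Example use on the log from this post (run manually):
#   if log_is_growing fsck_drbd1.log 60; then
#       echo "log still growing"
#   else
#       echo "no new output in the last 60 seconds"
#   fi
```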

Now I'm somewhat lost and would very much appreciate it if anybody on
this list could share their knowledge and give me a hint about what to do next.

What could be done to get this file system checked and repaired? Am I
missing something important or do I just have to wait a little bit
longer? Is there a version of ocfs2-tools / fsck.ocfs2 which will
perform as expected?

I'm prepared to upgrade the kernel to 3.16.0-0.bpo.4-amd64 but shy away
from taking that risk without any indication that it might solve my
problem.

Thanks in advance ... Michael Ulbrich