[Ocfs2-users] How long for an fsck?

Josep Guerrero guerrero at ice.cat
Thu Apr 21 06:43:29 PDT 2011


I have a cluster with 8 nodes, all of them running Debian Lenny (plus some 
additions so that multipath and Infiniband work), which share an array of 48 
1 TB disks. Those disks form 22 hardware RAID1 pairs, plus 4 spares. The first 
21 pairs are organized into two striped LVM logical volumes, of 16 and 3 TB, 
both formatted with ocfs2. The kernel is the version supplied with the 
distribution (2.6.26-2-amd64).
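
In case the layout matters, it is roughly what the sequence below would 
produce (the /dev/mapper/pairNN device names are placeholders for the 21 
multipathed RAID1 pairs, and the exact options I used may have differed 
slightly):

=============
# Hypothetical reconstruction of the layout described above; pairNN stands
# in for the multipath devices backing the 21 RAID1 pairs.
pvcreate /dev/mapper/pair{01..21}
vgcreate hidrahome /dev/mapper/pair{01..21}

# Two logical volumes striped across all 21 pairs.
lvcreate -i 21 -L 16T -n lvol0 hidrahome
lvcreate -i 21 -L 3T  -n lvol1 hidrahome

# ocfs2 with 8 node slots and 4 KB blocks/clusters, matching the fsck
# output further down.
mkfs.ocfs2 -N 8 -b 4K -C 4K /dev/hidrahome/lvol0
mkfs.ocfs2 -N 8 -b 4K -C 4K /dev/hidrahome/lvol1
=============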

I wanted to run an fsck on both volumes because of some errors I was getting 
(probably unrelated to the filesystems, but I wanted to check). On the 3 TB 
volume (around 10% full) the check worked perfectly and finished in less than 
an hour (this was run with the fsck.ocfs2 from Lenny's ocfs2-tools package, 
version 1.4.1):

==============
root@hidra0:/usr/local/src# fsck.ocfs2 -f /dev/hidrahome/lvol1
Checking OCFS2 filesystem in /dev/hidrahome/lvol1:
  label:              <NONE>
  uuid:               ab 76 a9 41 fa df 4c ac a3 9f 26 c5 ae 34 1a 3f 
  number of blocks:   959809536
  bytes per block:    4096
  number of clusters: 959809536
  bytes per cluster:  4096
  max slots:          8

/dev/hidrahome/lvol1 was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
Pass 0b: Checking inode allocation chains
Pass 0c: Checking extent block allocation chains
Pass 1: Checking inodes and blocks.
Pass 2: Checking directory entries.
Pass 3: Checking directory connectivity.
Pass 4a: checking for orphaned inodes
Pass 4b: Checking inodes link counts.
All passes succeeded.
============

but the check for the second filesystem (around 40% full) did this:

============
hidra0:/usr/local/src# fsck.ocfs2 -f /dev/hidrahome/lvol0
Checking OCFS2 filesystem in /dev/hidrahome/lvol0:
  label:              <NONE>
  uuid:               6a a9 0e aa cf 33 45 4c b4 72 3a b6 7c 3b 8d 57
  number of blocks:   4168098816
  bytes per block:    4096
  number of clusters: 4168098816
  bytes per cluster:  4096
  max slots:          8

/dev/hidrahome/lvol0 was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
=============

and stayed there for 8 hours (keeping one core at around 100% CPU usage the 
whole time, with only a light load on the disks; this was consistent with the 
same step in the previous run, except that that one didn't take nearly as 
long). I thought I might have run into some bug, so I interrupted the process, 
downloaded the ocfs2-tools 1.4.4 sources, compiled them, and tried that fsck 
instead, with similar results: it has now been running for almost 7 hours like 
this:

=============
hidra0:/usr/local/src/ocfs2-tools-1.4.4/fsck.ocfs2# ./fsck.ocfs2 -f /dev/hidrahome/lvol0
fsck.ocfs2 1.4.4
Checking OCFS2 filesystem in /dev/hidrahome/lvol0:
  Label:              <NONE>
  UUID:               6AA90EAACF33454CB4723AB67C3B8D57
  Number of blocks:   4168098816
  Block size:         4096
  Number of clusters: 4168098816
  Cluster size:       4096
  Number of slots:    8

/dev/hidrahome/lvol0 was run with -f, check forced.
Pass 0a: Checking cluster allocation chains

=============

and with one core at 100% CPU usage.
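
(For what it's worth, building 1.4.4 was just the usual source build, 
something like the following, assuming the standard Debian build dependencies 
were already in place:)

=============
# Rough sketch of the build; done in the unpacked 1.4.4 source tree and
# used in place, without installing over the distribution packages.
cd /usr/local/src/ocfs2-tools-1.4.4
./configure
make
cd fsck.ocfs2    # the freshly built fsck.ocfs2 was then run from here
=============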

Could someone tell me if this is normal? I've been searching the web and 
checking the manuals for information on how long these checks should take, 
and apart from one message on this list mentioning that 3 days for an 8 TB 
filesystem with 300 GB in use was too long, I haven't been able to find 
anything.

If this is normal, is there any way to estimate how long the check should 
take on this second filesystem, given that the first filesystem uses exactly 
the same disks and took less than an hour to check?
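
My own naive back-of-the-envelope estimate (assuming the check scales roughly 
linearly with either total clusters or used data, which I'm not at all sure 
is true) would be:

  clusters:   4168098816 / 959809536          ~ 4.3x the 3 TB volume
  used data:  (40% of 16 TB) / (10% of 3 TB)  ~ 6.4 TB / 0.3 TB ~ 21x

i.e. somewhere between about 4 and 21 hours if the smaller check took roughly 
an hour, but I have no idea whether Pass 0a actually behaves that way.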

Thanks!

Josep Guerrero


