[Ocfs2-users] How long for an fsck?
Josep Guerrero
guerrero at ice.cat
Thu Apr 21 06:43:29 PDT 2011
I have a cluster with 8 nodes, all of them running Debian Lenny (plus some
additions so that multipath and InfiniBand work), which share an array of 48 1 TB
disks. Those disks form 22 hardware RAID1 pairs, plus 4 spares. The first
21 pairs are organized in two striped LVM logical volumes, of 16 and 3 TB,
both formatted with ocfs2. The kernel is the version supplied with the
distribution (2.6.26-2-amd64).
I wanted to run an fsck on both volumes because of some errors I was getting
(probably unrelated to the filesystems, but I wanted to check). On the 3TB
volume (around 10% full) the check worked perfectly, and finished in less than
an hour (this was run with the fsck.ocfs2 provided by Lenny ocfs2-tools,
version 1.4.1):
==============
root@hidra0:/usr/local/src# fsck.ocfs2 -f /dev/hidrahome/lvol1
Checking OCFS2 filesystem in /dev/hidrahome/lvol1:
label: <NONE>
uuid: ab 76 a9 41 fa df 4c ac a3 9f 26 c5 ae 34 1a 3f
number of blocks: 959809536
bytes per block: 4096
number of clusters: 959809536
bytes per cluster: 4096
max slots: 8
/dev/hidrahome/lvol1 was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
Pass 0b: Checking inode allocation chains
Pass 0c: Checking extent block allocation chains
Pass 1: Checking inodes and blocks.
Pass 2: Checking directory entries.
Pass 3: Checking directory connectivity.
Pass 4a: checking for orphaned inodes
Pass 4b: Checking inodes link counts.
All passes succeeded.
============
but the check for the second filesystem (around 40% full) did this:
============
hidra0:/usr/local/src# fsck.ocfs2 -f /dev/hidrahome/lvol0
Checking OCFS2 filesystem in /dev/hidrahome/lvol0:
label: <NONE>
uuid: 6a a9 0e aa cf 33 45 4c b4 72 3a b6 7c 3b 8d 57
number of blocks: 4168098816
bytes per block: 4096
number of clusters: 4168098816
bytes per cluster: 4096
max slots: 8
/dev/hidrahome/lvol0 was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
=============
and stayed there for 8 hours, keeping one core at around 100% CPU usage the
whole time, with only a light load on the disks. The same step behaved the
same way in the previous run, but of course it did not take nearly as long. I
thought that maybe I had run into some bug, so I interrupted the process,
downloaded the ocfs2-tools 1.4.4 sources, compiled them, and tried with that
fsck. The results are similar: it has been running for almost 7 hours like this:
=============
hidra0:/usr/local/src/ocfs2-tools-1.4.4/fsck.ocfs2# ./fsck.ocfs2 -f /dev/hidrahome/lvol0
fsck.ocfs2 1.4.4
Checking OCFS2 filesystem in /dev/hidrahome/lvol0:
Label: <NONE>
UUID: 6AA90EAACF33454CB4723AB67C3B8D57
Number of blocks: 4168098816
Block size: 4096
Number of clusters: 4168098816
Cluster size: 4096
Number of slots: 8
/dev/hidrahome/lvol0 was run with -f, check forced.
Pass 0a: Checking cluster allocation chains
=============
again with one CPU core at 100%.
Could someone tell me if this is normal? I've been searching the web and
checking manuals for information on how long these checks should take, and
apart from one message on this list mentioning that 3 days on an 8 TB filesystem
with 300 GB was too long, I haven't been able to find anything.
If this is normal, is there any way to estimate, taking into account that the
first filesystem uses exactly the same disks and took less than an hour to
check, how long it should take for this other filesystem?
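For a rough sense of scale, here is a minimal sketch of the size arithmetic
using the cluster counts from the two fsck headers above. The linear-scaling
estimate is purely an assumption made for illustration (I do not know how
fsck.ocfs2 run time actually scales), and the function names are made up:

```python
# Values copied from the fsck.ocfs2 output above.
CLUSTER_SIZE = 4096               # bytes per cluster on both volumes
CLUSTERS_LVOL1 = 959_809_536      # 3 TB volume, checked in under an hour
CLUSTERS_LVOL0 = 4_168_098_816    # 16 TB volume, still in Pass 0a


def size_tib(clusters, cluster_size=CLUSTER_SIZE):
    """Volume size in TiB (2**40 bytes) from its cluster count."""
    return clusters * cluster_size / 2**40


def linear_estimate_hours(ref_hours, ref_clusters, target_clusters):
    """Naive estimate. ASSUMPTION: check time grows linearly with the
    number of clusters; this is not a documented property of fsck.ocfs2."""
    return ref_hours * target_clusters / ref_clusters


ratio = CLUSTERS_LVOL0 / CLUSTERS_LVOL1
print(f"lvol1: {size_tib(CLUSTERS_LVOL1):.2f} TiB")
print(f"lvol0: {size_tib(CLUSTERS_LVOL0):.2f} TiB")
print(f"cluster ratio lvol0/lvol1: {ratio:.2f}")

# Using 1 hour as an upper bound for the complete check of lvol1.
estimate = linear_estimate_hours(1.0, CLUSTERS_LVOL1, CLUSTERS_LVOL0)
print(f"naive upper bound for lvol0: {estimate:.1f} hours")
```

Under that assumption the whole check of lvol0 would take a few hours at most,
which is why 7 to 8 hours in Pass 0a alone looks suspicious to me. The estimate
ignores how full each volume is (10% versus 40%).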
Thanks!
Josep Guerrero