[Ocfs2-users] [Fwd: Re: Unable to fix corrupt directories with fsck.ocfs2]
Robin Garner
robin.garner at scu.edu.au
Tue May 19 19:05:59 PDT 2009
Joel Becker wrote:
> On Tue, May 19, 2009 at 02:49:31PM +1000, Robin Garner wrote:
>> Robin Garner wrote:
>>> Yes. This is a 24/7 application (at least during semester), and
>>> arranging extended downtime is a challenge.
>
> Ok, you ran fsck against a live filesystem and skipped the
> cluster locking with the '-F' option. So now you have two problems.
>
> 1) The original directory problem.
> 2) The duplicate blocks created by your fsck of a mounted filesystem.
>
> Do you have backups?
>
> Joel
>
OK, now I'm confused:
The man page for fsck.ocfs2 says:

    -F  Usually fsck.ocfs2 will check with cluster
        services and the DLM to make sure that no
        one else in the cluster is actively using
        the device before proceeding. -F skips
        this check and should only be used when it
        can be guaranteed that there can be no
        other users of the device while fsck.ocfs2
        is running.

To me and my colleagues, "no one else in the cluster is actively using the
device" means that the filesystem must be mounted on *at most* one
node in the cluster (the node doing the fsck). That's what we did.
This filesystem is normally mounted by both nodes of a 2-node cluster.
We had cleanly unmounted the filesystem on the other node. fsck.ocfs2
without '-F' gave errors, but then mounted.ocfs2 claimed the disk was
mounted on both nodes. Eventually we shut down the other node, and
mounted.ocfs2 still thought it had it mounted. At this point we used '-F'.
I can't see any reference in the man page about not doing an fsck on a
mounted disk.
e2fsck for example says this:
> WARNING!!! Running e2fsck on a mounted file system may cause
> SEVERE filesystem damage.
>
> Do you really want to continue (y/n)?
when you try to fsck a mounted filesystem. May I suggest that
fsck.ocfs2 do something similar? Perhaps 'everyone knows' you can't
run fsck on a mounted filesystem, but we were assuming that ocfs2 being
a modern cluster filesystem might be a little more advanced. Apparently
not.
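
For illustration only, here is a minimal sketch of the kind of local check meant above, written in Python rather than as a patch to fsck.ocfs2. It assumes a Linux-style /proc/mounts table (device in the first field, mount point in the second); the function names mount_points_for and warning_for are hypothetical and are not part of ocfs2-tools. It only sees mounts on the local node, so it says nothing about other nodes in the cluster.

```python
def mount_points_for(device, mounts_text):
    """Return the mount points listed for `device` in /proc/mounts-style text.

    Assumption: each line is whitespace-separated, with the device in the
    first field and the mount point in the second. Device names are compared
    as plain strings; symlinks and aliases are not resolved.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        if fields[0] == device:
            points.append(fields[1])
    return points


def warning_for(device, mounts_text):
    """Return an e2fsck-style warning if `device` is mounted locally, else None."""
    points = mount_points_for(device, mounts_text)
    if not points:
        return None
    return (
        "WARNING!!! %s is mounted on %s. Running fsck on a mounted "
        "file system may cause SEVERE filesystem damage."
        % (device, ", ".join(points))
    )
```

A wrapper script could read /proc/mounts, pass its contents to warning_for together with the device name, and refuse to go on unless the operator answers 'y' when a warning comes back.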
We'll try to salvage the data another way (we believe the directory
corruption is some way down the directory tree), and pull missing data
back from backups.
Robin