[Ocfs2-users] [Fwd: Re: Unable to fix corrupt directories with fsck.ocfs2]

Robin Garner robin.garner at scu.edu.au
Tue May 19 19:05:59 PDT 2009


Joel Becker wrote:
> On Tue, May 19, 2009 at 02:49:31PM +1000, Robin Garner wrote:
>> Robin Garner wrote:
>>> Yes.  This is a 24/7 application (at least during semester), and 
>>> arranging extended downtime is a challenge.
> 
> 	Ok, you ran fsck against a live filesystem and skipped the
> cluster locking with the '-F' option.  So now you have two problems.
> 
> 1) The original directory problem.
> 2) The duplicate blocks created by your fsck of a mounted filesystem.
> 
> 	Do you have backups?
> 
> Joel
> 

OK, now I'm confused:

The man page for fsck.ocfs2 says

        -F     Usually fsck.ocfs2 will check with cluster
               services  and the DLM to make sure that no
               one else in the cluster is actively  using
               the  device  before  proceeding.  -F skips
               this check and should only be used when it
               can  be  guaranteed  that  there can be no
               other users of the device while fsck.ocfs2
               is running.

To my colleagues and me, "no one else in the cluster is actively using
the device" means that the filesystem must be mounted on *at most* one
node in the cluster (the node doing the fsck).  That's what we did.

This filesystem is normally mounted by both nodes of a 2-node cluster.
We had cleanly unmounted the filesystem on the other node.  Running
fsck.ocfs2 without '-F' gave errors, and mounted.ocfs2 then claimed the
disk was mounted on both nodes.  Eventually we shut down the other node,
but mounted.ocfs2 still reported it as mounted there.  At this point we
used '-F'.

I can't see anything in the man page warning against running fsck on a
mounted disk.

e2fsck for example says this:

 > WARNING!!! Running e2fsck on a mounted file system may cause
 > SEVERE filesystem damage.
 >
 > Do you really want to continue (y/n)?

when you try to fsck a mounted filesystem.  May I suggest that
fsck.ocfs2 do something similar?  Perhaps 'everyone knows' you can't
run fsck on a mounted filesystem, but we were assuming that ocfs2, being
a modern cluster filesystem, might be a little more advanced.  Apparently
not.
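A minimal sketch of the kind of local check being suggested, assuming a Linux node where /proc/mounts lists that node's mounts. The names mounted_locally and confirm_if_mounted are hypothetical; this is not fsck.ocfs2 or e2fsck code. It only detects a mount on the node running the check. Detecting users on other cluster nodes is what fsck.ocfs2's default cluster/DLM check does, and '-F' skips that.

```python
import os
import sys
from typing import Callable


def mounted_locally(device: str, mounts_text: str) -> bool:
    """Return True if `device` is the source of any entry in /proc/mounts-style text."""
    target = os.path.realpath(device)  # resolve symlinks such as /dev/disk/by-label/...
    for line in mounts_text.splitlines():
        # Each line: <source> <mountpoint> <fstype> <options> <dump> <pass>
        fields = line.split()
        if fields and os.path.realpath(fields[0]) == target:
            return True
    return False


def confirm_if_mounted(device: str, mounts_text: str,
                       ask: Callable[[str], str] = input) -> bool:
    """Return True if it is OK to proceed; prompt e2fsck-style when mounted here."""
    if not mounted_locally(device, mounts_text):
        return True
    print(f"WARNING!!! {device} is mounted on this node. Checking a mounted "
          "filesystem may cause SEVERE filesystem damage.", file=sys.stderr)
    answer = ask("Do you really want to continue (y/n)? ")
    return answer.strip().lower() in ("y", "yes")


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit(f"usage: {sys.argv[0]} DEVICE")
    with open("/proc/mounts") as f:
        proceed = confirm_if_mounted(sys.argv[1], f.read())
    sys.exit(0 if proceed else 1)
```

Passing the mount table in as text keeps the check testable without touching a real device. A local check like this would not have caught a stale mount record for the other node, so it complements the cluster check rather than replacing it.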

We'll try to salvage the data another way (we believe the directory 
corruption is some way down the directory tree), and pull missing data 
back from backups.

Robin


