[Ocfs2-users] [Fwd: Re: Unable to fix corrupt directories with fsck.ocfs2]

Luis Freitas lfreitas34 at yahoo.com
Wed May 20 10:46:16 PDT 2009


Robin,

   To me, "anyone else" includes the kernel of the current node.

   Well, if it is unclear, the man page should be revised. A big warning message in fsck.ocfs2 would also be nice; after all, we all make mistakes. But this is only my two cents.
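   For illustration only, here is a minimal Python sketch of the kind of e2fsck-style guard I mean. It is not code from ocfs2-tools: `mounted_at` and `confirm_if_mounted` are hypothetical names, and the sketch only reads the local node's /proc/mounts, so it would catch the "mounted on the node doing the fsck" case but not mounts on other nodes.

```python
import os


def mounted_at(device, mounts_path="/proc/mounts"):
    """Return every mount point at which `device` is mounted on this node."""
    # Resolve symlinks such as /dev/disk/by-label/... to the real device node.
    target = os.path.realpath(device)
    points = []
    with open(mounts_path) as mounts:
        for line in mounts:
            fields = line.split()
            # Each entry is "source mountpoint fstype options dump pass";
            # skip malformed lines and pseudo-filesystems such as "proc".
            if len(fields) < 2 or not fields[0].startswith("/"):
                continue
            if os.path.realpath(fields[0]) == target:
                points.append(fields[1])
    return points


def confirm_if_mounted(device, answer=input, mounts_path="/proc/mounts"):
    """Return True if the device is not mounted here, or the operator insists."""
    points = mounted_at(device, mounts_path)
    if not points:
        return True
    print(f"WARNING!!! {device} is mounted on this node at {', '.join(points)}.")
    print("Checking a mounted filesystem may cause SEVERE filesystem damage.")
    reply = answer("Do you really want to continue (y/n)? ")
    return reply.strip().lower() == "y"
```

   A caller would do something like `if not confirm_if_mounted("/dev/sdb1"): sys.exit(1)` before touching the device (the device name is just an example). Detecting mounts on the other nodes is what the cluster services/DLM check that '-F' skips is for; a local check like this would only complement it.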

   Running fsck on any journaled filesystem will replay the journal. If the filesystem is mounted read/write, this will cause corruption, even if the filesystem was not corrupted in the first place.

   You could mount it read-only, but you risk a kernel panic if fsck corrects something and the filesystem suddenly changes underneath the kernel. I am not aware of any filesystem that can withstand an online fsck. Sun ZFS can do online correction, but it doesn't have an fsck tool.

Regards,
Luis   

--- On Tue, 5/19/09, Robin Garner <robin.garner at scu.edu.au> wrote:

> From: Robin Garner <robin.garner at scu.edu.au>
> Subject: Re: [Ocfs2-users] [Fwd: Re: Unable to fix corrupt directories with fsck.ocfs2]
> To: ocfs2-users at oss.oracle.com
> Date: Tuesday, May 19, 2009, 11:05 PM
> Joel Becker wrote:
> > On Tue, May 19, 2009 at 02:49:31PM +1000, Robin Garner wrote:
> >> Robin Garner wrote:
> >>> Yes.  This is a 24/7 application (at least during semester), and
> >>> arranging extended downtime is a challenge.
> > 
> >     Ok, you ran fsck against a live filesystem and skipped the
> > cluster locking with the '-F' option.  So now you have two problems.
> > 
> > 1) The original directory problem.
> > 2) The duplicate blocks created by your fsck of a mounted filesystem.
> > 
> >     Do you have backups?
> > 
> > Joel
> > 
> 
> OK, now I'm confused:
> 
> The man page for fsck.ocfs2 says
> 
>        -F     Usually fsck.ocfs2 will check with cluster
>               services and the DLM to make sure that no
>               one else in the cluster is actively using
>               the device before proceeding.  -F skips
>               this check and should only be used when it
>               can be guaranteed that there can be no
>               other users of the device while fsck.ocfs2
>               is running.
> 
> To me & my colleagues "no one else in the cluster is actively using the
> device" means that the filesystem must be mounted on *at most* one
> node in the cluster (the node doing the fsck).  That's what we did.
> 
> This filesystem is normally mounted by both nodes of a 2-node cluster.
> We had cleanly unmounted the filesystem on the other node.  fsck.ocfs2
> without '-F' gave errors, but then mounted.ocfs2 claimed the disk was
> mounted on both nodes.  Eventually we shut down the other node, and
> mounted.ocfs2 still thought it had it mounted.  At this point we used '-F'.
> 
> I can't see any reference in the man page about not doing an fsck on a
> mounted disk.
> 
> e2fsck for example says this:
> 
>  > WARNING!!! Running e2fsck on a mounted file system may cause
>  > SEVERE filesystem damage.
>  >
>  > Do you really want to continue (y/n)?
> 
> when you try to fsck a mounted filesystem.  May I suggest that
> fsck.ocfs2 do something similar?  Perhaps 'everyone knows' you can't
> run fsck on a mounted filesystem, but we were assuming that ocfs2, being
> a modern cluster filesystem, might be a little more advanced.  Apparently
> not.
> 
> We'll try to salvage the data another way (we believe the directory
> corruption is some way down the directory tree), and pull missing data
> back from backups.
> 
> Robin
> 
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
> 

