[Ocfs2-users] Lost write in archive logs: has it ever happened?

Mon Sep 22 08:54:46 PDT 2008

Silviu,

   When I had this kind of issue, it was usually caused by a bad HBA or a power failure. I am assuming it is not the latter, as you would be aware of it.

   It is a difficult situation: because the controller only malfunctions sporadically, it is hard to prove that it is the cause or to get it replaced under warranty. Meanwhile your database slowly gets corrupted, until someday it crashes and won't start up. If this is the cause, it is surely happening on the datafiles as well.

   To be safe, you should run an "ANALYZE TABLE ... VALIDATE STRUCTURE CASCADE;" on all your database tables, and look for fractured or bad blocks in the datafiles using dbv or RMAN. A fractured block is one that has a different timestamp at its beginning and at its end, meaning it was only partially written to disk.
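
   To make the idea of a fractured block concrete, here is a minimal Python sketch. It is only a toy model: the 8192-byte block size, the 4-byte stamp at each end, and the names make_block, torn_write and is_fractured are all assumptions made for illustration, not Oracle's real on-disk format and not what dbv or RMAN actually run. It only shows why comparing the stamp at the start of a block with the one at the end reveals a write that stopped partway.

```python
import struct

# Toy layout (an assumption, NOT Oracle's real block format):
# [4-byte stamp][payload ...][4-byte stamp]
BLOCK_SIZE = 8192
STAMP = struct.Struct("<I")  # 4-byte little-endian version stamp


def make_block(version: int, payload: bytes = b"") -> bytes:
    """Build a block whose head and tail carry the same version stamp."""
    body_len = BLOCK_SIZE - 2 * STAMP.size
    if len(payload) > body_len:
        raise ValueError("payload does not fit in one block")
    stamp = STAMP.pack(version)
    return stamp + payload.ljust(body_len, b"\x00") + stamp


def torn_write(old: bytes, new: bytes, bytes_written: int) -> bytes:
    """Simulate a write that stopped after bytes_written bytes reached disk."""
    return new[:bytes_written] + old[bytes_written:]


def is_fractured(block: bytes) -> bool:
    """A block is fractured when its head and tail stamps disagree."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("unexpected block size")
    return block[:STAMP.size] != block[-STAMP.size:]


if __name__ == "__main__":
    old = make_block(1, b"old row data")
    new = make_block(2, b"new row data")
    half_written = torn_write(old, new, BLOCK_SIZE // 2)
    print("intact block fractured?      ", is_fractured(new))
    print("half-written block fractured?", is_fractured(half_written))
```

   Note that a lost write is a different failure: the old block stays on disk completely intact, so a head/tail comparison like this one cannot see it. That is probably why it only showed up when the archivelogs were applied on the standby.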

   You could also try swapping the HBA with one from another server to see if the problem disappears.

Regards,
Luis

--- On Mon, 9/22/08, Silviu Marin-Caea <silviumc at fastmail.fm> wrote:

> From: Silviu Marin-Caea <silviumc at fastmail.fm>
> Subject: [Ocfs2-users] Lost write in archive logs: has it ever happened?
> To: ocfs2-users at oss.oracle.com
> Date: Monday, September 22, 2008, 9:02 AM
> We have 2 nodes with OCFS2 1.2.3 (SLES9). The archive logs are
> generated on an OCFS2 volume (mounted with nointr,datavolume). It has
> happened 3 times in one year that an archivelog had a lost write. We
> detected this when applying the archivelogs on the standby database
> (with Data Guard). We had to copy some datafiles from the production
> database to the standby and let it resume the recovery process.
>
> Has a data loss of this kind (lost write) ever occurred on an OCFS2
> volume, version 1.2.3 x86_64?
>
> We had 32-bit servers before, with an OCFS2 version even older than
> 1.2.3, and those servers never had such a problem with archivelogs.
>
> The storage is a Dell/EMC Clariion CX3-40. The storage on the old
> servers was a CX300.
>
> We are worried that these lost writes could occur not only in the
> archivelogs but in the datafiles as well...
>
> I am not saying that OCFS2 is the cause; the problem might be with
> something else, but we must investigate everything.
> 
> Thank you