[Ocfs2-users] Lost write in archive logs: has it ever happened?

Luis Freitas lfreitas34 at yahoo.com
Wed Dec 3 08:05:49 PST 2008


   Depending on your configuration, Data Guard transfers the changes in the online redo log directly to the standby, where the archived logs are recreated. It does not copy the archived log files from disk; instead, it ships the redo entries to the other host as they are generated, and the archived logs are rebuilt on the standby.

   So the remote copy is created independently from the local copy.
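   Since the standby rebuilds its archived logs from shipped redo instead of copying the files, an MD5 mismatch between a production archive and its standby counterpart is presumably expected and is not, by itself, a sign of corruption. A checksum comparison is more meaningful between multiplexed copies written on the same host, which is the OCFS2-versus-reiserfs check described in the quoted message. Below is a minimal Python sketch of such a comparison; the function names `md5_of` and `copies_match` and the command-line usage are illustrative assumptions, not part of any Oracle tool.

```python
import hashlib
import sys


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(path_a, path_b):
    """True if two archived-log copies have the same MD5 digest."""
    return md5_of(path_a) == md5_of(path_b)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: one multiplexed copy on OCFS2, one on local storage.
    a, b = sys.argv[1], sys.argv[2]
    print("identical" if copies_match(a, b) else "DIFFERENT")
```

   For example: `python compare_arch.py /ocfs2/arch/1_123.arc /local/arch/1_123.arc` (script name and paths are hypothetical).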

   The problem could be in the process that writes the archivelog to disk, or in the operating system or hardware.

   If it is an operating system or hardware issue, it is probably corrupting your datafiles as well. You should then find fractured blocks or other corruption indicating lost writes, such as discrepancies between tables and indexes, or rollback errors caused by an incorrect block SCN. You can check for this by running "ANALYZE TABLE .... VALIDATE STRUCTURE CASCADE;" on every table. This command also cross-checks each table against its indexes, so it effectively reads every data block in the database and reports any corrupted blocks or table/index discrepancies. Some types of tables can't be verified this way, so a few errors saying so are normal. (Note that this command locks each table while it runs.)
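   To run the check over every table, the statements can be generated from a list of owner/table pairs. The following is a minimal Python sketch; the helper name `analyze_statements` and the sample `SCOTT` tables are hypothetical, and in practice the list would presumably come from a data dictionary query such as `SELECT owner, table_name FROM dba_tables`.

```python
def analyze_statements(tables):
    """Return one ANALYZE ... VALIDATE STRUCTURE CASCADE statement per table.

    `tables` is an iterable of (owner, table_name) pairs. Identifiers are
    double-quoted so mixed-case names are preserved as Oracle stores them.
    """
    return [
        f'ANALYZE TABLE "{owner}"."{name}" VALIDATE STRUCTURE CASCADE;'
        for owner, name in tables
    ]


if __name__ == "__main__":
    # Hypothetical sample; a real list would come from DBA_TABLES.
    sample = [("SCOTT", "EMP"), ("SCOTT", "DEPT")]
    for stmt in analyze_statements(sample):
        print(stmt)
```

   Because each statement locks its table while it runs, the generated script is best run during a maintenance window.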

    Can you post the error that appears when applying the archivelog?

Regards,
Luis

    

--- On Wed, 12/3/08, Silviu Marin-Caea <silviumc at fastmail.fm> wrote:
From: Silviu Marin-Caea <silviumc at fastmail.fm>
Subject: Re: [Ocfs2-users] Lost write in archive logs: has it ever happened?
To: ocfs2-users at oss.oracle.com
Date: Wednesday, December 3, 2008, 1:17 PM

On Monday 22 September 2008 15:02:36 Silviu Marin-Caea wrote:
> We have 2 nodes with OCFS2 1.2.3 (SLES9).  The archive logs are generated
> on an OCFS2 volume (mounted with nointr,datavolume).  Three times in one
> year, an archivelog has had a lost write.  We detected this when applying
> the archivelogs on the standby database (with Data Guard).  We had to copy
> some datafiles from the production database to the standby and let it
> resume the recovery process.
>
> Has a data loss of this kind (a lost write) ever occurred on an OCFS2
> volume, version 1.2.3 x86_64?
>
> Before, we had 32-bit servers running an OCFS2 version even older than
> 1.2.3, and those servers never had such a problem with archivelogs.
>
> The storage is Dell/EMC Clariion CX3-40.  The storage on the old servers
> was CX300.
>
> We are worried that these lost writes could occur not only in archivelogs
> but in the datafiles as well...
>
> We are not saying that OCFS2 is the cause; the problem might be with
> something else, but we must investigate everything.

OCFS2 is not the cause.  The error just occurred again and this time we had 
the archivelogs multiplexed on both OCFS2 and local storage (reiserfs).  Both 
archives have identical MD5 sums.

There was no lost write, just some bullshit that Oracle support tries to feed 
us.

There is still an unknown, but it's not related to OCFS2.  The unknown is that
the archives on the standby database have different MD5 sums than the ones on
production.  This applies to all the archives, not just the corrupt one.  Does
Data Guard intervene in some way in the archives during transmission?  I
thought it was just supposed to transfer them unchanged, then apply them on
the standby.


_______________________________________________
Ocfs2-users mailing list
Ocfs2-users at oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users



      

