[Ocfs2-users] Lost write in archive logs: has it ever happened?

Wed Dec 3 07:17:24 PST 2008

On Monday 22 September 2008 15:02:36 Silviu Marin-Caea wrote:
> We have 2 nodes with OCFS2 1.2.3 (SLES9).  The archive logs are generated
> on an OCFS2 volume (mounted with nointr,datavolume).  It has happened 3
> times in one year that some archivelog had a lost write.  We have detected
> this when applying the archivelogs on the standby database (with
> dataguard).  We had to copy some datafiles from the production database to
> the standby and let it resume the recovery process.
>
> Has a data loss of this kind (lost write) ever occurred on an OCFS2
> volume, version 1.2.3 x86_64?
>
> We had 32 bit servers before with OCFS2 that was even older than 1.2.3 and
> those servers never had such a problem with archivelogs.
>
> The storage is Dell/EMC Clariion CX3-40.  The storage on the old servers
> was CX300.
>
> We are worried that these lost writes could occur not only in archivelogs
> but in the datafiles as well...
>
> Not saying that OCFS2 is the cause, the problem might be with something
> else, but we must investigate everything.

OCFS2 is not the cause.  The error just occurred again and this time we had 
the archivelogs multiplexed on both OCFS2 and local storage (reiserfs).  Both 
archives have identical MD5 sums.

There was no lost write, just some bullshit that Oracle support tries to feed 
us.

There is still an unknown, but it's not related to OCFS2.  The unknown is that 
archives on the standby database have different MD5 sums than the ones on 
production.  All the archives, not just the corrupt one.  Does dataguard 
intervene in some way in the archives during transmission?  I thought it was 
just supposed to transfer them unchanged, then apply them on the standby.
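
For anyone who wants to repeat the checksum comparison described above (OCFS2 
copy against local copy, or production against standby), here is a minimal 
sketch in Python.  The function names and the directory layout are assumptions 
made for illustration: it assumes each location is a flat directory holding 
archives with the same file names.  It only reports which files differ; it 
says nothing about why they differ.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_archive_dirs(dir_a, dir_b):
    """Compare same-named regular files in two directories (hypothetical layout).

    Returns (mismatched, only_in_a, only_in_b), each a sorted list of names.
    """
    files_a = {p.name: p for p in Path(dir_a).iterdir() if p.is_file()}
    files_b = {p.name: p for p in Path(dir_b).iterdir() if p.is_file()}

    common = sorted(files_a.keys() & files_b.keys())
    mismatched = [
        name for name in common
        if md5_of_file(files_a[name]) != md5_of_file(files_b[name])
    ]
    only_in_a = sorted(files_a.keys() - files_b.keys())
    only_in_b = sorted(files_b.keys() - files_a.keys())
    return mismatched, only_in_a, only_in_b
```

Calling compare_archive_dirs() with the two archive destinations returns three 
lists: files whose sums differ, and files present on only one side.  An empty 
first list corresponds to the "identical MD5 sums" result above.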