[Ocfs-users] ORA-01207 after SAN maintenance

Fri Nov 26 02:41:20 CST 2004

it could be that you have a large blocksize and that you took the
snapshot, or made the change, while we were in the middle of a write.

with ext3 you do pagesize io; with ocfs (and raw) you would possibly be
doing io in 512-byte multiples.

the core of it is that ocfs does exactly the same as raw: no caching
whatsoever, ever, and blocks go to disk as they are, aligned to 512-byte
sectors or 4kb pages.
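The torn-write scenario described above can be made concrete with a small simulation. This is only a sketch under stated assumptions: an 8 KB database block, uncached writes issued in 512-byte sector units, and a hypothetical block layout that repeats a stamp in the first and last 4 bytes. That layout is loosely in the spirit of the head/tail consistency checks databases use; it is not Oracle's actual block format. If the write stops partway, the block on disk is part new data and part old data, and the head and tail stamps no longer agree.

```python
import struct

SECTOR = 512   # assumed device sector size
BLOCK = 8192   # assumed database block size (8 KB)


def make_block(stamp: int) -> bytes:
    """Build a hypothetical block whose first and last 4 bytes both carry `stamp`."""
    body = bytes([stamp & 0xFF]) * (BLOCK - 8)
    return struct.pack("<I", stamp) + body + struct.pack("<I", stamp)


def write_in_sectors(old: bytes, new: bytes, sectors_done: int) -> bytes:
    """Simulate an uncached write issued in 512-byte units that stops after
    `sectors_done` sectors: the leading sectors hold new data, the rest old."""
    cut = sectors_done * SECTOR
    return new[:cut] + old[cut:]


def is_fractured(block: bytes) -> bool:
    """Head and tail stamps disagree, so the block was only partly written."""
    head = struct.unpack_from("<I", block, 0)[0]
    tail = struct.unpack_from("<I", block, BLOCK - 4)[0]
    return head != tail


old = make_block(100)
new = make_block(101)

# Interrupted after 5 of 16 sectors: head is new, tail is still old.
print(is_fractured(write_in_sectors(old, new, sectors_done=5)))            # True
# Completed write: the whole block is new and consistent.
print(is_fractured(write_in_sectors(old, new, sectors_done=BLOCK // SECTOR)))  # False
```

The point of the sketch is only that a block larger than the unit the storage writes atomically can be caught half-written if the device goes away mid-write; whether that is what happened here is the open question in the thread.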

On Thu, Nov 25, 2004 at 09:19:29PM -0500, Matt Daniels wrote:
> Hi Jeremy,
> 
> Thanks for the response.  I believe GSD was still running when the san maintenance was done.  The database shut down cleanly; all the issues arose when we tried to start it back up.  We've since found out that a development instance with datafiles on another ocfs partition on the san suffered the exact same problem as production, while a development instance with its datafiles on an ext3 partition on the san had no issues at all and came up cleanly.
> 
> This one still has us stumped; we're working with support to try to determine the root cause... any other thoughts or suggestions are welcome.
> 
> -Matt
> 
> -----Original Message-----
> From: ocfs-users-bounces at oss.oracle.com
> [mailto:ocfs-users-bounces at oss.oracle.com]On Behalf Of Jeremy Schneider
> Sent: Wednesday, November 24, 2004 10:33 AM
> To: ocfs-users at oss.oracle.com; Matt Daniels
> Subject: Re: [Ocfs-users] ORA-01207 after SAN maintenance
> 
> 
> wow, i'm surprised that cluster manager (oracm) stayed up.  i don't know
> all the technical internals of exactly how it works, but i know that it
> uses a quorum on shared storage...  i guess it might only use the quorum
> for split-brain situations (where the interconnect goes down) but
> personally i'd still never yank the shared disk quorum out from under it
> without shutting it down!
> 
> but i also have to admit that your error message doesn't sound like it
> would be related to this.  did you shut down GSD or did you leave that
> running too?  the first error (control file older than datafiles)
> doesn't make much sense at all...  i think that just means the SCN in
> the datafile headers was newer than the SCN recorded in the control
> file?  did the DB shut down cleanly according to the alert log?
> 
> (FYI, we're running 9.2.0.5 on a 2-node RHEL3 cluster using ocfs --
> it's a backend for 11.5.9 -- and we've been in production for almost 3
> months without any problems so far...  oh - and we have [separate] ocfs
> partitions for archive logs too)
> 
> jeremy, dba
> 
> 
> >>> "Matt Daniels" <Matt.Daniels at priorityhealthcare.com> 11/24/2004 9:30:10 AM >>>
> We had a situation over the weekend with our production database that
> we can't figure out, hoping someone can shed some light.
> 
> Specifics:
> Oracle 9.2.0.4
> OS is Redhat AS2.1
> ocfs-2.4.9-e-summit-1.0.12-1
> ocfs-tools-1.0.10-1
> ocfs-support-1.0.10-1
> ocfs-2.4.9-e-enterprise-1.0.12-1
> 
> All datafiles, redo, undo, and control files are on ocfs; archived logs
> are on ext3.
> 
> We shut down the database for san maintenance, but didn't shut down
> cluster manager.  The san was disconnected from the server, a tray was
> added, and then the san was reconnected.  The server and cluster manager
> remained up during the maintenance.
> 
> When we tried to restart the database, we got an ORA-01207, saying the
> control file was older than the datafiles.  Per Oracle support, we
> recreated the control file and attempted to bring the db up with the
> new one.  At this point we received the following:
> 
> Errors in file
> /opt/oracle/product/9.2.0/admin/ENTPRD/udump/entprd2_ora_22596.trc:
> ORA-00600: internal error code, arguments: [kcoapl_blkchk], [5], [393], [6101], [], [], [], []
> 
> There's a RAC bug entry for [kcoapl_blkchk], but it was for a 4-node
> RAC; ours is only 2 nodes, so Oracle internals support said they didn't
> think it applied to our case.  We ended up doing a point-in-time
> recovery to before the san maintenance, but moved the datafiles to an
> ext3 partition for now.
> 
> Has anyone seen this before, or have any input as to what happened?
> We're trying to determine if this is a bug, and if we should move back
> to RAC/ocfs.
> 
> Thanks very much,
> Matt Daniels
> Apps DBA, Priority Healthcare Corp
> 
> _______________________________________________
> Ocfs-users mailing list
> Ocfs-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs-users
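Jeremy's tentative reading of the first error in the thread is that ORA-01207 means the SCN recorded in the datafile headers was newer than the SCN recorded in the control file. The snippet below is a minimal sketch of that comparison only, under a deliberately simplified model: each datafile header is reduced to a name and one checkpoint SCN, and the control file to a single SCN. The `DatafileHeader` type, the `newer_than_controlfile` helper, and the file names and SCN values are all hypothetical; this is not how Oracle actually implements the check.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatafileHeader:
    # Hypothetical, simplified view of a datafile header: a name and the
    # checkpoint SCN it records. Real headers carry much more than this.
    name: str
    checkpoint_scn: int


def newer_than_controlfile(controlfile_scn: int,
                           headers: List[DatafileHeader]) -> List[str]:
    """Return the datafiles whose header SCN is ahead of the control file's.

    Under the simplified reading from the thread, a non-empty result is the
    "control file older than datafiles" situation.
    """
    return [h.name for h in headers if h.checkpoint_scn > controlfile_scn]


headers = [
    DatafileHeader("datafile_1", 5000),  # made-up values for illustration
    DatafileHeader("datafile_2", 5123),
]
print(newer_than_controlfile(5000, headers))  # ['datafile_2']
```

In this toy model, recreating the control file amounts to giving it an SCN that is no longer behind the headers, which is consistent with support's suggestion in the thread; it says nothing about the later ORA-00600 block check failure.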