[Ocfs-users] OCFS file system used as archived redo destination is corrupted
Pei Ku
pku at autotradecenter.com
Fri Feb 11 14:45:52 CST 2005
This file system was created under 1.0.12.
Does upgrading from 1.0.12 to 1.0.13 require reformatting the file systems? I don't care about the file system we are using for archived redos (it's pretty screwed up anyway and is going to need a clean wipe). But should I do a full FS dump before the upgrade and a restore afterward for the file system used for storing Oracle datafiles? Of course I'll do a full db backup in any case.
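If the answer is yes, I assume something along these lines would do, with the database shut down first. The backup location below is made up (and I'm guessing /export/u01 is the datafile FS), and I don't know how well plain tar behaves against OCFS's O_DIRECT requirements, so treat it as a sketch rather than a recipe:

$ # dump the datafile FS before the upgrade
$ cd /export/u01 && tar cpf /backup/u01_pre_upgrade.tar .
$ # ... upgrade OCFS, reformat the volume if required, remount ...
$ cd /export/u01 && tar xpf /backup/u01_pre_upgrade.tar   # restore afterward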
Are you saying the problem I described was a known problem in 1.0.12 that has been fixed in 1.0.13?
Before this problem, our production db had archive_lag_target set to 15 minutes (the parameter takes seconds, so 900) in order for the standby db not to lag too far behind production. Since this is a four-node RAC, that means at least (60/15)*4 = 16 archived redos are generated per hour (16*24 = 384 per day). And the fact that this problem only appeared after several months tells me the OCFS QA process needs to be more thorough and run for a long time in order to catch bugs like this.
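Incidentally, that kind of churn (the same thing your archiver-simulation test below does) is easy to reproduce. A minimal sketch to run on every node at once; the scratch directory and file size are made up, so don't point it at a live volume:

$ cat archstress.sh
#!/bin/sh
# crude archiver simulation: constantly create, compress, and remove files
DIR=/export/u10/oraarch/TEST     # hypothetical scratch directory on OCFS
i=0
while true
do
    i=`expr $i + 1`
    F=$DIR/fake_`hostname`_$$_$i.arc
    # ~400 KB of zeroes, about the size of our archived redos
    dd if=/dev/zero of=$F bs=1024 count=400 2>/dev/null
    gzip $F
    rm -f $F.gz
done

Run for months rather than hours, that is the kind of soak test I mean.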
Another weird thing: when I do an 'ls /export/u10/oraarch/AUCP', it takes about 15 seconds, and CPU usage is 25+% for that duration. If I run the same command on multiple nodes at once, the elapsed time can be 30 seconds on each node. It concerns me that a simple command like 'ls' can be that resource-intensive and slow. Maybe it's related to the FS corruption...
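A quick way to see where the time goes, the directory read itself versus the per-entry inode lookups, is to compare an unsorted, stat-free listing against a full one:

$ time ls -f /export/u10/oraarch/AUCP > /dev/null   # readdir only: no sort, no stat
$ time ls -l /export/u10/oraarch/AUCP > /dev/null   # also stats every entry

If the '-f' form is just as slow, the time is going into reading the directory blocks themselves, which would fit a mangled dirnode index.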
thanks
Pei
> -----Original Message-----
> From: Sunil Mushran [mailto:Sunil.Mushran at oracle.com]
> Sent: Friday, February 11, 2005 11:51 AM
> To: Pei Ku
> Cc: ocfs-users at oss.oracle.com
> Subject: Re: [Ocfs-users] OCFS file system used as archived redo destination is corrupted
>
>
> Looks like the dirnode index is screwed up. The file is showing up
> twice, but there is only one copy of the file.
> We had detected a race which could cause this; it has been fixed.
>
> Did you start on 1.0.12, or did you run an older version of the module
> with this device?
> You may want to look into upgrading to at least 1.0.13. We made some
> memory allocation changes which were sorely needed.
>
> As part of our tests, we simulate the archiver: we run a script on
> multiple nodes which constantly creates files.
>
> Pei Ku wrote:
>
> >
> > We started using an OCFS file system about 4 months ago as the shared
> > archived redo destination for the 4-node RAC instances (HP DL380,
> > MSA1000, RH AS 2.1). Last night we started seeing some weird behavior,
> > and my guess is that the inode directory in the file system is getting
> > corrupted. I've always had a bad feeling about OCFS not being very
> > robust at handling constant file creation and deletion (which is what
> > happens when you use it for archived redo logs).
> >
> > ocfs-2.4.9-e-smp-1.0.12-1 is what we are using in production.
> >
> > For now, we set up an archive redo dest on a local ext3 FS on each
> > node and made that dest the mandatory dest; we changed the OCFS dest
> > to an optional one. (The parameter settings are sketched at the end
> > of this message.) The reason we made the OCFS arch redo dest the
> > primary dest a few months ago was that we are planning to migrate to
> > RMAN-based backup (as opposed to the current hot backup scheme); it's
> > easier (required?) to manage RAC archived redo logs with RMAN if the
> > archived redos reside in a shared file system.
> >
> > below are some diagnostics:
> > $ ls -l rdo_1_21810.arc*
> >
> > -rw-r----- 1 oracle dba 397312 Feb 10 22:30 rdo_1_21810.arc
> > -rw-r----- 1 oracle dba 397312 Feb 10 22:30 rdo_1_21810.arc
> >
> > (they have the same inode, btw -- I had done an 'ls -li' earlier but
> > the output had rolled off the screen)
> >
> > After a while, one of the dba scripts gzipped the file(s). Now they
> > look like this:
> >
> > $ ls -liL /export/u10/oraarch/AUCP/rdo_1_21810.arc*
> > 1457510912 -rw-r----- 1 oracle dba 36 Feb 10 23:00 /export/u10/oraarch/AUCP/rdo_1_21810.arc.gz
> > 1457510912 -rw-r----- 1 oracle dba 36 Feb 10 23:00 /export/u10/oraarch/AUCP/rdo_1_21810.arc.gz
> >
> > These two files also have the same inode, but the size is way too
> > small.
> >
> > yeah, /export/u10 is pretty hosed...
> >
> > Pei
> >
> > -----Original Message-----
> > *From:* Pei Ku
> > *Sent:* Thu 2/10/2005 11:16 PM
> > *To:* IT
> > *Cc:* ADS
> > *Subject:* possible OCFS /export/u10/ corruption on dbprd*
> >
> > Ulf,
> >
> > AUCP had problems creating archive file
> > "/export/u10/oraarch/AUCP/rdo_1_21810.arc". After a few tries, it
> > appeared that it was able to -- except that there are *two*
> > rdo_1_21810.arc files in it (by the time you look at it, they will
> > probably have been gzipped). We also have a couple of zero-length
> > gzipped redo log files (which is not normal) in there.
> >
> > At least the problem has not brought any of the AUCP instances down.
> > Manoj and I turned on archiving to an ext3 file system on each node;
> > archiving to /export/u10/ is still active but has been made optional
> > for now.
> >
> > My guess is that /export/u10/ is corrupted in some way. I still say
> > OCFS can't take constant file creation and removal.
> >
> > We are one rev behind (1.0.12 vs 1.0.13 on ocfs.org). No guarantee
> > that 1.0.13 contains the cure...
> >
> > Pei
> >
> > -----Original Message-----
> > *From:* Oracle [mailto:oracle at dbprd01.autc.com]
> > *Sent:* Thu 2/10/2005 10:26 PM
> > *To:* DBA; Page DBA; Unix Admin
> > *Cc:*
> > *Subject:* SL1:dbprd01.autc.com:050210_222600:oalert_mon> Alert Log Errors
> >
> > SEVER_LVL=1 PROG=oalert_mon
> > **** oalert_mon.pl: DB=AUCP SID=AUCP1
> > [Thu Feb 10 22:25:21] ORA-19504: failed to create file "/export/u10/oraarch/AUCP/rdo_1_21810.arc"
> > [Thu Feb 10 22:25:21] ORA-19504: failed to create file "/export/u10/oraarch/AUCP/rdo_1_21810.arc"
> > [Thu Feb 10 22:25:21] ORA-27040: skgfrcre: create error, unable to create file
> > [Thu Feb 10 22:25:28] ORA-16038: log 12 sequence# 21810 cannot be archived
> > [Thu Feb 10 22:25:28] ORA-19504: failed to create file ""
> > [Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1: '/export/u01/oradata/AUCP/redo12m1.log'
> > [Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1: '/export/u01/oradata/AUCP/redo12m2.log'
> > [Thu Feb 10 22:25:28] ORA-16038: log 12 sequence# 21810 cannot be archived
> > [Thu Feb 10 22:25:28] ORA-19504: failed to create file ""
> > [Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1: '/export/u01/oradata/AUCP/redo12m1.log'
> > [Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1: '/export/u01/oradata/AUCP/redo12m2.log'
> > [Thu Feb 10 22:25:28] ORA-16038: log 12 sequence# 21810 cannot be archived
> > [Thu Feb 10 22:25:28] ORA-19504: failed to create file ""
> > [Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1: '/export/u01/oradata/AUCP/redo12m1.log'
> > [Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1: '/export/u01/oradata/AUCP/redo12m2.log'
> >
>
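P.S. For the record, the destination change described in the quoted thread boils down to a pair of init parameters, roughly as below. The local ext3 path is hypothetical and the attribute details are from memory, so double-check against the 9i docs before relying on this:

# on each node: local ext3 destination that must succeed
log_archive_dest_1 = 'LOCATION=/oraarch/AUCP MANDATORY'
# shared OCFS destination: still written to, but allowed to fail
log_archive_dest_2 = 'LOCATION=/export/u10/oraarch/AUCP OPTIONAL'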