[Ocfs-users] OCFS file system used as archived redo destination is corrupted

Sunil Mushran Sunil.Mushran at oracle.com
Fri Feb 11 13:50:45 CST 2005


It looks like the dirnode index is corrupted. The file is showing up
twice, but there is only one copy of the file.
We had detected a race condition that could cause this. It has been fixed.
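One quick way to spot this symptom is to list the directory and look for names that appear more than once, since a healthy directory never returns the same name twice. The sketch below is a hypothetical shell helper: the function names `dup_names` and `check_dir` are made up for illustration and are not part of OCFS or its tools. It relies only on standard `ls`, `sort` and `uniq`, and assumes file names without embedded newlines.

```shell
#!/bin/sh
# Hypothetical helpers, not part of OCFS: detect duplicate directory entries.
# Assumes file names contain no embedded newlines.

# dup_names: read one file name per line on stdin and print every name
# that occurs more than once (each such name is printed once).
dup_names() {
    sort | uniq -d
}

# check_dir DIR: list DIR one entry per line (including dot files) and
# report any name that the directory returns more than once.
check_dir() {
    ls -1a -- "$1" | dup_names
}
```

For example, `check_dir /export/u10/oraarch/AUCP` should print nothing on a healthy directory. If it prints `rdo_1_21810.arc`, running `ls -li` on that name shows whether both entries point at the same inode, as in the report quoted below.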

Did you start on 1.0.12, or did you run an older version of the module
with this device?
You may want to look into upgrading to at least 1.0.13. We made some
memory allocation changes that were sorely needed.

As part of our tests, we simulate the archiver: we run a script on
multiple nodes that constantly creates files.
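The post does not include the actual test script. A hypothetical sketch of that kind of archiver simulation might look like the following: each node writes a stream of small files into the shared directory and deletes its own older ones, so the directory sees constant creation and removal. The function name `stress_archive`, the `rdo_<node>_<n>.arc` naming, and the 16 KB payload are all assumptions for illustration; the node name comes from `uname -n`.

```shell
#!/bin/sh
# Hypothetical archiver simulator (not Oracle's actual test script).
# stress_archive DIR COUNT KEEP
#   Creates COUNT files in DIR one after another, named after this node,
#   and deletes this node's older files so that at most KEEP remain.
#   Start it on several nodes at once against the same shared directory.
stress_archive() {
    dir=$1 count=$2 keep=$3
    node=$(uname -n)          # node name keeps file names unique per node
    i=1
    while [ "$i" -le "$count" ]; do
        # Write a 16 KB file, standing in for an archived redo log.
        dd if=/dev/zero of="$dir/rdo_${node}_$i.arc" bs=4096 count=4 \
            2>/dev/null || return 1
        # Purge the file that just fell outside the KEEP window.
        old=$((i - keep))
        if [ "$old" -ge 1 ]; then
            rm -f "$dir/rdo_${node}_$old.arc"
        fi
        i=$((i + 1))
    done
}
```

Running, say, `stress_archive /path/to/ocfs/scratch 100000 50` on every node at the same time and then checking the directory for duplicate or truncated entries exercises the same create/delete pattern that the archiver plus the gzip and purge scripts produce. The path and counts here are placeholders.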

Pei Ku wrote:

>  
> We started using an OCFS file system about 4 months ago as the shared
> archived redo destination for the 4-node RAC instances (HP DL380,
> MSA1000, RH AS 2.1). Last night we saw some weird behavior, and my
> guess is that the inode directory in the file system is getting
> corrupted. I've always had a bad feeling about OCFS not being very
> robust at handling constant file creation and deletion (which is what
> happens when you use it for archived redo logs).
>  
> ocfs-2.4.9-e-smp-1.0.12-1 is what we are using in production.
>  
> For now, we set up an archived redo dest on a local ext3 FS on each
> node and made that dest the mandatory dest; we changed the OCFS dest to
> an optional one. The reason we made the OCFS archived redo dest the
> primary dest a few months ago was that we are planning to migrate to
> RMAN-based backup (as opposed to the current hot backup scheme); it's
> easier (required?) to manage RAC archived redo logs with RMAN if the
> archived redo logs reside in a shared file system.
>  
> Below are some diagnostics:
> $ ls -l rdo_1_21810.arc*
>  
> -rw-r-----    1 oracle   dba        397312 Feb 10 22:30 rdo_1_21810.arc
> -rw-r-----    1 oracle   dba        397312 Feb 10 22:30 rdo_1_21810.arc
>  
> (They have the same inode, by the way; I had run 'ls -li' earlier, but
> the output had scrolled off the screen.)
>  
> After a while, one of the DBA scripts gzipped the file(s). Now they
> look like this:
>  
>  $ ls -liL /export/u10/oraarch/AUCP/rdo_1_21810.arc*
> 1457510912 -rw-r-----    1 oracle   dba            36 Feb 10 23:00 /export/u10/oraarch/AUCP/rdo_1_21810.arc.gz
> 1457510912 -rw-r-----    1 oracle   dba            36 Feb 10 23:00 /export/u10/oraarch/AUCP/rdo_1_21810.arc.gz
>  
> These two files also have the same inode, but the size is way
> too small.
>  
> Yeah, /export/u10 is pretty hosed...
>  
> Pei 
>
>     -----Original Message-----
>     *From:* Pei Ku
>     *Sent:* Thu 2/10/2005 11:16 PM
>     *To:* IT
>     *Cc:* ADS
>     *Subject:* possible OCFS /export/u10/ corruption on dbprd*
>
>     Ulf,
>      
>     AUCP had problems creating the archive file
>     "/export/u10/oraarch/AUCP/rdo_1_21810.arc". After a few tries, it
>     appeared to succeed, except that there are now *two*
>     rdo_1_21810.arc files in the directory (by the time you look, it/they
>     will probably have been gzipped). We also have a couple of
>     zero-length gzipped redo log files in there, which is not normal.
>      
>     At least the problem has not brought any of the AUCP instances
>     down. Manoj and I turned on archiving to an ext3 file system on
>     each node for now; archiving to /export/u10/ is still active but
>     has been made optional.
>      
>     My guess is that /export/u10/ is corrupted in some way. I still say
>     OCFS can't take constant file creation and removal.
>      
>     We are one rev behind (1.0.12 vs. 1.0.13 on ocfs.org). There is no
>     guarantee that 1.0.13 contains the cure...
>      
>     Pei
>
>         -----Original Message-----
>         *From:* Oracle [mailto:oracle at dbprd01.autc.com]
>         *Sent:* Thu 2/10/2005 10:26 PM
>         *To:* DBA; Page DBA; Unix Admin
>         *Cc:*
>         *Subject:* SL1:dbprd01.autc.com:050210_222600:oalert_mon>
>         Alert Log Errors
>
>         SEVER_LVL=1  PROG=oalert_mon
>         **** oalert_mon.pl: DB=AUCP SID=AUCP1
>         [Thu Feb 10 22:25:21] ORA-19504: failed to create file
>         "/export/u10/oraarch/AUCP/rdo_1_21810.arc"
>         [Thu Feb 10 22:25:21] ORA-19504: failed to create file
>         "/export/u10/oraarch/AUCP/rdo_1_21810.arc"
>         [Thu Feb 10 22:25:21] ORA-27040: skgfrcre: create error,
>         unable to create file
>         [Thu Feb 10 22:25:28] ORA-16038: log 12 sequence# 21810 cannot
>         be archived
>         [Thu Feb 10 22:25:28] ORA-19504: failed to create file ""
>         [Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1:
>         '/export/u01/oradata/AUCP/redo12m1.log'
>         [Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1:
>         '/export/u01/oradata/AUCP/redo12m2.log'
>         [Thu Feb 10 22:25:28] ORA-16038: log 12 sequence# 21810 cannot
>         be archived
>         [Thu Feb 10 22:25:28] ORA-19504: failed to create file ""
>         [Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1:
>         '/export/u01/oradata/AUCP/redo12m1.log'
>         [Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1:
>         '/export/u01/oradata/AUCP/redo12m2.log'
>         [Thu Feb 10 22:25:28] ORA-16038: log 12 sequence# 21810 cannot
>         be archived
>         [Thu Feb 10 22:25:28] ORA-19504: failed to create file ""
>         [Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1:
>         '/export/u01/oradata/AUCP/redo12m1.log'
>         [Thu Feb 10 22:25:28] ORA-00312: online log 12 thread 1:
>         '/export/u01/oradata/AUCP/redo12m2.log'
>

