[Ocfs-users] OCFS error

Phany Leblanc pleblanc at techlinkentertainment.com
Fri Jun 8 08:20:15 PDT 2007


Hi,

Before I describe the problem, I'll describe my environment.  This is a 
development environment built over 1.5 years ago:

 * We have a 2 node Oracle RAC Cluster.
 * built on two Dell PowerEdge 1850 Servers (Intel(R) Xeon(TM) CPU 3.20GHz)
 * OS is RHEL3 U3
 * Kernel 2.4.21-20.ELsmp
 * these 2 servers are attached to an EMC AX100 SAN via fibre channel
 * HBA (QLogic QLA6312 PCI to Fibre Channel Host Adapter)
 * Oracle Version: Oracle Database 10g Enterprise Edition Release 10.1.0.4.0
 * OCFS: ocfs-tools-1.0.10-1, ocfs-2.4.21-EL-smp-1.0.13-1, ocfs-support-1.1.5-1
 * We have two ocfs volumes mounted /u01 and /u02
 * These servers were last rebooted 17 days ago


Okay, hope that is sufficient info for starters... now here is the 
problem we experienced:

I arrived at work this morning and discovered some errors in node2's 
Oracle alert log.  The errors occurred in the middle of the night: the 
archiver was complaining that it couldn't archive a log to the flash 
recovery area on /u02:

...(snip)...
Jun  8 01:10:49 2007
Private_strands 0 at log switch
Thread 2 advanced to log sequence 30694
  Current log# 3 seq# 30694 mem# 0: /u01/oradata/rgcsd/redo03a.log
  Current log# 3 seq# 30694 mem# 1: /u01/oradata/rgcsd/redo03b.log
Fri Jun  8 01:10:49 2007
ARC0: Evaluating archive thread 2 sequence 30693
Fri Jun  8 01:10:50 2007
Errors in file /usr/app/oracle/admin/rgcsd/bdump/rgcsd2_arc0_2036.trc:
ORA-01264: Unable to create archived log file name
ORA-19800: Unable to initialize Oracle Managed Destination
Linux Error: 13: Permission denied
Fri Jun  8 01:10:50 2007
Errors in file /usr/app/oracle/admin/rgcsd/bdump/rgcsd2_arc0_2036.trc:
ORA-16032: parameter LOG_ARCHIVE_DEST_10 destination string cannot be translated
ORA-01264: Unable to create archived log file name
ORA-19800: Unable to initialize Oracle Managed Destination
Linux Error: 13: Permission denied
ARC0: Archiving not possible: No primary destinations
ARC0: Failed to archive thread 2 sequence 30693 (16032)
ARCH: Archival stopped, error occurred. Will continue retrying
...(snip)...
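
For reference, the numeric code on the "Linux Error" lines can be mapped to 
its symbolic errno name.  This is only a minimal sketch: the helper name 
decode_linux_error is hypothetical, and the sample line is copied from the 
excerpt above.

```python
import errno
import re

# Line copied from the alert log excerpt above.
ALERT_LINE = "Linux Error: 13: Permission denied"


def decode_linux_error(line):
    """Return (number, symbolic name) for a 'Linux Error: N: ...' line.

    Hypothetical helper for illustration; returns None if the line
    does not contain a 'Linux Error: N:' fragment.
    """
    match = re.search(r"Linux Error: (\d+):", line)
    if match is None:
        return None
    number = int(match.group(1))
    # errno.errorcode maps errno numbers to names such as 'EACCES'.
    return number, errno.errorcode.get(number, "UNKNOWN")


print(decode_linux_error(ALERT_LINE))
```

Error 13 is EACCES, which matches the "Permission denied" text Oracle 
printed next to it.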


The archiver kept retrying until it succeeded about 6 minutes later (at 
01:16:38); afterward everything appeared to function normally.

I checked node2's /var/log/messages logfile and found that the following 
errors were logged at the same time the archiver had its problems:

...(snip)...
Jun  8 01:10:50 rac4 kernel: (2036) ERROR: status = -17, Common/ocfsgendirnode.c, 1535
Jun  8 01:10:50 rac4 kernel: (2036) ERROR: status = -17, Common/ocfsgencreate.c, 1625
Jun  8 01:10:50 rac4 kernel: (2036) ERROR: status = -17, Common/ocfsgencreate.c, 1821
Jun  8 01:10:50 rac4 kernel: (2036) ERROR: status = -17, Linux/ocfsmain.c, 2193
Jun  8 01:10:50 rac4 kernel: (2036) ERROR: status = -17, Linux/ocfsmain.c, 2493
...(snip)...
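
The kernel lines can be picked apart and compared with the archiver trace 
file name using a sketch like the one below.  It rests on three assumptions 
that the logs themselves do not confirm: that the number in parentheses is 
the process id, that the OCFS status value is a negated Linux errno, and 
that the number before ".trc" in the trace file name is the OS process id 
of ARC0.  The function names are hypothetical.

```python
import errno
import re

# Kernel messages copied from node2's /var/log/messages excerpt above.
KERNEL_LINES = [
    "Jun  8 01:10:50 rac4 kernel: (2036) ERROR: status = -17, Common/ocfsgendirnode.c, 1535",
    "Jun  8 01:10:50 rac4 kernel: (2036) ERROR: status = -17, Common/ocfsgencreate.c, 1625",
    "Jun  8 01:10:50 rac4 kernel: (2036) ERROR: status = -17, Common/ocfsgencreate.c, 1821",
    "Jun  8 01:10:50 rac4 kernel: (2036) ERROR: status = -17, Linux/ocfsmain.c, 2193",
    "Jun  8 01:10:50 rac4 kernel: (2036) ERROR: status = -17, Linux/ocfsmain.c, 2493",
]

# Trace file named in the alert log.
TRACE_FILE = "/usr/app/oracle/admin/rgcsd/bdump/rgcsd2_arc0_2036.trc"

PATTERN = re.compile(
    r"kernel: \((?P<pid>\d+)\) ERROR: status = (?P<status>-?\d+), "
    r"(?P<source>[^,\s]+), (?P<line>\d+)"
)


def parse_ocfs_error(line):
    """Split one kernel message into its fields (hypothetical helper)."""
    match = PATTERN.search(line)
    if match is None:
        return None
    status = int(match.group("status"))
    return {
        "pid": int(match.group("pid")),  # assumption: value in parentheses is a pid
        "status": status,
        # Assumption: status is a negated errno, so -17 maps to errno 17.
        "errno_name": errno.errorcode.get(abs(status), "UNKNOWN"),
        "source": match.group("source"),
        "line": int(match.group("line")),
    }


def trace_pid(path):
    """Return the number before '.trc' in a trace file name, or None."""
    match = re.search(r"_(\d+)\.trc$", path)
    return int(match.group(1)) if match else None


for entry in map(parse_ocfs_error, KERNEL_LINES):
    print(entry["pid"], entry["errno_name"], entry["source"], entry["line"])
print("trace file pid:", trace_pid(TRACE_FILE))
```

Under those assumptions, all five messages come from process 2036, the same 
number that appears in the ARC0 trace file name, and status -17 corresponds 
to EEXIST ("File exists").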


(I checked node1's /var/log/messages logfile and these OCFS errors do 
NOT appear on that node.)


I checked the event log for the EMC AX100 SAN but there is nothing reported.

Has this happened to anyone else?  Anyone have any ideas what the root 
cause of this hiccup could be?  This is the first time this has happened 
to us.

Thanks for your input,
Phany



