[Ocfs-users] Server crashed with Common/ocfsgencreate.c, Common/ocfsgenvote.c

Aubertin Michael maubertin at ares.fr
Tue Dec 20 03:45:10 CST 2005


Hi,

I'm not an Oracle expert, but it seems that one of your OCFS volumes was
damaged by the fibre card failure. To find out which file is corrupt, I
suggest adding this line:

LOG_ERROR_ARGS ("File=%s", fe->filename);

between:
LOG_ERROR_STR ("oin has no matching inode!!!!");
and:
OCFS_SET_FLAG (oin->oin_flags, OCFS_OIN_INVALID);

in function: ocfs_verify_update_oin
in file : ocfs2/Common/ocfsgencreate.c

of your 1.0.9-9 ocfs package. Sources are already available on
oss.oracle.com.
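
If it helps to find the exact spot first, here is a minimal sketch, assuming the source tarball is unpacked and a grep that supports -A (GNU grep does). The helper name find_patch_point is made up for illustration; it only prints the error-string line and the few lines after it, so you can confirm that the OCFS_SET_FLAG line follows directly.

```shell
#!/bin/sh
# Hypothetical helper: show the line holding the "oin has no matching inode"
# message plus the three lines after it, with line numbers.
# $1 is the path to ocfsgencreate.c in your unpacked source tree.
find_patch_point() {
    grep -n -F -A 3 'oin has no matching inode!!!!' "$1"
}

# Example call (path as given above, relative to the unpacked sources):
#   find_patch_point ocfs2/Common/ocfsgencreate.c
```

The new LOG_ERROR_ARGS line goes right after the matching line (the one grep marks with a colon after the line number).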

Remember to compile ocfs yourself, and do it carefully:

On Red Hat 3 and higher, remap gcc from 2.96 to 3.
On 2.1
Go to /usr/src/linux-[Your-Running-Version]
make mrproper
cp config/kernel-[Your-Running-Version] .config
vi Makefile ----> Set or correct the EXTRAVERSION variable so that it
matches your running kernel.
make oldconfig
make dep
make clean (not strictly necessary).
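
The EXTRAVERSION step is the easiest one to get wrong, so here is a small sketch for checking it, not a definitive tool. It assumes the release string has the usual VERSION.PATCHLEVEL.SUBLEVEL prefix (as in the 2.4.9-e.25smp shown in the quoted uname output) and that the Makefile sets EXTRAVERSION with a plain "=". The function names are made up for illustration.

```shell
#!/bin/sh
# EXTRAVERSION is whatever follows VERSION.PATCHLEVEL.SUBLEVEL in the
# kernel release string, e.g. "2.4.9-e.25smp" gives "-e.25smp".
extraversion_of() {
    printf '%s\n' "$1" | sed 's/^[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*//'
}

# Read the EXTRAVERSION value currently set in a kernel Makefile ($1),
# with surrounding whitespace stripped. Only the first match is used.
makefile_extraversion() {
    sed -n '/^EXTRAVERSION[[:space:]]*=/{
        s/^EXTRAVERSION[[:space:]]*=[[:space:]]*//
        s/[[:space:]]*$//
        p
    }' "$1" | head -n 1
}

# Succeed only when the Makefile ($2) matches the release string ($1).
extraversion_matches() {
    [ "$(extraversion_of "$1")" = "$(makefile_extraversion "$2")" ]
}

# Example call, run from inside the kernel source directory:
#   extraversion_matches "$(uname -r)" Makefile && echo "EXTRAVERSION ok"
```

If the check fails, edit the Makefile as described above before running make oldconfig and make dep.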

Then apply your patch to the ocfs sources and build them, following the docs
or these quick steps:
./configure --sbindir=/sbin --with-kernel="${KPATH}"  --enable-aio=no
make && make install
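
KPATH above is meant to point at the kernel tree you just prepared. A minimal sketch for setting it and doing a sanity check before running configure follows. It assumes the tree lives at /usr/src/linux-<output of uname -r>, which may not match your layout, and kernel_tree_ready is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the directory $1 looks like a configured
# kernel tree, i.e. it contains both a Makefile and a .config.
kernel_tree_ready() {
    [ -f "$1/Makefile" ] && [ -f "$1/.config" ]
}

# Example use (the path is an assumption, adjust it to your system):
#   KPATH="/usr/src/linux-$(uname -r)"
#   kernel_tree_ready "$KPATH" || echo "prepare $KPATH first"
```

This only checks that the files exist; it does not verify that make dep has been run.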


Good luck.
May the Tux be with you.


Michael Aubertin
http://linux.ares.fr


On Tuesday 20 December 2005 at 01:05 -0800, Ivan Wong wrote:
> Sorry for the hard-to-read email. This week we had another crash,
> the error is below:
> 
> Dec 18 14:16:26 x335-149 kernel: (5694) ERROR: status = -17,
> Common/ocfsgencreate.c, 1671
> Dec 18 14:16:26 x335-149 kernel: (5694) ERROR: status = -17,
> Common/ocfsgencreate.c, 1827
> Dec 18 14:16:26 x335-149 kernel: (5694) ERROR: status = -17,
> Linux/ocfsmain.c, 2090
> Dec 18 14:16:26 x335-149 kernel: (5694) ERROR: status = -17,
> Linux/ocfsmain.c, 2409
> Dec 18 14:16:28 x335-149 kernel: (5695) ERROR: status = -2,
> Common/ocfsgendlm.c, 986
> Dec 18 14:16:28 x335-149 kernel: (5695) ERROR: status = -2,
> Common/ocfsgendlm.c, 1163
> Dec 18 14:16:28 x335-149 kernel: (5695) ERROR: status = -2,
> Common/ocfsgencreate.c, 479
> Dec 18 14:16:28 x335-149 kernel: (5695) ERROR: status = -2,
> Linux/ocfsmain.c, 2030
> Dec 18 16:33:10 x335-149 kernel: (10) ERROR: lockres=null,
> Linux/ocfsmain.c, 3541
> Dec 18 18:20:38 x335-149 kernel: (10) ERROR: lockres=null,
> Linux/ocfsmain.c, 3541
> Dec 19 00:02:30 x335-149 last message repeated 3 times
> Dec 19 00:26:24 x335-149 sshd(pam_unix)[17895]: session opened for user
> oracle by (uid=0)
> Dec 19 00:26:49 x335-149 sshd(pam_unix)[17895]: session closed for user
> oracle
> Dec 19 04:02:05 x335-149 syslogd 1.4.1: restart.
> Dec 19 05:45:03 x335-149 kernel: (26188) ERROR: status = -17,
> Common/ocfsgendirnode.c, 1507
> Dec 19 05:45:03 x335-149 kernel: (26188) ERROR: status = -17,
> Common/ocfsgencreate.c, 1671
> Dec 19 05:45:03 x335-149 kernel: (26188) ERROR: status = -17,
> Common/ocfsgencreate.c, 1827
> Dec 19 05:45:03 x335-149 kernel: (26188) ERROR: status = -17,
> Linux/ocfsmain.c, 2090
> Dec 19 05:45:03 x335-149 kernel: (26188) ERROR: status = -17,
> Linux/ocfsmain.c, 2409
> Dec 19 06:48:08 x335-149 kernel: (10) ERROR: lockres=null,
> Linux/ocfsmain.c, 3541
> Dec 19 07:15:06 x335-149 kernel: Unable to handle kernel NULL pointer
> dereference<3>(3) ERROR: oin has no matching inode!!!!,
> Common/ocfsgencreate.c, 81
> Dec 19 09:00:03 x335-149 syslogd 1.4.1: restart.
> Dec 19 09:00:03 x335-149 syslog: syslogd startup succeeded
> Dec 19 09:00:04 x335-149 kernel: klogd 1.4.1, log source = /proc/kmsg
> started.
> Dec 19 09:00:04 x335-149 syslog: klogd startup succeeded
> 
> 
> Note that the part that says "oin has no matching inode" is particularly
> linked to OCFS. Wim? Sunil? Could anyone please advise?
> 
> 
> Thanks / regards,
>  
> Ivan Wong
> Database Administrator
>  
> e2Open Inc. (www.e2open.com) 
> Suite 34.03, Level 34, 
> Menara Citibank,
> 156, Jalan Ampang,
> 50450 Kuala Lumpur, Malaysia
> DID: +603 2776 6397 
> Tel: +603 2776 6300 
> Fax: +603 2712 9112
> 
> 
> -----Original Message-----
> From: ocfs-users-bounces at oss.oracle.com
> [mailto:ocfs-users-bounces at oss.oracle.com] On Behalf Of Ivan Wong
> Sent: Tuesday, December 13, 2005 3:56 PM
> To: ocfs-users at oss.oracle.com
> Subject: [Ocfs-users] Server crashed with
> Common/ocfsgencreate.c,Common/ocfsgenvote.c
> 
> 
> Hi Experts,
> 
> We have a 4-node RAC running, and recently one node went down due to a
> hardware (fibre optics card) failure. Since we have been running as a
> 3-node RAC, the surviving servers just keep crashing. We cannot figure out
> why this is happening, but checking /var/log/messages we found these errors
> (notice the messages before the crash at 8:32):
> 
> Dec 12 08:30:45 x335-142 kernel: (2) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97
> Dec 12 08:30:45 x335-142 kernel: (2) ERROR: status = -2, Common/ocfsgenvote.c, 121
> Dec 12 08:32:28 x335-142 kernel: (2) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97
> Dec 12 08:32:28 x335-142 kernel: (2) ERROR: status = -2, Common/ocfsgenvote.c, 121
> Dec 12 08:46:35 x335-142 sshd(pam_unix)[9468]: session opened for user oracle by (uid=0)
> Dec 12 08:55:30 x335-142 xinetd[11044]: warning: can't get client address: Connection reset by peer
> Dec 12 09:59:11 x335-142 sshd(pam_unix)[9468]: session closed for user oracle
> Dec 12 15:15:48 x335-142 kernel: (4) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97
> Dec 12 15:15:48 x335-142 kernel: (4) ERROR: status = -2, Common/ocfsgenvote.c, 121
> Dec 12 16:16:10 x335-142 kernel: (3) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97
> Dec 12 16:16:10 x335-142 kernel: (3) ERROR: status = -2, Common/ocfsgenvote.c, 121
> Dec 12 16:16:20 x335-142 kernel: (3) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97
> Dec 12 16:16:20 x335-142 kernel: (3) ERROR: status = -2, Common/ocfsgenvote.c, 121
> Dec 12 16:16:30 x335-142 kernel: (3) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97
> Dec 12 16:16:30 x335-142 kernel: (3) ERROR: status = -2, Common/ocfsgenvote.c, 121
> Dec 12 16:16:35 x335-142 kernel: (3) ERROR: file entry name did not match inode, Common/ocfsgencreate.c, 97
> 
> 
> Power cycling the box allows us to carry on and start the db, etc. But this
> is the 4th time in two weeks. Since the only errors found are from ocfs, I
> am wondering whether anyone has seen this, or whether it is OCFS related.
> 
> Our environment is:
> 
> x335-142:slr142:/e2open/home/oracle: 1000>rpm -qa | grep ocfs
> ocfs-2.4.9-e-smp-1.0.9-9
> ocfs-support-1.0.9-9
> ocfs-tools-1.0.9-9
> x335-142:slr142:/e2open/home/oracle: 1001>uname -a
> Linux x335-142 2.4.9-e.25smp #1 SMP Fri Jun 6 18:11:40 EDT 2003 i686 unknown
> 
> Appreciate any feedback.
> 
> 
> 
> Thanks / regards,
>  
> Ivan Wong
> _______________________________________________
> Ocfs-users mailing list
> Ocfs-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs-users



