<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD><TITLE>Re: [Ocfs2-users] PBL with RMAN and ocfs2</TITLE>
<META http-equiv=Content-Type content="text/html; charset=utf-8">
<META content="MSHTML 6.00.2900.3086" name=GENERATOR></HEAD>
<BODY>
<DIV dir=ltr align=left><SPAN class=682283208-14052007><FONT face=Arial
color=#0000ff size=2>Hello <FONT face="Times New Roman"
color=#000000>Dheerendar,</FONT></FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=682283208-14052007><FONT
size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=682283208-14052007><FONT size=2>just a quick
note: "The EMC devices have been linked to raw devices" - this is, IMHO, no longer
needed for Oracle (if you mean /dev/rawX). You can use the normal block devices
directly and Oracle will open them with the O_DIRECT flag.</FONT></SPAN></DIV>
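<DIV dir=ltr align=left><SPAN class=682283208-14052007><FONT size=2>A minimal
sketch of what that looks like on the OS side (the emcpower device names are
placeholders; on RHEL 4 the ownership has to be reapplied after every boot, e.g.
from /etc/rc.local or a udev rule):</FONT></SPAN></DIV>
<PRE>
# Give the block devices to the oracle user instead of binding them to /dev/raw/rawN
chown oracle:dba /dev/emcpowera1 /dev/emcpowerb1
chmod 660 /dev/emcpowera1 /dev/emcpowerb1
# 10g can then reference them directly (e.g. as ASM disks or datafile paths);
# Oracle opens block devices with O_DIRECT, so the raw bindings are not needed.
</PRE>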
<DIV dir=ltr align=left><SPAN class=682283208-14052007><FONT
size=2> </FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=682283208-14052007><FONT size=2>What
file system did you specify for CRS? I would suggest using dedicated small
partitions of your own.</FONT></SPAN></DIV>
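<DIV dir=ltr align=left><SPAN class=682283208-14052007><FONT size=2>If the OCR
and voting disk end up on OCFS2 rather than on raw partitions, a rough sketch
would be (the device name and mount point are placeholders; datavolume,nointr
are the mount options usually suggested for Oracle's own files on OCFS2 1.2):</FONT></SPAN></DIV>
<PRE>
# Small, dedicated OCFS2 volume just for the OCR and voting disk
mkfs.ocfs2 -b 4K -C 4K -L crsfiles /dev/emcpowerc1
mkdir -p /u02/crsfiles
mount -t ocfs2 -o datavolume,nointr /dev/emcpowerc1 /u02/crsfiles
</PRE>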
<DIV dir=ltr align=left><SPAN class=682283208-14052007><FONT
size=2></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=682283208-14052007><FONT
size=2>Regards,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=682283208-14052007><FONT
size=2>Bernd</FONT></SPAN></DIV><BR>
<DIV class=OutlookMessageHeader lang=de dir=ltr align=left>
<HR tabIndex=-1>
<FONT face=Tahoma size=2><B>From:</B> ocfs2-users-bounces@oss.oracle.com
[mailto:ocfs2-users-bounces@oss.oracle.com] <B>On Behalf Of </B>Dheerendar
Srivastav<BR><B>Sent:</B> Sunday, May 13, 2007 6:30 AM<BR><B>To:</B>
lfreitas34@yahoo.com; Ocfs2-users@oss.oracle.com<BR><B>Subject:</B> Re:
[Ocfs2-users] PBL with RMAN and ocfs2<BR></FONT><BR></DIV>
<DIV></DIV><!-- Converted from text/plain format -->
<P><FONT size=2>Dear Sir,<BR><BR>We are using RHEL 4.0 with kernel
2.6.9-42.0.2.ELsmp with ocfs2, and we have Oracle 10g 10.1.0.2.<BR><BR>I am working
on a RAC installation (10.1.0.2) on RHEL 4.0 with EMC CLARiiON shared storage.
The EMC devices have been linked to raw devices.<BR><BR>We were able to configure
ocfs2 and ASM. When we installed CRS, the error message showed: "The Oracle
Cluster Registry can exist only as a shared file system file or as a shared raw
partition."<BR><BR>I would like to ask how to set up the OCR
file.<BR><BR>Regards<BR>Dheerendar Srivastav<BR>Associate Vice President
-IT<BR>Bajaj Capital Ltd.<BR>New Delhi<BR><BR>----- Original Message
-----<BR>From: ocfs2-users-bounces@oss.oracle.com
<ocfs2-users-bounces@oss.oracle.com><BR>To: Ocfs2-users@oss.oracle.com
<Ocfs2-users@oss.oracle.com><BR>Sent: Sat May 12 00:59:19 2007<BR>Subject:
RE: [Ocfs2-users] PBL with RMAN and
ocfs2<BR><BR>Gaetano,<BR><BR> I am using RMAN with
the default configuration here in RH 4.0, but I had to change the I/O scheduler
to "deadline" to prevent these reboots, and increased the o2cb
timeouts too. We had some reboots just after implementing but it seems very stable now.
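<BR><BR>(For reference, a rough sketch of the scheduler change; the device name
is a placeholder, and on RHEL 4 you can also set it for all devices at boot by
adding elevator=deadline to the kernel line in grub.conf:)<BR>
<PRE>
# Check the current elevator for one device; on 2.6.9 RHEL 4 update kernels it
# can usually be switched at runtime via sysfs
cat /sys/block/sdc/queue/scheduler      # e.g. "noop anticipatory deadline [cfq]"
echo deadline > /sys/block/sdc/queue/scheduler
</PRE>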
We increased the timeout here to 130, to account for SAN switch failures,
powerpath and such.<BR><BR> I am still on 1.2.1 on the
production nodes and it panics the machine, which is annoying even when the
servers are on the same building, but there are always messages on
/var/log/messages of the killed node showing what happened. Funny that 1.2.5 no
longer shows these.<BR><BR>Regards,<BR>Luis<BR><BR>Gaetano Giunta
<giunta.gaetano@sea-aeroportimilano.it>
wrote:<BR><BR> Well, I'm not 100% sure
I solved the problem in a definitive way, but here's the complete
story:<BR> <BR>
1 - install, if you can, the latest release of ocfs2 + tools. The fact that a
node reboots instead of panicking (and resting in peace until manual
intervention) is a real life saver if you do not have immediate access to the
server farm. Plus timeouts are
configurable.<BR> <BR>
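(The packages are kernel specific; a rough sketch of such an upgrade - the exact
file names depend on your kernel and on the release you download from
oss.oracle.com:)<BR>
<PRE>
# Package names below are examples only - match them to `uname -r` and the current release
rpm -Uvh ocfs2-tools-1.2.*.rpm ocfs2console-1.2.*.rpm
rpm -Uvh ocfs2-2.6.9-42.0.2.ELsmp-1.2.*.rpm    # kernel module package for this exact kernel
</PRE>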
2 - when a cluster node is rebooted by the ocfs daemon, a telltale message is
present on the console of the node. Messages from the ocfs daemon will also be
present in /var/log/messages on the other nodes, but looking at those it is hard
to understand if the dying node was shut down by ocfs or by other
causes.<BR> <BR>
You can either sit in front of the screen or start the netdump service on the
rebooting node and the netdump-server service on a spare machine (another node
in the cluster is fine; for best results use a different NIC / interconnect from
the one used by ocfs.) If you are using Red Hat, the man pages for both services
are quite
straightforward.<BR> <BR>
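(For reference, the setup is roughly as follows; service and file names are as
shipped with RHEL 3/4, and the address is of course a placeholder:)<BR>
<PRE>
# On the spare machine that will receive console output / crash dumps:
chkconfig netdump-server on
service netdump-server start      # captured oops/console output lands under /var/crash/

# On the node that keeps rebooting, point the client at that machine in
# /etc/sysconfig/netdump, e.g.  NETDUMPADDR=192.168.1.10   (use a non-ocfs2 interface)
chkconfig netdump on
service netdump start
</PRE>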
3 - in our case, the log we netdumped
said:<BR>
(6,0):o2hb_write_timeout:269 ERROR:
Heartbeat write timeout to device emcpowere2 after 12000
milliseconds<BR> Heartbeat thread (6)
printing last 24 blocking operations (cur =
7):<BR> Heartbeat thread stuck at
waiting for read completion, stuffing current time into that blocker (index
7)<BR> Index 8: took 0 ms to do
submit_bio for read<BR> [ ...
]<BR> Index
7:<BR> took 9998 ms to do
waiting for read completion<BR> ***
ocfs2 is very sorry to be fencing this system by restarting
***<BR> <BR>
4 - thus we determined ocfs2
was indeed at fault. Operations on other files were OK, but using rman to
create a single 1.3 GB file on the ocfs disk was somehow triggering a heartbeat
timeout.<BR> <BR>
5 - we modified the configuration of our rman scripts to try to keep the size of
the files it creates smaller. We tested again, and there was no reboot. I am not
sure you can achieve the same result for failovers though - the general idea is
to keep I/O in smaller chunks (or slow it down
somehow?)<BR> <BR>
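(One way to do that, assuming the backups are driven from a shell script; the
channel name, piece size and format path below are only examples, not necessarily
what our scripts actually do:)<BR>
<PRE>
rman target / &lt;&lt;'EOF'
run {
  # Cap the size of each backup piece so no single huge file is written to the ocfs2 volume
  allocate channel d1 device type disk maxpiecesize 512M
    format '/home/SANstorage/oracle/backup/rman/dump_log/arc_%d_%u';
  backup archivelog all delete input;
  release channel d1;
}
EOF
</PRE>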
6 - As Sunil recommended (sorry, I think this was off list), we also raised the
ocfs timeout value for O2CB_HEARTBEAT_THRESHOLD. Precise instructions for that
can be found here: <A
href="http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#TIMEOUT">http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#TIMEOUT</A>.
We decided to go with a value of 31. We did not raise timeouts for the network
keepalives (yet), since we are not using bonded nics for the ocfs2 interconnect.
We might do that in the future if we find out that traffic on that network is
extremely high / the network unstable,
though...<BR> <BR>
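(For reference, per the FAQ linked above the disk heartbeat timeout works out to
(O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, so the default of 7 gives the 12000 ms
seen in the log above and 31 gives 60 seconds. The change itself is a one-liner:)<BR>
<PRE>
# /etc/sysconfig/o2cb - the value must be identical on every node
O2CB_HEARTBEAT_THRESHOLD=31    # (31 - 1) * 2 = 60 s before a node fences itself
# After editing, umount the ocfs2 volumes and restart the o2cb stack (or reboot)
# on each node so the new threshold takes effect.
</PRE>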
Hope it helps<BR>
Gaetano<BR> <BR> <BR>
-----Original
Message-----<BR>
From: Mattias Segerdahl [<A
href="mailto:mattias.segerdahl@mandator.com">mailto:mattias.segerdahl@mandator.com</A>]<BR>
Sent: Friday, May 11, 2007 10:00
AM<BR>
To: Gaetano
Giunta<BR>
Subject: RE: [Ocfs2-users] PBL with
RMAN and ocfs2<BR>
<BR>
<BR>
Hi,<BR>
<BR>
We're having the exact same problem:
if we do a failover between two filers/SANs, the server
reboots.<BR>
<BR>
So far I haven't found a solution to
the problem. Would you mind trying to explain, step by step, how you solved
it?<BR>
<BR>
Best
Regards,<BR>
<BR>
Mattias
Segerdahl<BR>
<BR>
From:
ocfs2-users-bounces@oss.oracle.com [<A
href="mailto:ocfs2-users-bounces@oss.oracle.com">mailto:ocfs2-users-bounces@oss.oracle.com</A>]
On Behalf Of Gaetano Giunta<BR>
Sent: Friday, May 11, 2007
09:47<BR>
To:
Ocfs2-users@oss.oracle.com<BR>
Subject: RE: [Ocfs2-users] PBL with
RMAN and ocfs2<BR>
<BR>
Thanks, but I had already checked
all the logs I could find (oracle and crs alerts, /var/log stuff) and there was no
clear indication in there.<BR>
<BR>
The trick is that ocfs was sending the
alert message to the console only (I wonder why it does not also leave traces
in syslog; my best guess is that it tries to shut down as fast as it can, and
sending a message to the console is faster than sending it to syslog - but I'm in no
way a linux guru...).<BR>
<BR>
By using the netdump tool suggested
by Sunil I managed to see the console messages of the dying node (without having
to physically be in the server farm, which is 40 km away from my usual
workplace), and diagnosed the ocfs2 heartbeat as "the
killer".<BR>
<BR>
Bye<BR>
Gaetano<BR><BR>
-----Original
Message-----<BR>
From: Luis Freitas [<A
href="mailto:lfreitas34@yahoo.com">mailto:lfreitas34@yahoo.com</A>]<BR>
Sent: Thursday, May 10, 2007 11:17
PM<BR>
To: Gaetano
Giunta<BR>
Cc:
Ocfs2-users@oss.oracle.com<BR>
Subject: Re: [Ocfs2-users] PBL with
RMAN and ocfs2<BR>
Gaetano,<BR>
<BR>
If o2cb or CRS is
killing the machine, it usually shows up in /var/log/messages with lines explaining
what happened. Take a look at /var/log/messages just before the last
"syslogd x.x.x: restart".<BR>
<BR>
Regards,<BR>
Luis<BR><BR><BR><BR>
Gaetano Giunta
wrote:<BR>
>
Hello.<BR>
><BR>
> On a 2 node RAC 10.2.0.3 setup,
on RH ES 4.4 x86_64, with ocfs 1.2.5-1, we are experiencing some troubles with
RMAN: when the archive log destination is on an ASM partition, and the backup
destination is on ocfs2, running<BR>
><BR>
> backup archivelog all format
'/home/SANstorage/oracle/backup/rman/dump_log/FULL_20070509_154916/arc_%d_%u'
delete input;<BR>
><BR>
> consistently causes a
reboot.<BR>
><BR>
> The rman catalog is clean, and
has been crosschecked in every
way.<BR>
><BR>
> We tried on both nodes, and the
node executing the backup always
reboots.<BR>
> I am thus inclined to think that
it is not the ocfs2 dlm that triggers the reboot, because in that case the
victim would always be the second
node.<BR>
><BR>
> I also tested the same command
using as backup destination /tmp, and all was fine. The backup file of the
archived logs is 1249843712 bytes in
size.<BR>
><BR>
> Our local oracle guy went
through metalink and said there is no open bug/patch for that at this
time.<BR>
><BR>
> Any suggestions
???<BR>
><BR>
>
Thanks<BR>
> Gaetano
Giunta<BR>
><BR>
><BR>
>
------------------------------------------------------------------------<BR>
><BR>
>
_______________________________________________<BR>
> Ocfs2-users mailing
list<BR>
>
Ocfs2-users@oss.oracle.com<BR>
> <A
href="http://oss.oracle.com/mailman/listinfo/ocfs2-users">http://oss.oracle.com/mailman/listinfo/ocfs2-users</A><BR>
<BR>
<BR>
_______________________________________________<BR>
Ocfs2-users mailing
list<BR>
Ocfs2-users@oss.oracle.com<BR>
<A
href="http://oss.oracle.com/mailman/listinfo/ocfs2-users">http://oss.oracle.com/mailman/listinfo/ocfs2-users</A><BR>
_______________________________________________<BR>
Ocfs2-users mailing list<BR>
Ocfs2-users@oss.oracle.com<BR> <A
href="http://oss.oracle.com/mailman/listinfo/ocfs2-users">http://oss.oracle.com/mailman/listinfo/ocfs2-users</A><BR><BR><BR>________________________________<BR><BR>Pinpoint
customers <<A
href="http://us.rd.yahoo.com/evt=48250/*http://searchmarketing.yahoo.com/arp/sponsoredsearch_v9.php?o=US2226&cmp=Yahoo&ctv=AprNI&s=Y&s2=EM&b=50">http://us.rd.yahoo.com/evt=48250/*http://searchmarketing.yahoo.com/arp/sponsoredsearch_v9.php?o=US2226&cmp=Yahoo&ctv=AprNI&s=Y&s2=EM&b=50</A>>
who are looking for what you sell.<BR></FONT></P>pFQBד5DIb<--1DA
<4z֖˲+jjWjnzwjםwfݡ'ުzȠnǦj)b+uǞjwfk'(֢WjYrWy֧u'~'^ؚ)ߢ*'!azu~bg^{^ם'ޚh稭'&y+ub
azgazzܨqvZv+,zhjب))ࢷbnbzrxjب+)bn+^tz{rZqya(kƭ뢺eyiv&z\؟(~^jw]zWzb&qbwl2צf&qbvzz-.uƩi^zbw(*'jw]zV
wi')+aa隊Vqm^+$j)\i'*'ݶ'zw.jzםj{aymifz{lʉvy˫ܦy.+-jwW(֜1٢+-bpޥ]yׯwMvڊޙi٢j7+Z&Yajy2*.(ڶ*'ڌ&֥.)
<DIV><P><HR>
SEEBURGER AG <BR>
Headquarters:<BR>
Edisonstraße 1 <BR>
D-75015 Bretten <BR>
Tel.: 0 72 52/96-0 <BR>
Fax: 0 72 52/96-2222 <BR>
Internet: http://www.seeburger.de <BR>
e-mail: info@seeburger.de <BR>
<BR>
Executive Board:<BR>
Bernd Seeburger, Axel Haas, Michael Kleeberg<BR>
<BR>
Chairman of the Supervisory Board:<BR>
Dr. Franz Scherer<BR>
<BR>
Commercial Register:<BR>
HRB 240708 Mannheim
</P></DIV>
</BODY></HTML>