[Ocfs2-users] heartbeat write timeout

Diane Petersen diane_petersen at yahoo.com
Tue Apr 18 11:28:32 CDT 2006


I also switched to elevator=deadline but didn't see any change in fencing behavior until I increased O2CB_HEARTBEAT_THRESHOLD to 16 (a 30-second timeout).

The issue we were seeing was fencing at precisely 5:15pm every Saturday, but we couldn't trace the problem to any specific event or activity occurring at that time. However, we created a test job that was very write-intensive on the ocfs2 partition and were then able to crash the nodes at will by running it. Since making the above change to the THRESHOLD, neither node has fenced or crashed. It has now been several weeks since the change.

Configuration: 2 node RAC cluster, EMC shared storage, Linux x86-64 RH4 update 2, OCFS2, 10.2.0.2 database standard edition.
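
For reference, the threshold and timeout figures quoted in this thread (16 for 30 seconds, 31 to 46 for 60 to 90 seconds) are consistent with timeout = (threshold - 1) * 2 seconds. The sketch below only encodes that arithmetic. The 2-second heartbeat interval is an assumption inferred from those figures, and the function names are hypothetical, not part of any o2cb tool.

```python
# Assumed: one heartbeat iteration every 2 seconds (inferred from the
# threshold/timeout pairs quoted in this thread, not read from o2cb).
HEARTBEAT_INTERVAL_S = 2


def threshold_to_timeout(threshold: int) -> int:
    """Return the timeout in seconds implied by a heartbeat threshold."""
    if threshold < 2:
        raise ValueError("threshold must be at least 2")
    return (threshold - 1) * HEARTBEAT_INTERVAL_S


def timeout_to_threshold(timeout_s: int) -> int:
    """Return the smallest threshold whose timeout is at least timeout_s."""
    if timeout_s <= 0:
        raise ValueError("timeout must be positive")
    # Ceiling division, so odd timeouts round up to the next iteration.
    iterations = -(-timeout_s // HEARTBEAT_INTERVAL_S)
    return iterations + 1


if __name__ == "__main__":
    for threshold in (14, 16, 60):
        print(f"threshold {threshold} -> {threshold_to_timeout(threshold)} s")
    for timeout_s in (60, 90):
        print(f"timeout {timeout_s} s -> threshold {timeout_to_threshold(timeout_s)}")
```

Under the same assumption, the 12s timeout mentioned further down the thread corresponds to a threshold of 7.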

Diane Petersen
Sr. Oracle DBA
ServerCare, Inc.

"Weller, Michael" <michael.weller at itz-essen.de> wrote:

I don't know if I mentioned this to the list, but setting elevator=deadline and raising the THRESHOLD to 14 solved my self-fencing issues.

(We'll see what happens under a possibly extreme load).

Michael.

---

Dr. Michael Weller

ITZ Informationstechnologie GmbH
Consulting/Systemengineering
Bismarckstrasse 57
D-45128 Essen

Phone Office  +49 201 24714 28
FAX   Office  +49 201 24714 33
Phone Mobile  +49 172 2178078
E-Mail        mailto:michael.weller at itz-essen.de

> -----Original Message-----
> From: ocfs2-users-bounces at oss.oracle.com [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Zunker, Christian
> Sent: Tuesday, 18 April 2006 15:21
> To: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] heartbeat write timeout
> 
> Hi,
> 
> I experienced the same problems. The elevator=deadline parameter didn't
> help, but increasing the threshold to 60 did. I think a lower threshold
> would also work, but I haven't tested that. Another posting recommends a
> timeout between 60 and 90 seconds, which would mean a threshold between
> 31 and 46.
> 
> I'll test this later.
> 
> Best regards,
> Christian
> 
> 
> -----Original Message-----
> From: ocfs2-users-bounces at oss.oracle.com [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Weller, Michael
> Sent: Sunday, 2 April 2006 14:18
> To: Silviu Marin-Caea; ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] heartbeat write timeout
> 
> Thx for the hints, I'll try that.
> 
> With regard to the updates, while I generally agree, I can't update the
> kernel here, because we'll lose vendor warranty in that case. I know this
> is an odd concept, but that's how it works. We'll even lose Oracle
> support because the kernel update would void HP SAN-support.
> 
> I mentioned SAN failover, which for example does not work with the current
> kernel and the current Qlogic driver (or even the not-so-current variant
> checked by HP).
> 
> Anyway, I'll try your suggestions on Monday and drop the list a note if
> they work.
> 
> Thanks,
> Michael.
> 
> > -----Original Message-----
> > From: ocfs2-users-bounces at oss.oracle.com [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Silviu Marin-Caea
> > Sent: Sunday, 2 April 2006 08:26
> > To: ocfs2-users at oss.oracle.com
> > Subject: Re: [Ocfs2-users] heartbeat write timeout
> >
> > On Saturday 01 April 2006 22:36, Weller, Michael wrote:
> >
> > > we are bound to SLES9SP3 (and EXACTLY that, nothing less, not a patch
> > > more)
> >
> > Having the latest updates does not hurt; on the contrary, it helps. For
> > example, the latest kernel has OCFS2 1.1.8, while the kernel from SP3 has
> > 1.1.7. There are a number of bugfixes.
> >
> > SLES updates do really have a purpose.  Apply them after testing in a
> > non-production system.
> >
> > > It locks up immediately. Definitely nothing like a 12s timeout
> > > expires.
> >
> > It only looks immediate; the 12s do actually expire.
> >
> > > You mention a FAQ regarding some config option which I didn't come
> > > across up to now, where can I find it?
> >
> > /boot/grub/menu.lst
> >
> > change elevator=cfq to elevator=deadline
> >
> > http://oss.oracle.com/projects/ocfs2/
> > scroll down, look at the red text
> >
> > > Which options would you recommend to fix the problem or at least make
> > > locks much less likely?
> >
> > You could also increase the timeout:
> >
> > /etc/sysconfig/o2cb
> >
> > # O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
> > O2CB_HEARTBEAT_THRESHOLD=16
> >
> >
> > _______________________________________________
> > Ocfs2-users mailing list
> > Ocfs2-users at oss.oracle.com
> > http://oss.oracle.com/mailman/listinfo/ocfs2-users
> 
