<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>

<HEAD>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.6944.0">

<TITLE>RE: [Ocfs-users] Major RAC slowdown</TITLE>

</HEAD>

<BODY>



<P><FONT SIZE=2>Correct - normally %iowait dominates, but during the slowdowns it's all %sys.&nbsp; Most of the Oracle processes spend a lot of time spinning through gettimeofday() calls, but in between there are the usual reads and writes to disk or the interconnect socket (depending on which process it is).<BR>

<BR>

<BR>

-----Original Message-----<BR>

From: Wim Coekaerts [<A HREF="mailto:wim.coekaerts@oracle.com">mailto:wim.coekaerts@oracle.com</A>]<BR>

Sent: Tue 6/8/2004 9:23 AM<BR>

To: Derek Suzuki<BR>

Cc: ocfs-users@oss.oracle.com<BR>

Subject: Re: [Ocfs-users] Major RAC slowdown<BR>

<BR>

What sort of syscalls? And I guess that means you see a lot of %sys, not %user ... hmm<BR>

<BR>

On Tue, Jun 08, 2004 at 01:22:44AM -0700, Derek Suzuki wrote:<BR>

&gt; Hello again.&nbsp; Our production cluster has begun experiencing some vicious slowdowns that may (or may not) be related to the filesystems.&nbsp; When the problem occurs, the load average on the servers jumps up to 30 or higher.&nbsp; Usually one node will climb while the other drops, then they will switch places a few minutes later.&nbsp; At one point, we had one node's load average up over 300.&nbsp; Our site activity has been on the rise, and the problems usually occur during peak mid-day hours.<BR>

&gt;&nbsp;<BR>

&gt; Under normal conditions, &quot;top&quot; shows the CPUs spending most of their time waiting on the very busy fibre channel.&nbsp; During the slowdowns, the processors are mostly busy with system calls.&nbsp; Traffic over both the fibre channel and gigabit interconnect seems to drop off considerably at the same time.<BR>

&gt;&nbsp;<BR>

&gt; I've got a TAR open, but the support people are still in the very preliminary stages (for example, we just installed a switch between the two nodes because a crossover cable is apparently not supported).&nbsp; There doesn't seem to be any good indication of what's going on.&nbsp; We suspected the interconnect, but the private interfaces seem to behave normally while Oracle is grinding to a halt.<BR>

&gt;&nbsp;<BR>

&gt; After 10-30 minutes, the problem will fade away on its own.&nbsp; I'm inclined to blame something in the RAC inter-node communications code, but I was wondering if this situation resembled any kind of OCFS problem anyone has seen.&nbsp; These servers are still on 1.0.9-12, with plans to go to 1.0.12 soon after this issue is resolved.<BR>

&gt;&nbsp;<BR>

&gt; Derek<BR>

<BR>


</FONT>

</P>


</BODY>

</HTML>