[Ocfs-users] Major RAC slowdown

Derek Suzuki DSuzuki at ZipRealty.com
Tue Jun 8 10:40:40 CDT 2004


Correct. Normally %iowait dominates, but during the slowdowns it's all %sys.  Most of the Oracle processes spend a lot of time spinning through gettimeofday() calls, but in between there are the usual reads and writes to disk or the interconnect socket (depending on which process it is).
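
For anyone who wants to compare the %sys and %iowait split on their own nodes, here is a minimal sketch (not something taken from our systems) that samples the aggregate "cpu" line of /proc/stat twice and reports the share of time in each state. The field order is assumed from proc(5); older kernels report fewer columns, which the sketch treats as zero. The function names and the 5-second interval are illustrative only.

```python
# Field order of the aggregate "cpu" line in /proc/stat, per proc(5).
# Assumption: kernels that report fewer columns simply omit the later ones.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn a 'cpu  ...' line from /proc/stat into a dict of jiffy counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    values += [0] * (len(FIELDS) - len(values))  # pad missing columns with 0
    return dict(zip(FIELDS, values))


def cpu_breakdown(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {name: b[name] - a[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in FIELDS}
    return {name: 100.0 * d / total for name, d in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    import time

    first = read_cpu_line()
    time.sleep(5)  # illustrative sampling interval
    second = read_cpu_line()
    pct = cpu_breakdown(first, second)
    print("user %.1f%%  sys %.1f%%  iowait %.1f%%  idle %.1f%%"
          % (pct["user"], pct["system"], pct["iowait"], pct["idle"]))
```

To see which system calls a given process is making, attaching strace with -c -p <pid> for a short while prints a per-syscall count summary when it is interrupted.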


-----Original Message-----
From: Wim Coekaerts [mailto:wim.coekaerts at oracle.com]
Sent: Tue 6/8/2004 9:23 AM
To: Derek Suzuki
Cc: ocfs-users at oss.oracle.com
Subject: Re: [Ocfs-users] Major RAC slowdown
 
What sort of syscalls? And I guess that means you see a lot of %sys, not
%user ... hmm

On Tue, Jun 08, 2004 at 01:22:44AM -0700, Derek Suzuki wrote:
> Hello again.  Our production cluster has begun experiencing some vicious slowdowns that may (or may not) be related to the filesystems.  When the problem occurs, the load average on the servers jumps up to 30 or higher.  Usually one node will climb while the other drops, then they will switch places a few minutes later.  At one point, we had one node's load average up over 300.  Our site activity has been on the rise, and the problems usually occur during peak mid-day hours.
>  
> Under normal conditions, "top" shows the CPUs spending most of their time waiting on the very busy fibre channel.  During the slowdowns, the processors are mostly busy with system calls.  Traffic over both the fibre channel and gigabit interconnect seems to drop off considerably at the same time.
>  
> I've got a TAR open, but the support people are still in the very preliminary stages (for example, we just installed a switch between the two nodes because a crossover cable is apparently not supported).  There doesn't seem to be any good indication of what's going on.  We suspected the interconnect, but the private interfaces seem to behave normally while Oracle is grinding to a halt.
>  
> After 10-30 minutes, the problem will fade away on its own.  I'm inclined to blame something in the RAC inter-node communications code, but I was wondering if this situation resembled any kind of OCFS problem anyone has seen.  These servers are still on 1.0.9-12, with plans to go to 1.0.12 soon after this issue is resolved.
>  
> Derek

> _______________________________________________
> Ocfs-users mailing list
> Ocfs-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs-users



