[Ocfs2-users] Slow on open()

Brad Plant bplant at iinet.net.au
Thu Jan 28 01:31:30 PST 2010


Hi Somsak,

I observed high loads and apache slowness when there was insufficient contiguous free space due to fragmentation. I believe it was because apache couldn't write its log files efficiently. We had 2 apache nodes, and I found that stopping apache on the problem node relieved the problem until I deleted lots of unused files.

My symptoms didn't point to a slow open() syscall the way your strace results do, but I certainly saw the high load and poor apache performance. It might be worth checking out anyway. There is a bug report, and we're just waiting for the patch to be reviewed and made publicly available.

http://oss.oracle.com/bugzilla/show_bug.cgi?id=1189
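
To tell a slow open() apart from slow writes without reading strace output each time, something like the rough Python sketch below could time each step separately. The function names, the 1-second threshold and the 5-second interval are placeholders of mine, not anything from OCFS2 or the bug report; the open flags and the 1k write mirror the dd test quoted below.

```python
import os
import time


def time_open_write(path, size=1024):
    """Time open(), write() and close() separately for one small file."""
    timings = {}

    start = time.monotonic()
    # Same flags as in the quoted strace output: O_WRONLY|O_CREAT|O_TRUNC, 0666
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    timings["open"] = time.monotonic() - start

    try:
        start = time.monotonic()
        os.write(fd, b"\0" * size)
        timings["write"] = time.monotonic() - start
    finally:
        start = time.monotonic()
        os.close(fd)
        timings["close"] = time.monotonic() - start

    return timings


def watch(path, rounds=10, interval=5.0, threshold=1.0):
    """Repeat the test and report rounds where any step took >= threshold seconds."""
    slow = []
    for _ in range(rounds):
        timings = time_open_write(path)
        if max(timings.values()) >= threshold:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, " ".join(f"{k}={v:.3f}s" for k, v in timings.items()))
            slow.append(timings)
        time.sleep(interval)
    return slow
```

Calling watch("/san/testfile") would then print only the slow rounds, which should show whether the time goes into open() as in your trace or into the write as I suspected in my case.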

Cheers,

Brad



On Tue, 19 Jan 2010 16:12:07 +0700
Somsak Sriprayoonsakul <somsaks at gmail.com> wrote:

> Hello,
> 
> We are using OCFS2 version 1.4.3 on CentOS 5, x86_64, with 8GB of memory. The
> underlying storage is an HP 2312fc smart array equipped with 12 SAS 15K rpm
> disks, configured as RAID10 using 10 HDDs + 2 spares. The array has about 4GB
> of cache. Communication is 4Gbps FC, through an HP StorageWorks 8/8 Base
> e-port SAN switch. Right now only this machine is connected to the SAN
> through the switch, but we plan to add more machines to use this SAN system.
> 
> Our application is apache version 1.3.41, mostly serving static HTML files
> plus a little PHP. Note that we had to downgrade to 1.3.41 because of an
> application requirement. Apache is configured with MaxClients set to 500.
> 
> The OCFS2 storage was formatted with mkfs.ocfs2 without any special options.
> It runs directly on the multipath'ed SAN storage, without LVM or software
> RAID. We mount OCFS2 with noatime, commit=15, and data=writeback (as well as
> heartbeat=local). Our cluster.conf looks like this:
> 
> cluster:
>     node_count = 1
>     name = mycluster
> 
> node:
>     ip_port = 7777
>     ip_address = 203.123.123.123
>     number = 1
>     name = mycluster.mydomain.com
>     cluster = mycluster
> 
> (NOTE: Some details, such as the hostname and IP address, have been changed here.)
> 
> Periodically, we find that the file system becomes very slow. I think it
> happens once every few minutes. When the file system is slow, the httpd
> processes' CPU utilization goes much higher, to about 50% or above. I tried
> to debug this slowness by creating a small script that periodically runs
> 
> strace -f dd if=/dev/zero of=/san/testfile bs=1k count=1
> 
> and times dd. Usually dd finishes in under a second, but periodically it
> is much slower, taking about 30-60 seconds. The strace output shows this:
> 
>      0.000026 open("/san/testfile", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 1
>     76.418696 rt_sigaction(SIGUSR1, NULL, {SIG_DFL, [], 0}, 8) = 0
> 
> So I presume this means the open system call is periodically very slow.
> I ran about 5-10 tests, which yielded similar strace results (ranging from
> just 5-7 seconds to 80 seconds).
> 
> So my question is: what could be causing this slowness? How can I debug
> it further? At which point should we optimize the file system?
> 
> We are in the process of purchasing and adding more web servers to the
> system, and will use a reverse proxy to load balance between two servers. We
> just want to make sure that this will not make the situation worse.