We are using OCFS2 on Red Hat Enterprise Linux to share ordinary UNIX
filesystems between three web servers, and this seems to work reasonably
well, except that we frequently experience high-load situations on all
of the servers at the same time.

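In case the cluster configuration is relevant, the checks below are
roughly what we run on each node (standard ocfs2-tools commands; the
output is omitted here):

  # Confirm the O2CB cluster stack is online and heartbeating
  service o2cb status

  # Show which cluster nodes have each OCFS2 volume mounted
  mounted.ocfs2 -f
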
The underlying SAN is an HP EVA 4100, and the SAN diagnostics show that
the array itself is coping fairly easily: CPU load on the controllers
rarely rises above 5% and usually holds steady at around 1%. Read
requests usually run at around 500-600 per second, occasionally peaking
at 2,000-3,000. On one occasion the request rate went over 12,000 per
second with an average of 52MB/s transferred, and the SAN coped with an
average latency of 0.1ms. Write requests rarely go higher than 20-30
per second but have been known to hit 2,500 during busy periods.

Because we are web-serving, the workload is mostly read requests for
small files, but there are periods in the day when we update many
thousands of images, and it is at times like this, when we are doing a
relatively high volume of writes, that the load shoots up. We recently
had a situation where the one-minute load average hit 650. Normally,
when we hit these high-load situations, the load seems to be higher on
the nodes that aren't writing to the SAN than on the one that is.

Running 'iostat -x' frequently shows output similar to that below:

Time: 08:05:48 AM
Device:  rrqm/s  wrqm/s     r/s     w/s  rsec/s  wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda       46.40  255.80  250.40   28.00 2373.60 2267.60    16.67     1.44    5.16   3.24  90.26

Time: 08:05:53 AM
Device:  rrqm/s  wrqm/s     r/s     w/s  rsec/s  wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda       18.47  104.22  221.49  232.93 1902.41 2699.40    10.13     2.68    5.78   2.00  90.82

Time: 08:05:58 AM
Device:  rrqm/s  wrqm/s     r/s     w/s  rsec/s  wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda       28.60   44.20   47.40    2.40  613.60   98.00    14.29     5.40  104.24  20.09 100.04

Time: 08:06:03 AM
Device:  rrqm/s  wrqm/s     r/s     w/s  rsec/s  wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda       38.80   10.20  149.80    3.00 1517.20  373.40    12.37     3.71   25.98   6.34  96.86

This situation goes on for several minutes, or tens of minutes, before
things calm down again. The number of reads and writes per second
doesn't seem unreasonably high to me, especially considering the
underlying SAN performance, but the elevated await and svctm figures
make it clear that the device is totally saturated. The one-minute load
averages at these times are typically 60-80, so it seems clear that we
are running out of steam. Oddly, vmstat doesn't show processes blocked
or waiting on I/O during these periods. CPU utilization is never more
than about 20% (the servers were considerably over-spec'ed, with four
quad-core Xeon processors and 8GB of RAM in each), and there is no
swapping taking place either. BTW, I have mounted the SAN-based
filesystems with noatime and this has helped a little.

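For completeness, the mount entries look something like the sketch
below (the device path and mount point are placeholders, not our real
names):

  # /etc/fstab - shared OCFS2 volume on the EVA; noatime avoids taking
  # a cluster lock to update the access time on every read
  /dev/mapper/eva_vol1  /srv/www  ocfs2  _netdev,noatime  0 0
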
It occurred to me that the inter-node communication needed to negotiate
locks might be slowing things down, but the servers are connected via
Gigabit NICs and are all on the same subnet, so the network switches
can be ruled out.

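For what it's worth, this is roughly how I sanity-checked the
interconnect (node names are placeholders; 7777 is the default o2net
port):

  # Round-trip latency between cluster nodes
  ping -c 10 node2

  # Confirm the o2net TCP connections between the nodes are established
  netstat -tn | grep 7777
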
My question is: is OCFS2 (or any clustered filesystem, for that matter)
suitable for the demands we are placing on it? Has anyone any
experience of using OCFS2 under what I assume to be 'extreme'
conditions?

Any advice would be greatly appreciated - this is causing me untold
grief due to poor response times from the web servers while it is
happening. On a few occasions, when the load was particularly high, one
or other of the web servers has fenced and rebooted itself.

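If the suggestion is to make the fencing less trigger-happy while we
investigate, the knobs appear to live in /etc/sysconfig/o2cb (setting
names may vary with the ocfs2-tools version; the values below are just
the defaults as I understand them, not a recommendation):

  # A node fences itself if it cannot write its disk heartbeat for
  # roughly (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds
  O2CB_HEARTBEAT_THRESHOLD=31

  # Milliseconds of network-idle time before a peer is declared dead
  O2CB_IDLE_TIMEOUT_MS=30000
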
Regards,

Mick.