<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
The cluster stack uses the interconnect to negotiate the locks.<br>
That's how it is able to provide data coherency. Other solutions<br>
do not provide that kind of coherency.<br>
<br>
If you are measuring interconnect latency in ms, that is not a good sign.<br>
That unit is typically used for disk access times.<br>
<br>
On 08/19/2011 01:30 PM, Nick Geron wrote:
<blockquote
cite="mid:8F348D9D79ADA24993FCE45FE44F162402712018903B@exchange01.corp.corenap.com"
type="cite">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";
        color:black;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0in;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";
        color:black;}
span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:Consolas;
        color:black;}
span.EmailStyle19
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:windowtext;}
span.EmailStyle20
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
span.EmailStyle21
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style>
<div class="WordSection1">
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);">Actually,
those first numbers were from GigE links going out to
physical switches and back in.  To optimize the private
link, I upgraded the VMs' NICs to 10GE (VMXNET3, the
VMware paravirtualized driver) and moved them onto the same host
system with a dedicated software switch between them.  The
numbers improved only slightly, and 1 of 100
pings got worse (1 ms). <o:p></o:p></span></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);">10GE
between VMs under the same hypervisor: rtt min/avg/max/mdev
= 0.194/0.307/1.003/0.132 ms<o:p></o:p></span></p>
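<p class="MsoNormal">To compare the ping summaries quoted in this thread side by side, a minimal Python sketch follows. It assumes the summary-line format printed by Linux ping as shown above; the function name and the sample labels are illustrative, not part of any tool.</p>

```python
import re

# Matches the summary line printed by Linux ping, as quoted in this thread.
RTT_RE = re.compile(
    r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)

def parse_rtt(line):
    """Return the rtt summary as a dict of floats, in milliseconds."""
    m = RTT_RE.search(line)
    if m is None:
        raise ValueError("no rtt summary found in: %r" % line)
    keys = ("min", "avg", "max", "mdev")
    return dict(zip(keys, (float(v) for v in m.groups())))

# Summary lines copied from the thread; the labels are illustrative.
samples = {
    "GigE via physical switches": "rtt min/avg/max/mdev = 0.207/0.268/0.360/0.046 ms",
    "10GE on one hypervisor": "rtt min/avg/max/mdev = 0.194/0.307/1.003/0.132 ms",
    "GigE reference": "rtt min/avg/max/mdev = 0.149/0.168/0.188/0.020 ms",
}

for label, line in samples.items():
    s = parse_rtt(line)
    print("%-28s avg=%.3f ms  max=%.3f ms" % (label, s["avg"], s["max"]))
```

<p class="MsoNormal">A plain ping does not reflect the workload on the link at the time, so these figures are only a rough baseline.</p>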
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);">What
I don’t understand is why my OCFS2 cluster suffers so
greatly.  There’s quite a big difference between a wall time
of 0.17 seconds to traverse the data on an iSCSI link and
the 4 minutes to do the same on OCFS2 with a private
interconnect that averages under 1 ms of latency.  For that matter, the whole
setup is running on another clustered FS (VMFS3) over the
same network to the same SAN.  I guess I’m just a little
dumbfounded that OCFS2 is so much more demanding than other
clustered FSs and alternative network storage options.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);">Is
the network really the most likely candidate? If so, is
anyone else running OCFS2 from within a VM environment? Is
this technology only worthwhile in the physical world? Is
there a sweet spot for network latency that I should strive
for? The user guide only makes mention of ‘low latency’ but
lacks figures save for heartbeat and timeouts.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);">-nick<o:p></o:p></span></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);"><o:p> </o:p></span></p>
<div>
<div style="border-right: medium none; border-width: 1pt
medium medium; border-style: solid none none; border-color:
rgb(181, 196, 223) -moz-use-text-color -moz-use-text-color;
padding: 3pt 0in 0in;">
<p class="MsoNormal"><b><span style="font-size: 10pt;
font-family: 'Tahoma','sans-serif'; color:
windowtext;">From:</span></b><span style="font-size:
10pt; font-family: 'Tahoma','sans-serif'; color:
windowtext;"> Sunil Mushran
[<a class="moz-txt-link-freetext" href="mailto:sunil.mushran@oracle.com">mailto:sunil.mushran@oracle.com</a>] <br>
<b>Sent:</b> Friday, August 19, 2011 2:30 PM<br>
<b>To:</b> Nick Geron<br>
<b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:ocfs2-users@oss.oracle.com">ocfs2-users@oss.oracle.com</a><br>
<b>Subject:</b> Re: [Ocfs2-users] IO performance appears
slow<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Somewhat equivalent but it misses the
effect of the workload at that time.<br>
<br>
BTW, those are awful numbers for 10G NICs. I get better numbers
with GigE.<br>
rtt min/avg/max/mdev = 0.149/0.168/0.188/0.020 ms<br>
<br>
You should check the config, etc. Use ethtool, etc.<br>
<br>
On 08/19/2011 10:54 AM, Nick Geron wrote: <o:p></o:p></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);">Thanks
for the feedback Sunil,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);">You
are correct that the sys and user times were very low.  I
did check the response time and latency between the two nodes,
thinking that could be an issue.  I didn’t see an issue
there, but then again I do not know what the values should be.  Is
there a document that outlines the baseline requirements and/or
recommendations for that link?  The best I can do in this
environment is break my host redundancy and move both nodes
to the same VMware vSwitch with 10G NICs. </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);">Average
latency between the two: rtt min/avg/max/mdev =
0.207/0.268/0.360/0.046 ms. </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);">Are
the ping stats dumped from o2net not equivalent to a simple
ping between the hosts? Is my reported latency too great
for OCFS2 to function well?</span><o:p></o:p></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);">Thanks
for your assistance.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);">-Nick</span><o:p></o:p></p>
<p class="MsoNormal"><span style="color: rgb(31, 73, 125);"> </span><o:p></o:p></p>
<div>
<div style="border-right: medium none; border-width: 1pt
medium medium; border-style: solid none none; padding: 3pt
0in 0in; border-color: -moz-use-text-color;">
<p class="MsoNormal"><b><span style="font-size: 10pt;
font-family: 'Tahoma','sans-serif'; color:
windowtext;">From:</span></b><span style="font-size:
10pt; font-family: 'Tahoma','sans-serif'; color:
windowtext;"> Sunil Mushran [<a moz-do-not-send="true"
href="mailto:sunil.mushran@oracle.com">mailto:sunil.mushran@oracle.com</a>]
<br>
<b>Sent:</b> Thursday, August 18, 2011 10:26 PM<br>
<b>To:</b> Nick Geron<br>
<b>Cc:</b> <a moz-do-not-send="true"
href="mailto:ocfs2-users@oss.oracle.com">ocfs2-users@oss.oracle.com</a><br>
<b>Subject:</b> Re: [Ocfs2-users] IO performance appears
slow</span><o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">The network interconnect between the VMs is
slow. What<br>
would have helped is the sys and user times. But my guess<br>
is that those are low and most of the time is spent waiting (wall time).<br>
<br>
In mainline, o2net dumps stats showing the ping time between<br>
nodes. Unfortunately this kernel is too old.<br>
<br>
On 08/18/2011 04:24 PM, Nick Geron wrote: <o:p></o:p></p>
<p class="MsoNormal">Greetings,<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">I’m rather new to OCFS2, so please forgive
any glaringly ignorant statements.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">I’m evaluating file systems and storage
layout for a simple 2 node mail cluster using Maildir email
directories.  I have created a 2 node cluster by following related
tutorials.  The problem I’m seeing is that general file access
using cp, find, du, ls, etc. is slower by a significant factor on
OCFS2 than on alternative local and remote disk configurations. 
I’m hoping someone can clue me in as to whether this behavior is
normal, or if I’m missing something in my lab.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">*Hosts are identical CentOS 5.5 virtual
machines (VMware) with 2.6.18-238.19.1.el5. (2 ESXi hosts)<o:p></o:p></p>
<p class="MsoNormal">*OCFS2 build is
ocfs2-2.6.18-238.19.1.el5-1.4.7-1.el5 (tools v 1.4.4-1.el5).<o:p></o:p></p>
<p class="MsoNormal">*SAN is an EMC Clariion. LUN is accessed
via iSCSI with EMC PowerPath 5.5.0.00.00-275<o:p></o:p></p>
<p class="MsoNormal">*Nodes share a gigabit network for their
private interconnect via two interconnected switches (ESXi
host into each).<o:p></o:p></p>
<p class="MsoNormal">*Test data is a 181MB Maildir directory
(~12K emails) copied to various types of storage.<o:p></o:p></p>
<p class="MsoNormal">*Tests involve simple bash scripts running
(bash) time with the mentioned command line utilities and
strace inspection.<o:p></o:p></p>
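<p class="MsoNormal">For reference, the kind of measurement described above (wall time versus user and sys CPU time for a pass that stats every file) can be sketched in Python as follows. This is not the bash script used for the tests; the function name is hypothetical and the demo runs on a throwaway temporary directory rather than a real Maildir.</p>

```python
import os
import tempfile
import time

def timed_lstat_walk(root):
    """lstat every entry under root; return (count, wall, user, sys) in seconds."""
    cpu_start = os.times()
    wall_start = time.perf_counter()
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.lstat(os.path.join(dirpath, name))
            count += 1
    wall = time.perf_counter() - wall_start
    cpu_end = os.times()
    return (count, wall,
            cpu_end.user - cpu_start.user,
            cpu_end.system - cpu_start.system)

# Demo on a temporary directory; pass the path of a Maildir copy for a real run.
with tempfile.TemporaryDirectory() as root:
    for i in range(100):
        open(os.path.join(root, "msg%03d" % i), "w").close()
    n, wall, user, system = timed_lstat_walk(root)
    print("entries=%d wall=%.4fs user=%.4fs sys=%.4fs" % (n, wall, user, system))
```

<p class="MsoNormal">A wall time far above user plus sys suggests the process spent most of the run waiting rather than computing. Results depend heavily on whether the entries are already cached.</p>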
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">The OCFS2 file system was created with the
following (mount cannot load the xattr or extended-slotmap features
that max-features adds):<o:p></o:p></p>
<p class="MsoNormal">mkfs.ocfs2 -N 2 -T mail
--fs-features=backup-super,sparse,unwritten,inline-data -v
/dev/emcpowera<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Mount options are limited to ‘_netdev’ at
the moment. I’ve read a bit about changing ‘data’ from
ordered to writeback, but that seems to be related to waits on
flushing cache to disk. So far, I’m just focusing on
reads/lstats.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">With a maildir in place, any operation that
must inspect all files takes quite a while to complete without
cached entries. The alarming thing is the discrepancy between
my OCFS2 data and identical data on local, NFS and iSCSI
mounts.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Here’s some simple data that should
illustrate my problem and my confusion:<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Command: ‘du -hs
/path/to/maildir/on/various/mounts’<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<table border="1" cellspacing="0" cellpadding="4">
<tr><th align="left">Storage</th><th align="left">Real time to complete (min:sec)</th></tr>
<tr><td>Local disk</td><td>0:0.078</td></tr>
<tr><td>NFS</td><td>0:2</td></tr>
<tr><td>iSCSI (EXT3)</td><td>0:1.7</td></tr>
<tr><td>iSCSI (OCFS2)</td><td>4:24</td></tr>
</table>
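<p class="MsoNormal">To put the timings above in perspective, a small Python sketch converts each min:sec value to seconds and reports the slowdown relative to local disk plus an approximate per-file cost. The file count of 12,000 is an assumption taken from the "~12K emails" description of the test data, so the per-file figures are rough.</p>

```python
def to_seconds(min_sec):
    """Convert a 'min:sec' string such as '4:24' or '0:1.7' to seconds."""
    minutes, seconds = min_sec.split(":")
    return int(minutes) * 60 + float(seconds)

# Values copied from the timing results above.
timings = [
    ("Local disk", "0:0.078"),
    ("NFS", "0:2"),
    ("iSCSI (EXT3)", "0:1.7"),
    ("iSCSI (OCFS2)", "4:24"),
]

FILES = 12000  # assumption: "~12K emails" in the test Maildir

baseline = to_seconds(timings[0][1])
for name, value in timings:
    secs = to_seconds(value)
    print("%-14s %8.3f s  %8.1fx local  %7.3f ms/file"
          % (name, secs, secs / baseline, secs / FILES * 1000.0))
```

<p class="MsoNormal">Under that assumption the OCFS2 run (264 seconds) works out to roughly 22 ms per file.</p>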
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Other tests including recursive chowns or
chmods, and ls report similar results.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Most telling is perhaps the strace output. 
There I can see system calls on individual Maildir files. 
The time spent in each call/operation is far longer on OCFS2, and
there is no hint of externally derived waits.  Nor are there
any indicators of load issues from competing processes;
nothing else (significant) is going on and du has free rein
over the OS resources.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Output from strace with -tt -T using du -hs
against the Maildir on my EXT3 iSCSI LUN (/dev/emcpowerb1)<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">18:03:17.572879
lstat("1313705228.000737.mbox:2,S", {st_mode=S_IFREG|0644,
st_size=715, ...}) = 0 <0.000018><o:p></o:p></p>
<p class="MsoNormal">18:03:17.572944
lstat("1313705228.008426.mbox:2,S", {st_mode=S_IFREG|0644,
st_size=2779, ...}) = 0 <0.000024><o:p></o:p></p>
<p class="MsoNormal">18:03:17.573016
lstat("1313705228.006345.mbox:2,S", {st_mode=S_IFREG|0644,
st_size=2703, ...}) = 0 <0.000020><o:p></o:p></p>
<p class="MsoNormal">18:03:17.573083
lstat("1313705228.001305.mbox:2,S", {st_mode=S_IFREG|0644,
st_size=1831, ...}) = 0 <0.000017><o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Output from the same trace against the
OCFS2 store<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">18:06:52.876713
lstat("1313707554.003441.mbox:2,S", {st_mode=S_IFREG|0644,
st_size=2322, ...}) = 0 <0.040896><o:p></o:p></p>
<p class="MsoNormal">18:06:52.917723
lstat("1313707554.003442.mbox:2,S", {st_mode=S_IFREG|0644,
st_size=2316, ...}) = 0 <0.040663><o:p></o:p></p>
<p class="MsoNormal">18:06:52.958473
lstat("1313707554.003443.mbox:2,S", {st_mode=S_IFREG|0644,
st_size=2899, ...}) = 0 <0.000938><o:p></o:p></p>
<p class="MsoNormal">18:06:52.959471
lstat("1313707554.003444.mbox:2,S", {st_mode=S_IFREG|0644,
st_size=2522, ...}) = 0 <0.001106><o:p></o:p></p>
<p class="MsoNormal">18:06:52.960641
lstat("1313707554.003445.mbox:2,S", {st_mode=S_IFREG|0644,
st_size=3451, ...}) = 0 <0.039904><o:p></o:p></p>
<p class="MsoNormal">18:06:53.000644
lstat("1313707554.003446.mbox:2,S", {st_mode=S_IFREG|0644,
st_size=3150, ...}) = 0 <0.041060><o:p></o:p></p>
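<p class="MsoNormal">The per-call times in the two traces can be summarized with a short Python sketch. It assumes only that strace -T appends the time spent in each call as &lt;seconds&gt; at the end of the line, as in the output above; the function name is hypothetical and the sample lines are copied from the traces.</p>

```python
import re
from statistics import mean

# strace -T appends the time spent in the call as <seconds> at end of line.
DURATION_RE = re.compile(r"<(\d+\.\d+)>\s*$")

def call_durations(lines):
    """Return the per-call durations (seconds) found in strace -T output lines."""
    found = []
    for line in lines:
        m = DURATION_RE.search(line)
        if m:
            found.append(float(m.group(1)))
    return found

ext3 = """\
18:03:17.572879 lstat("1313705228.000737.mbox:2,S", {st_mode=S_IFREG|0644, st_size=715, ...}) = 0 <0.000018>
18:03:17.572944 lstat("1313705228.008426.mbox:2,S", {st_mode=S_IFREG|0644, st_size=2779, ...}) = 0 <0.000024>
18:03:17.573016 lstat("1313705228.006345.mbox:2,S", {st_mode=S_IFREG|0644, st_size=2703, ...}) = 0 <0.000020>
18:03:17.573083 lstat("1313705228.001305.mbox:2,S", {st_mode=S_IFREG|0644, st_size=1831, ...}) = 0 <0.000017>
""".splitlines()

ocfs2 = """\
18:06:52.876713 lstat("1313707554.003441.mbox:2,S", {st_mode=S_IFREG|0644, st_size=2322, ...}) = 0 <0.040896>
18:06:52.917723 lstat("1313707554.003442.mbox:2,S", {st_mode=S_IFREG|0644, st_size=2316, ...}) = 0 <0.040663>
18:06:52.958473 lstat("1313707554.003443.mbox:2,S", {st_mode=S_IFREG|0644, st_size=2899, ...}) = 0 <0.000938>
18:06:52.959471 lstat("1313707554.003444.mbox:2,S", {st_mode=S_IFREG|0644, st_size=2522, ...}) = 0 <0.001106>
18:06:52.960641 lstat("1313707554.003445.mbox:2,S", {st_mode=S_IFREG|0644, st_size=3451, ...}) = 0 <0.039904>
18:06:53.000644 lstat("1313707554.003446.mbox:2,S", {st_mode=S_IFREG|0644, st_size=3150, ...}) = 0 <0.041060>
""".splitlines()

for label, lines in (("EXT3", ext3), ("OCFS2", ocfs2)):
    d = call_durations(lines)
    print("%-6s calls=%d mean=%.6fs max=%.6fs" % (label, len(d), mean(d), max(d)))
```

<p class="MsoNormal">In this small sample, four of the six OCFS2 calls take about 40 ms and two take about 1 ms, while every EXT3 call completes in well under 0.1 ms.</p>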
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Is this normal behavior for a current
kernel and the most recent 1.4.7 code? Does someone suspect
I’ve blundered somewhere along the way? I’ve seen many posts
to this list related to a mail cluster setup like mine. Is
anyone on the list running a production mail cluster with
OCFS2? I apologize for the length of this email. Thanks.<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">-Nick Geron<o:p></o:p></p>
<pre> <o:p></o:p></pre>
<pre> <o:p></o:p></pre>
<pre>_______________________________________________<o:p></o:p></pre>
<pre>Ocfs2-users mailing list<o:p></o:p></pre>
<pre><a moz-do-not-send="true" href="mailto:Ocfs2-users@oss.oracle.com">Ocfs2-users@oss.oracle.com</a><o:p></o:p></pre>
<pre><a moz-do-not-send="true" href="http://oss.oracle.com/mailman/listinfo/ocfs2-users">http://oss.oracle.com/mailman/listinfo/ocfs2-users</a><o:p></o:p></pre>
<p class="MsoNormal"><span style="font-size: 12pt; font-family:
'Times New Roman','serif';"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="font-size: 12pt; font-family:
'Times New Roman','serif';"><o:p> </o:p></span></p>
</div>
</blockquote>
<br>
</body>
</html>