<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    The cluster stack uses the interconnect to negotiate the locks.<br>

    That's how it is able to provide data coherency. Other solutions<br>

    do not provide that kind of coherency.<br>

    <br>

    If you are measuring interconnect latency in milliseconds, that is not good.<br>

    That unit is typically associated with disk access, not a network link.<br>

    <br>

    On 08/19/2011 01:30 PM, Nick Geron wrote:

    <blockquote

cite="mid:8F348D9D79ADA24993FCE45FE44F162402712018903B@exchange01.corp.corenap.com"

      type="cite">

      <meta http-equiv="Content-Type" content="text/html;

        charset=ISO-8859-1">

      <meta name="Generator" content="Microsoft Word 12 (filtered

        medium)">

      <style><!--

/* Font Definitions */

@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}

@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}

@font-face

        {font-family:Tahoma;

        panose-1:2 11 6 4 3 5 4 4 2 4;}

@font-face

        {font-family:Consolas;

        panose-1:2 11 6 9 2 2 4 3 2 4;}

/* Style Definitions */

p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        margin-bottom:.0001pt;

        font-size:11.0pt;

        font-family:"Calibri","sans-serif";

        color:black;}

a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}

a:visited, span.MsoHyperlinkFollowed

        {mso-style-priority:99;

        color:purple;

        text-decoration:underline;}

pre

        {mso-style-priority:99;

        mso-style-link:"HTML Preformatted Char";

        margin:0in;

        margin-bottom:.0001pt;

        font-size:10.0pt;

        font-family:"Courier New";

        color:black;}

span.HTMLPreformattedChar

        {mso-style-name:"HTML Preformatted Char";

        mso-style-priority:99;

        mso-style-link:"HTML Preformatted";

        font-family:Consolas;

        color:black;}

span.EmailStyle19

        {mso-style-type:personal;

        font-family:"Calibri","sans-serif";

        color:windowtext;}

span.EmailStyle20

        {mso-style-type:personal;

        font-family:"Calibri","sans-serif";

        color:#1F497D;}

span.EmailStyle21

        {mso-style-type:personal-reply;

        font-family:"Calibri","sans-serif";

        color:#1F497D;}

.MsoChpDefault

        {mso-style-type:export-only;

        font-size:10.0pt;}

@page WordSection1

        {size:8.5in 11.0in;

        margin:1.0in 1.0in 1.0in 1.0in;}

div.WordSection1

        {page:WordSection1;}

--></style>

      <div class="WordSection1">

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">Actually

            those first numbers were from GigE links going out to

            physical switches and back in.&nbsp; To optimize the private

            link, I upgraded the VMs' NICs to 10GE (VMXNet3, the

            VMware paravirtualized driver) and moved them onto the same host

            system with a dedicated software switch between them.&nbsp; The

            numbers only improved slightly, and got worse on 1 of 100

            pings (1 ms). <o:p></o:p></span></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);"><o:p>&nbsp;</o:p></span></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">10GE

            between VMs under the same hypervisor: rtt min/avg/max/mdev

            = 0.194/0.307/1.003/0.132 ms<o:p></o:p></span></p>
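        <p class="MsoNormal">To put the ping summaries quoted in this thread side by side, here is a minimal Python sketch that parses the rtt line into named fields. The helper name parse_rtt and the labels are made up for illustration, and the parser assumes exactly the "rtt min/avg/max/mdev = a/b/c/d ms" layout shown in these messages.</p>

<pre>
```python
def parse_rtt(summary):
    """Parse 'rtt min/avg/max/mdev = a/b/c/d ms' into a dict of floats (ms)."""
    names, values = summary.split("=")
    keys = names.replace("rtt", "").strip().split("/")
    numbers = values.replace("ms", "").strip().split("/")
    return dict(zip(keys, (float(n) for n in numbers)))


# Summary lines copied from the messages in this thread; labels are illustrative.
SUMMARIES = {
    "Nick, GigE via physical switches": "rtt min/avg/max/mdev = 0.207/0.268/0.360/0.046 ms",
    "Nick, 10GE on one hypervisor": "rtt min/avg/max/mdev = 0.194/0.307/1.003/0.132 ms",
    "Sunil, GigE": "rtt min/avg/max/mdev = 0.149/0.168/0.188/0.020 ms",
}

for label, line in SUMMARIES.items():
    stats = parse_rtt(line)
    print(f"{label}: avg {stats['avg']} ms, max {stats['max']} ms, mdev {stats['mdev']} ms")
```
</pre>

        <p class="MsoNormal">The mdev field is worth keeping next to the average, since a single slow ping (such as the 1 ms outlier above) shows up there and in max rather than in avg.</p>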

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);"><o:p>&nbsp;</o:p></span></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">What

            I don&#8217;t understand is why my OCFS2 cluster suffers so

            greatly.&nbsp; There&#8217;s quite a big difference between a wall time

            of 0.17 seconds to traverse the data on an iSCSI link and

            the 4 minutes to do the same on OCFS2 with a private

            interconnect averaging under 1 ms of latency. &nbsp;For that matter, the whole

            setup is running on another clustered FS (VMFS3) over the

            same network to the same SAN.&nbsp; I guess I&#8217;m just a little

            dumbfounded that OCFS2 is so much more demanding than other

            clustered FSs and alternative network storage options.<o:p></o:p></span></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);"><o:p>&nbsp;</o:p></span></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">Is

            the network really the most likely candidate?&nbsp; If so, is

            anyone else running OCFS2 from within a VM environment?&nbsp; Is

            this technology only worthwhile in the physical world?&nbsp; Is

            there a sweet spot for network latency that I should strive

            for?&nbsp; The user guide only makes mention of &#8216;low latency&#8217; but

            lacks figures save for heartbeat and timeouts.<o:p></o:p></span></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);"><o:p>&nbsp;</o:p></span></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">-nick<o:p></o:p></span></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);"><o:p>&nbsp;</o:p></span></p>

        <div>

          <div style="border-right: medium none; border-width: 1pt

            medium medium; border-style: solid none none; border-color:

            rgb(181, 196, 223) -moz-use-text-color -moz-use-text-color;

            padding: 3pt 0in 0in;">

            <p class="MsoNormal"><b><span style="font-size: 10pt;

                  font-family:

                  &quot;Tahoma&quot;,&quot;sans-serif&quot;; color:

                  windowtext;">From:</span></b><span style="font-size:

                10pt; font-family:

                &quot;Tahoma&quot;,&quot;sans-serif&quot;; color:

                windowtext;"> Sunil Mushran

                [<a class="moz-txt-link-freetext" href="mailto:sunil.mushran@oracle.com">mailto:sunil.mushran@oracle.com</a>] <br>

                <b>Sent:</b> Friday, August 19, 2011 2:30 PM<br>

                <b>To:</b> Nick Geron<br>

                <b>Cc:</b> <a class="moz-txt-link-abbreviated" href="mailto:ocfs2-users@oss.oracle.com">ocfs2-users@oss.oracle.com</a><br>

                <b>Subject:</b> Re: [Ocfs2-users] IO performance appears

                slow<o:p></o:p></span></p>

          </div>

        </div>

        <p class="MsoNormal"><o:p>&nbsp;</o:p></p>

        <p class="MsoNormal">Somewhat equivalent but it misses the

          effect of the workload at that time.<br>

          <br>

          BTW, those are awful numbers for 10G NICs. I get better numbers

          with GigE:<br>

          rtt min/avg/max/mdev = 0.149/0.168/0.188/0.020 ms<br>

          <br>

          You should check the NIC configuration, for example with ethtool.<br>

          <br>

          On 08/19/2011 10:54 AM, Nick Geron wrote: <o:p></o:p></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">Thanks

            for the feedback Sunil,</span><o:p></o:p></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">&nbsp;</span><o:p></o:p></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">You

            are correct that the sys and user times were very low.&nbsp; I

            did check the response and latency between the two nodes

            thinking that could be an issue. &nbsp;&nbsp;I didn&#8217;t see an issue

            there, but then again I do not know what they should be.&nbsp; Is

            there a document that outlines the base and/or

            recommendations for that link?&nbsp; The best I can do in this

            environment is break my host redundancy and move both nodes

            to the same VMware vSwitch with 10g NICs.&nbsp; </span><o:p></o:p></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">&nbsp;</span><o:p></o:p></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">Average

            latency between the two: rtt min/avg/max/mdev =

            0.207/0.268/0.360/0.046 ms. </span><o:p></o:p></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">&nbsp;</span><o:p></o:p></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">Are

            the ping stats dumped from o2net not equivalent to a simple

            ping between the hosts?&nbsp; Is my reported latency too great

            for OCFS2 to function well?</span><o:p></o:p></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">&nbsp;</span><o:p></o:p></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">Thanks

            for your assistance.</span><o:p></o:p></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">&nbsp;</span><o:p></o:p></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">-Nick</span><o:p></o:p></p>

        <p class="MsoNormal"><span style="color: rgb(31, 73, 125);">&nbsp;</span><o:p></o:p></p>

        <div>

          <div style="border-right: medium none; border-width: 1pt

            medium medium; border-style: solid none none; padding: 3pt

            0in 0in; border-color: -moz-use-text-color;">

            <p class="MsoNormal"><b><span style="font-size: 10pt;

                  font-family:

                  &quot;Tahoma&quot;,&quot;sans-serif&quot;; color:

                  windowtext;">From:</span></b><span style="font-size:

                10pt; font-family:

                &quot;Tahoma&quot;,&quot;sans-serif&quot;; color:

                windowtext;"> Sunil Mushran [<a moz-do-not-send="true"

                  href="mailto:sunil.mushran@oracle.com">mailto:sunil.mushran@oracle.com</a>]

                <br>

                <b>Sent:</b> Thursday, August 18, 2011 10:26 PM<br>

                <b>To:</b> Nick Geron<br>

                <b>Cc:</b> <a moz-do-not-send="true"

                  href="mailto:ocfs2-users@oss.oracle.com">ocfs2-users@oss.oracle.com</a><br>

                <b>Subject:</b> Re: [Ocfs2-users] IO performance appears

                slow</span><o:p></o:p></p>

          </div>

        </div>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">The network interconnect between the VMs is

          slow. The sys and user times<br>

          would have helped here, but my guess is that they are low<br>

          and that most of the elapsed (wall) time is spent waiting.<br>

          <br>

          In mainline, o2net dumps stats showing the ping time between<br>

          nodes. Unfortunately this kernel is too old.<br>

          &nbsp;<br>

          On 08/18/2011 04:24 PM, Nick Geron wrote: <o:p></o:p></p>

        <p class="MsoNormal">Greetings,<o:p></o:p></p>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">I&#8217;m rather new to OCFS2, so please forgive

          any glaringly ignorant statements.<o:p></o:p></p>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">I&#8217;m evaluating file systems and storage

          layout for a simple 2 node mail cluster using Maildir email

          directories.&nbsp; I have created a 2 node cluster by following the related

          tutorials.&nbsp; The problem I&#8217;m seeing is that general file access

          using cp, find, du, ls, etc. is slower by a significant factor on

          ocfs2 than on alternative local and remote disk configurations.&nbsp;

          I&#8217;m hoping someone can clue me into whether this behavior is

          normal, or if I&#8217;m missing something in my lab.<o:p></o:p></p>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">*Hosts are identical CentOS 5.5 virtual

          machines (VMware) with 2.6.18-238.19.1.el5. (2 ESXi hosts)<o:p></o:p></p>

        <p class="MsoNormal">*OCFS2 build is

          ocfs2-2.6.18-238.19.1.el5-1.4.7-1.el5 (tools v 1.4.4-1.el5).<o:p></o:p></p>

        <p class="MsoNormal">*SAN is an EMC Clariion.&nbsp; LUN is accessed

          via iSCSI with EMC PowerPath 5.5.0.00.00-275<o:p></o:p></p>

        <p class="MsoNormal">*Nodes share a gigabit network for their

          private interconnect via two interconnected switches (ESXi

          host into each).<o:p></o:p></p>

        <p class="MsoNormal">*Test data is a 181MB Maildir directory

          (~12K emails) copied to various types of storage.<o:p></o:p></p>

        <p class="MsoNormal">*Tests involve simple bash scripts running

          (bash) time with the mentioned command line utilities and

          strace inspection.<o:p></o:p></p>
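        <p class="MsoNormal">For reference, the real/user/sys split that the bash time builtin reports can also be captured from a script. The sketch below is one hedged way to do that in Python; the function name time_command is hypothetical, and it assumes a Unix-like system where os.times() reports CPU time for waited-on child processes.</p>

<pre>
```python
import os
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return wall, user and sys seconds, like bash 'time'."""
    before = os.times()
    start = time.perf_counter()
    # Output is discarded; only the timings matter for this comparison.
    subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL, check=False)
    wall = time.perf_counter() - start
    after = os.times()
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return {
        "wall": wall,
        "user": user,
        "sys": system,
        # Time not accounted for by CPU work: roughly, time spent blocked
        # on disk, network or locks.
        "waiting": wall - user - system,
    }
```
</pre>

        <p class="MsoNormal">A call such as time_command(["du", "-hs", "/path/to/maildir"]), run once per mount, would give the figures for each storage type. A large "waiting" value with small user and sys values would indicate that the command is blocked rather than busy.</p>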

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">The OCFS2 file system was created with the

          following (max-features was not used because mount cannot load

          the xattr or extended-slotmap features it adds):<o:p></o:p></p>

        <p class="MsoNormal">mkfs.ocfs2 -N 2 -T mail

          --fs-features=backup-super,sparse,unwritten,inline-data -v

          /dev/emcpowera<o:p></o:p></p>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">Mount options are limited to &#8216;_netdev&#8217; at

          the moment.&nbsp; I&#8217;ve read a bit about changing &#8216;data&#8217; from

          ordered to writeback, but that seems to be related to waits on

          flushing cache to disk.&nbsp; So far, I&#8217;m just focusing on

          reads/lstats.<o:p></o:p></p>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">With a maildir in place, any operation that

          must inspect all files takes quite a while to complete without

          cached entries.&nbsp; The alarming thing is the discrepancy between

          my OCFS2 data and identical data on local, NFS and iSCSI

          mounts.<o:p></o:p></p>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">Here&#8217;s some simple data that should

          illustrate my problem and my confusion:<o:p></o:p></p>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">Command: &#8216;du -hs

          /path/to/maildir/on/various/mounts&#8217;<o:p></o:p></p>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <table>
          <tr><th align="left">Storage</th><th align="left">Real time to complete (min:sec)</th></tr>
          <tr><td>Local disk</td><td>0:0.078</td></tr>
          <tr><td>NFS</td><td>0:2</td></tr>
          <tr><td>iSCSI (EXT3)</td><td>0:1.7</td></tr>
          <tr><td>iSCSI (OCFS2)</td><td>4:24</td></tr>
        </table>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">Other tests including recursive chowns or

          chmods, and ls report similar results.<o:p></o:p></p>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">Most telling is perhaps strace output.&nbsp;

          There I can see system calls on individual Maildir files.&nbsp;

          The time between calls/operations is far longer on OCFS2 and

          there is no hint of externally derived waits.&nbsp; Nor are there

          any indicators of load issues from competing processes;

          nothing else (significant) is going on and du has free rein

          over the OS resources.<o:p></o:p></p>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">Output from strace with -tt -T using du -hs

          against the Maildir on my EXT3 iSCSI LUN (/dev/emcpowerb1):<o:p></o:p></p>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">18:03:17.572879

          lstat("1313705228.000737.mbox:2,S", {st_mode=S_IFREG|0644,

          st_size=715, ...}) = 0 &lt;0.000018&gt;<o:p></o:p></p>

        <p class="MsoNormal">18:03:17.572944

          lstat("1313705228.008426.mbox:2,S", {st_mode=S_IFREG|0644,

          st_size=2779, ...}) = 0 &lt;0.000024&gt;<o:p></o:p></p>

        <p class="MsoNormal">18:03:17.573016

          lstat("1313705228.006345.mbox:2,S", {st_mode=S_IFREG|0644,

          st_size=2703, ...}) = 0 &lt;0.000020&gt;<o:p></o:p></p>

        <p class="MsoNormal">18:03:17.573083

          lstat("1313705228.001305.mbox:2,S", {st_mode=S_IFREG|0644,

          st_size=1831, ...}) = 0 &lt;0.000017&gt;<o:p></o:p></p>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">Output from the same trace against the

          OCFS2 store<o:p></o:p></p>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">18:06:52.876713

          lstat("1313707554.003441.mbox:2,S", {st_mode=S_IFREG|0644,

          st_size=2322, ...}) = 0 &lt;0.040896&gt;<o:p></o:p></p>

        <p class="MsoNormal">18:06:52.917723

          lstat("1313707554.003442.mbox:2,S", {st_mode=S_IFREG|0644,

          st_size=2316, ...}) = 0 &lt;0.040663&gt;<o:p></o:p></p>

        <p class="MsoNormal">18:06:52.958473

          lstat("1313707554.003443.mbox:2,S", {st_mode=S_IFREG|0644,

          st_size=2899, ...}) = 0 &lt;0.000938&gt;<o:p></o:p></p>

        <p class="MsoNormal">18:06:52.959471

          lstat("1313707554.003444.mbox:2,S", {st_mode=S_IFREG|0644,

          st_size=2522, ...}) = 0 &lt;0.001106&gt;<o:p></o:p></p>

        <p class="MsoNormal">18:06:52.960641

          lstat("1313707554.003445.mbox:2,S", {st_mode=S_IFREG|0644,

          st_size=3451, ...}) = 0 &lt;0.039904&gt;<o:p></o:p></p>

        <p class="MsoNormal">18:06:53.000644

          lstat("1313707554.003446.mbox:2,S", {st_mode=S_IFREG|0644,

          st_size=3150, ...}) = 0 &lt;0.041060&gt;<o:p></o:p></p>
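        <p class="MsoNormal">As a rough sanity check on the two traces, the per-call lstat durations can be extrapolated to the whole Maildir. The Python sketch below does that arithmetic; it assumes about 12,000 files with one lstat each (from the "~12K emails" figure) and treats the handful of sampled calls as representative, which they may not be.</p>

<pre>
```python
# Per-call lstat durations in seconds, copied from the strace -T column above.
ext3_lstat = [0.000018, 0.000024, 0.000020, 0.000017]
ocfs2_lstat = [0.040896, 0.040663, 0.000938, 0.001106, 0.039904, 0.041060]

# Assumption: roughly 12,000 files, one lstat per file.
FILE_COUNT = 12_000


def mean(values):
    return sum(values) / len(values)


def projected_seconds(samples, file_count=FILE_COUNT):
    """Total lstat time if every file costs the mean of the samples."""
    return mean(samples) * file_count


def as_min_sec(seconds):
    """Format seconds as min:sec, matching the table in this message."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes)}:{secs:04.1f}"


for label, samples in (("EXT3", ext3_lstat), ("OCFS2", ocfs2_lstat)):
    total = projected_seconds(samples)
    print(f"{label}: mean {mean(samples) * 1000:.3f} ms per lstat, "
          f"projected {as_min_sec(total)} for {FILE_COUNT} files")
```
</pre>

        <p class="MsoNormal">With these samples the OCFS2 projection comes out at roughly five and a half minutes, the same order of magnitude as the measured 4:24, which suggests the per-file lstat latency accounts for most of the wall time.</p>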

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">Is this normal behavior for a current

          kernel and the most recent 1.4.7 code?&nbsp; Does someone suspect

          I&#8217;ve blundered somewhere along the way? &nbsp;I&#8217;ve seen many posts

          to this list related to a mail cluster setup like mine.&nbsp; Is

          anyone on the list running a production mail cluster with

          OCFS2?&nbsp; I apologize for the length of this email.&nbsp; Thanks.<o:p></o:p></p>

        <p class="MsoNormal">&nbsp;<o:p></o:p></p>

        <p class="MsoNormal">-Nick Geron<o:p></o:p></p>

        <pre>&nbsp;<o:p></o:p></pre>

        <pre>&nbsp;<o:p></o:p></pre>

        <pre>_______________________________________________<o:p></o:p></pre>

        <pre>Ocfs2-users mailing list<o:p></o:p></pre>

        <pre><a moz-do-not-send="true" href="mailto:Ocfs2-users@oss.oracle.com">Ocfs2-users@oss.oracle.com</a><o:p></o:p></pre>

        <pre><a moz-do-not-send="true" href="http://oss.oracle.com/mailman/listinfo/ocfs2-users">http://oss.oracle.com/mailman/listinfo/ocfs2-users</a><o:p></o:p></pre>


      </div>

    </blockquote>

    <br>

  </body>

</html>