<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    <tt>Hi Tariq,<br>

      <br>

      Yesterday one node was under load, though not as heavy as last week,
      and iostat showed:<br>

      - 10% of samples with %util &gt;90% (some peaks of 100%) and an

      average value of 18%<br>

      - %iowait peaks of 37% with an average value of 4%<br>

      <br>

      BUT:<br>

      - none of the indicated error messages appeared in

      /var/log/messages<br>

      - we have mounted the OCFS2 filesystem with TWO extra options:<br>

           data=writeback<br>

           commit=20<br>

      * Question about these extra options:<br>

      &nbsp;&nbsp;&nbsp; Could they help mitigate the problem in some way?<br>
      &nbsp;&nbsp;&nbsp; I've read about using them (usually with commit=60), but I don't
      know whether they really help or whether there are other useful
      options to try<br>
      &nbsp;&nbsp;&nbsp; Previously, the volume was mounted using only the options
      "_netdev,rw,noatime"<br>

      <br>

      NOTE:<br>

      - we have left only one node active (instead of all three nodes of
      the cluster) to "force" overloads<br>
      - although only one node is serving the app, all three nodes
      have the OCFS2 volume mounted<br>

      <br>

      <br>

      About the EACCES/ENOENT errors: we don't know whether they are
      caused by:<br>
      - abnormal behavior of the application<br>
      - the OCFS2 problem (a user tries to unlink/rename something and,
      if the system is slow because of OCFS2, retries the operation again
      and again, so the first attempt completes successfully but the
      following ones fail)<br>
      - a possible concurrency problem: now, with only one node
      serving the application, the errors don't appear, but with all three
      nodes in service they did (several nodes trying to do the
      same operation)<br>

      <br>

      As for the messages about blocked processes in /var/log/messages,
      I'll send the file directly to you (instead of to the list).<br>

      <br>

      Regards.<br>

      <br>

    </tt>

    <div class="moz-signature">

      <hr>


      <p class="MsoNormal"><b><font face="Franklin Gothic Book"

            color="gray" size="1"><span

              style="font-size:8.0pt;font-family:&quot;Franklin Gothic

              Book&quot;;color:gray; font-weight:bold">

              Area de Sistemas<br>

              Servicio de las Tecnologias de la Informacion y

              Comunicaciones (STIC)<br>

              Universidad de Valladolid<br>

              Edificio Alfonso VIII, C/Real de Burgos s/n. 47011,

              Valladolid - ESPAÑA<br>

              Telefono: 983 18-6410, Fax: 983 423271<br>

              E-mail: <a class="moz-txt-link-abbreviated" href="mailto:sistemas@uva.es">sistemas@uva.es</a><br>

            </span></font></b></p>

      <b><font face="Franklin Gothic Book" color="gray" size="1">

          <hr>

        </font></b></div>

    <div class="moz-cite-prefix">On 14/09/15 at 20:29, Tariq Saeed
      wrote:<br>

    </div>

    <blockquote cite="mid:55F71218.6060906@oracle.com" type="cite">


      <br>

      <div class="moz-cite-prefix">On 09/14/2015 01:20 AM, Area de

        Sistemas wrote:<br>

      </div>

      <blockquote cite="mid:55F68341.5030303@uva.es" type="cite">


        <tt>Hello everyone,<br>

          <br>

          We have a problem in a 3-member OCFS2 cluster used to serve a
          web/PHP application that accesses (reads and/or writes) files
          located on the OCFS2 volume.<br>
          The problem appears only sometimes (apparently during
          high-load periods).<br>

          <br>

          SYMPTOMS:<br>

          - access to OCFS2 content becomes slower and slower until it
          stalls<br>

              * a "ls" command that normally takes &lt;=1s takes 30s,

          40s, 1m,...<br>

          - load average of the system grows to 150, 200 or even more<br>

          <br>

          - high iowait values: 70-90%<br>

            <br>

        </tt></blockquote>

      <tt>&nbsp;&nbsp;&nbsp; This is a hint that the disk is under pressure. Run iostat
        (see the man page)<br>
        &nbsp;&nbsp;&nbsp; when this happens, producing a report every 3 seconds or
        so, and look at<br>
        &nbsp;&nbsp;&nbsp; the %util column:<br>
        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; %util<br>
        &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Percentage of CPU time during which I/O
        requests were issued to the device (bandwidth utilization for the
        device). Device
        saturation occurs when this value is close to 100%.<br>

        <br>

      </tt>
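      <tt>A minimal sketch of how the %util samples could be summarized
        once the iostat output has been saved to a file. It assumes
        extended output ("iostat -x 3"); the function name
        summarize_util and the 90% threshold are illustrative choices,
        not part of iostat. The %util column is looked up by its header
        name because the column layout is not the same in every sysstat
        version.<br>
      </tt>
      <pre>
```python
import statistics


def summarize_util(iostat_text, device, busy_threshold=90.0):
    """Summarize the %util column of one device in saved `iostat -x` output."""
    util_index = None
    samples = []
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device") and "%util" in fields:
            # Header row: find %util by name rather than by position.
            util_index = fields.index("%util")
        elif (fields[0] == device and util_index is not None
              and len(fields) > util_index):
            # Some locales print a decimal comma (e.g. "18,00").
            samples.append(float(fields[util_index].replace(",", ".")))
    if not samples:
        return None  # device never appeared, or no extended header was seen
    busy = [value for value in samples if value >= busy_threshold]
    return {
        "samples": len(samples),
        "mean": statistics.mean(samples),
        "peak": max(samples),
        "busy_share": len(busy) / len(samples),
    }
```
</pre>
      <tt>With a hypothetical log file and device name, it could be
        called as summarize_util(open("iostat.log").read(), "sda"); a
        busy_share close to 1.0 would suggest the device is saturated
        for most of the sampling period.<br>
      </tt>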

      <blockquote cite="mid:55F68341.5030303@uva.es" type="cite"><tt>  

          * but CPU usage is low<br>

          <br>

          - a lot of messages like these appear in the syslog:<br>

              (httpd,XXXXX,Y):ocfs2_rename:1474 ERROR: status = -13<br>

        </tt></blockquote>

      <tt>&nbsp;&nbsp;&nbsp; </tt>EACCES&nbsp;&nbsp;&nbsp; Permission denied. Find the filename and
      check its permissions with ls -l.<br>

      <blockquote cite="mid:55F68341.5030303@uva.es" type="cite"><tt>  

          or<br>

              (httpd,XXXXX,Y):ocfs2_unlink:951 ERROR: status = -2<br>

        </tt></blockquote>

      <tt>&nbsp;&nbsp;&nbsp; </tt>ENOENT&nbsp;&nbsp;&nbsp; All we can say is that this was an attempt to
      delete a file from a directory that has already been deleted. <br>
      &nbsp;&nbsp;&nbsp; Diagnosing it requires some knowledge of the
      environment. Is there an application log?<br>
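      A small sketch of how the negative status values in those log
      lines map to errno names, using only Python's standard errno and
      os modules. The helper name describe_status is an illustrative
      choice; the message text returned by os.strerror depends on the
      platform.<br>
      <pre>
```python
import errno
import os


def describe_status(status):
    """Map a negative kernel status such as -13 to (errno name, message)."""
    code = abs(status)
    # errno.errorcode maps numeric codes to their symbolic names.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # The two status values reported by ocfs2_rename and ocfs2_unlink.
    for status in (-13, -2):
        name, message = describe_status(status)
        print(status, name, message)
```
</pre>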

      <blockquote cite="mid:55F68341.5030303@uva.es" type="cite"><tt> <br>

            and the more "worrying" ones:<br>

               kernel: INFO: task httpd:3488 blocked for more than 120

          seconds.<br>

               kernel: "echo 0 &gt;

          /proc/sys/kernel/hung_task_timeout_secs" disables this

          message.<br>

               kernel: httpd           D c6fe5d74     0  3488   1616

          0x00000080    <br>

               kernel: c6fe5e04 00000082 00000000 c6fe5d74 c6fe5d74

          000041fd c6fe5d88 c0439b18<br>

               kernel: c0b976c0 c0b976c0 c0b976c0 c0b976c0 ed0f0ac0

          c6fe5de8 c0b976c0 f75ac6c0<br>

               kernel: f2f0cd60 c0a95060 00000001 c6fe5dbc c0874b8d

          c6fe5de8 f8fd9a86 00000001<br>

               kernel: Call Trace:<br>

               kernel: [&lt;c0439b18&gt;] ?

          default_spin_lock_flags+0x8/0x10<br>

               kernel: [&lt;c0874b8d&gt;] ? _raw_spin_lock+0xd/0x10<br>

               kernel: [&lt;f8fd9a86&gt;] ?

          ocfs2_dentry_revalidate+0xc6/0x2d0 [ocfs2]<br>

               kernel: [&lt;f8ff17be&gt;] ? ocfs2_permission+0xfe/0x110

          [ocfs2]<br>

               kernel: [&lt;f905b6f0&gt;] ? ocfs2_acl_chmod+0xd0/0xd0

          [ocfs2]<br>

               kernel: [&lt;c0873105&gt;] schedule+0x35/0x50<br>

               kernel: [&lt;c0873b2e&gt;]

          __mutex_lock_slowpath+0xbe/0x120<br>

               ....<br>

          <br>

        </tt></blockquote>

      <tt>The important part of the backtrace is cut off. Where is the
        rest of it? The entries starting with "?" are junk. You can
        attach /var/log/messages to give us the complete picture. My
        guess is that the task blocks on the mutex for so long because
        the thread holding the mutex is itself blocked on I/O. <br>
        Run "ps -e -o pid,stat,comm,wchan=WIDE_WCHAN-COLUMN" and look
        for processes in the 'D' state (uninterruptible sleep). These
        are usually processes blocked on I/O. <br>

      </tt>
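      <tt>As an alternative to the ps command, here is a rough sketch
        that lists processes in the 'D' state by reading
        /proc/PID/stat directly on Linux. The function names are
        illustrative, and unlike ps with the wchan column it does not
        show where each process is sleeping.<br>
      </tt>
      <pre>
```python
import glob


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of one /proc/PID/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the first "(" and the last ")".
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left])
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def blocked_processes(proc_root="/proc"):
    """List (pid, comm) for processes in uninterruptible sleep (state D)."""
    blocked = []
    for path in glob.glob(proc_root + "/[0-9]*/stat"):
        try:
            with open(path) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # the process exited while we were reading it
        if state == "D":
            blocked.append((pid, comm))
    return sorted(blocked)


if __name__ == "__main__":
    for pid, comm in blocked_processes():
        print(pid, comm)
```
</pre>
      <tt>Running it every few seconds during a stall would show
        whether the same httpd processes stay in the 'D' state.<br>
      </tt>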

      <blockquote cite="mid:55F68341.5030303@uva.es" type="cite"><tt> <br>

          (UNACCEPTABLE) WORKAROUND:<br>

             stop httpd (really slow)<br>

             stop ocfs2 service (really slow)<br>

          &nbsp;&nbsp; start ocfs2 and httpd<br>

          <br>

          MORE INFO:<br>

          - OS information:<br>

              Oracle Linux 6.4 32bit<br>

              4GB RAM<br>

              uname -a: 2.6.39-400.109.6.el6uek.i686 #1 SMP Wed Aug 28

          09:55:10 PDT 2013 i686 i686 i386 GNU/Linux<br>

          &nbsp;&nbsp;&nbsp; * anyway: we have another 5-node cluster with Oracle
          Linux 7.1 (so a 64-bit OS) serving a newer version of the same
          application, and the problems are similar, so it appears to be
          not an OS problem but a more specific OCFS2 problem (bug? some
          tuning? other?)<br>

          <br>

          - standard configuration<br>

          &nbsp;&nbsp;&nbsp; * if you want I can show the cluster.conf configuration,
          but it is the "expected configuration"<br>

          <br>

          - standard configuration in o2cb:<br>

              Driver for "configfs": Loaded<br>

              Filesystem "configfs": Mounted<br>

              Stack glue driver: Loaded<br>

              Stack plugin "o2cb": Loaded<br>

              Driver for "ocfs2_dlmfs": Loaded<br>

              Filesystem "ocfs2_dlmfs": Mounted<br>

              Checking O2CB cluster "MoodleOCFS2": Online<br>

                Heartbeat dead threshold: 31<br>

                Network idle timeout: 30000<br>

                Network keepalive delay: 2000<br>

                Network reconnect delay: 2000<br>

                Heartbeat mode: Local<br>

              Checking O2CB heartbeat: Active<br>

          <br>

          - mount options: _netdev,rw,noatime<br>

              * so other options (commit, data, ...) have their default

          values<br>

          <br>

          <br>

          Any ideas/suggestions?<br>

          <br>

          Regards.<br>

          <br>

        </tt>


        <br>


        <br>

        <pre wrap="">_______________________________________________

Ocfs2-users mailing list

<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Ocfs2-users@oss.oracle.com">Ocfs2-users@oss.oracle.com</a>

<a moz-do-not-send="true" class="moz-txt-link-freetext" href="https://oss.oracle.com/mailman/listinfo/ocfs2-users">https://oss.oracle.com/mailman/listinfo/ocfs2-users</a></pre>

      </blockquote>

      <br>

    </blockquote>

    <br>

  </body>

</html>