<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#000000">

    Hi Area,<br>

    data=writeback improves things greatly. In ordered mode, the
    default, file data is written to disk<br>
    before the transaction that logs the related metadata changes (the
    journal logs only metadata). This is very conservative:<br>
    it ensures that by the time the journal log buffer is written to the
    on-disk journal area, the data has hit the disk and<br>
    the transaction can be safely replayed after a crash. Only
    complete transactions are replayed;<br>
    by complete I mean: begin-trans, changebuf1, changebuf2, ... ,
    changebufN, end-trans. Replay means the buffers<br>
    are copied from the journal area on disk to their ultimate home
    locations on disk. You can see now why<br>
    ordered mode generates so much I/O. In writeback mode, a transaction
    can hit the disk while its data<br>
    is written whenever the kernel wants, asynchronously and
    with no ordering relative<br>
    to the related transaction. The danger is that after a crash we can
    replay a transaction whose associated<br>
    data is not on disk. For example, if you truncate a file up to a new,
    bigger size and then<br>
    write something to a page beyond the old size, the page could stay
    in core for a long time<br>
    after the transaction is written to the journal area on disk. If there
    is a crash while the data page is still<br>
    in core, then after replay the file will have the new size but the page
    will read as all zeros instead of<br>
    what you wrote. At any rate, this is a digression, just for your
    info. <br>
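    To make the difference concrete, here is a toy Python model of that
    truncate-and-write scenario. It is only a sketch under loud assumptions:
    run_scenario and every variable in it are made up for illustration, and
    it models neither OCFS2 nor the kernel journal code, only the order in
    which data and journal writes reach the disk.<br>
    <pre>
```python
# Toy model only: not OCFS2 or kernel journal code. All names are hypothetical.

def run_scenario(mode, crash_before_writeback=True):
    """Truncate a file up to 2 blocks, write 'B' to block 1, commit, crash."""
    disk_data = {0: "A"}     # data blocks already on disk
    disk_size = 1            # file size (in blocks) in on-disk metadata
    page_cache = {}          # dirty data pages still in core
    journal = []             # complete transactions in the on-disk journal

    # Application: extend the file and write into the new block.
    new_size = 2
    page_cache[1] = "B"

    # Commit the transaction that logs the metadata change (the new size).
    if mode == "ordered":
        # Ordered mode: related data is forced to disk before the commit.
        disk_data.update(page_cache)
        page_cache.clear()
    journal.append({"size": new_size})

    if not crash_before_writeback:
        # The kernel eventually writes the page back on its own schedule.
        disk_data.update(page_cache)
        page_cache.clear()

    # Crash: everything still in core is lost; only the disk survives.
    page_cache = {}

    # Recovery: replay complete transactions to their home location.
    for txn in journal:
        disk_size = txn["size"]

    # Read the file back; blocks that were never written read as zeros.
    return [disk_data.get(i, "\0") for i in range(disk_size)]


print(run_scenario("ordered"))      # ['A', 'B']
print(run_scenario("writeback"))    # ['A', '\x00']
```
    </pre>
    In this model the ordered run reads back what was written, while the
    writeback run comes back with the new size and a zero-filled block,
    unless the page happened to be written back before the crash.<br>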

    <br>

    The commit option is the interval, in seconds, at which data is synced
    to disk. I think it may also be the interval<br>
    after which the journal log buffer is written to disk. So increasing it
    reduces the number of unnecessary <br>
    writes.<br>

    <br>

    Now for the threads blocked for more than 120 sec in
    /var/log/messages. There are two types. <br>
    The first type is blocked on a mutex on an ocfs2 system file, mostly
    the global bitmap file shared by<br>
    all nodes. All writes to system files are done under transactions,
    and that may require<br>
    flushing the journal buffer to disk, depending on your journal
    file size. The smaller the journal,<br>
    the fewer transactions it can hold, so the more frequently the on-disk
    journal log needs to be<br>
    reclaimed by copying the metadata blocks from the journal space
    to their home locations,<br>
    thus freeing up on-disk journal space. This requires reading metadata
    blocks from the journal area<br>
    on disk and writing them to their home locations. So again, a lot of
    I/O. I think the threads are <br>
    waiting on the mutex because the journal code must do this reclaiming
    to free up space. The other kind<br>
    of blocked threads are NOT in ocfs2 code, but they<br>
    are all blocked on a mutex too. I don't know why. Finding out would
    require getting a vmcore, chasing<br>
    the mutex owner, and working out why it is taking so long. I don't
    think that is warranted at<br>
    this time. <br>
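    If you want to sort the hung-task reports into those two types
    yourself, something along these lines might help. It is a rough sketch:
    split_reports is a name I made up, it assumes the log lines look like
    the ones you posted, and it treats a report as "in ocfs2" only when a
    call-trace frame not marked with "?" is tagged [ocfs2].<br>
    <pre>
```python
import re

# Matches the kernel hung-task warning as it appears in the posted log.
HUNG = re.compile(r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds")


def split_reports(lines):
    """Group log lines into one record per hung-task warning."""
    reports = []
    for line in lines:
        m = HUNG.search(line)
        if m:
            reports.append(
                {"comm": m.group(1), "pid": int(m.group(2)), "ocfs2": False}
            )
        elif reports and "[ocfs2]" in line and "] ? " not in line:
            # Count only reliable frames; entries marked "?" are guesses.
            reports[-1]["ocfs2"] = True
    return reports
```
    </pre>
    Feeding it the lines of /var/log/messages (for example via open() on
    that file) gives one record per warning, with the ocfs2 flag telling
    the two kinds apart.<br>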

    Let me know if you have any further questions. <br>

    Thanks<br>

    -Tariq <br>

    <div class="moz-cite-prefix">On 09/15/2015 01:55 AM, Area de

      Sistemas wrote:<br>

    </div>

    <blockquote cite="mid:55F7DCE5.7090601@uva.es" type="cite">


      <tt>Hi Tariq,<br>

        <br>

        Yesterday one node was under load, but not as high as last week,
        and iostat showed:<br>

        - 10% of samples with %util &gt;90% (some peaks of 100%) and an

        average value of 18%<br>

        - %iowait peaks of 37% with an average value of 4%<br>

        <br>

        BUT:<br>

        - none of the indicated error messages appeared in

        /var/log/messages<br>

        - we have mounted the OCFS2 filesystem with TWO extra options:<br>

             data=writeback<br>

             commit=20<br>

        * Question about these extra options:<br>
            Do they perhaps help to mitigate the problem in some way?<br>
            I've read about using them (usually commit=60) but I don't
        know if they really help and/or whether there are other useful
        options to use<br>
            Before, the volume was mounted using only the options
        "_netdev,rw,noatime"<br>

        <br>

        NOTE:<br>

        - we have left only one node active (not the three nodes of the

        cluster) to "force" overloads<br>

        - although only one node is serving the app, all three nodes
        have the OCFS2 volume mounted<br>

        <br>

        <br>

        About the EACCES/ENOENT errors: we don't know whether they are
        caused by:<br>
        - abnormal behavior of the application<br>
        - the OCFS2 problem (a user tries to unlink/rename something and,
        if the system is slow due to OCFS2, the user retries the
        operation again and again, so the first attempt completes
        successfully but the following ones fail)<br>
        - a possible concurrency problem: now, with only one node
        servicing the application, the errors don't appear, but with the
        three nodes in service they did (several nodes trying to
        do the same operation)<br>
        <br>
        As for the messages about blocked processes in
        /var/log/messages, I'll send the file directly to you (instead of
        to the list).<br>

        <br>

        Regards.<br>

        <br>

      </tt>

      <div class="moz-signature">

        <hr>

        <p class="MsoNormal"><b><font color="gray" face="Franklin Gothic

              Book" size="1"><span

                style="font-size:8.0pt;font-family:&quot;Franklin Gothic

                Book&quot;;color:gray; font-weight:bold"> Area de

                Sistemas<br>

                Servicio de las Tecnologias de la Informacion y

                Comunicaciones (STIC)<br>

                Universidad de Valladolid<br>

                Edificio Alfonso VIII, C/Real de Burgos s/n. 47011,

                Valladolid - ESPAÑA<br>

                Telefono: 983 18-6410, Fax: 983 423271<br>

                E-mail: <a moz-do-not-send="true"

                  class="moz-txt-link-abbreviated"

                  href="mailto:sistemas@uva.es">sistemas@uva.es</a><br>

              </span></font></b></p>

        <b><font color="gray" face="Franklin Gothic Book" size="1">

            <hr> </font></b></div>

      <div class="moz-cite-prefix">On 14/09/15 at 20:29, Tariq Saeed
        wrote:<br>

      </div>

      <blockquote cite="mid:55F71218.6060906@oracle.com" type="cite">


        <br>

        <div class="moz-cite-prefix">On 09/14/2015 01:20 AM, Area de

          Sistemas wrote:<br>

        </div>

        <blockquote cite="mid:55F68341.5030303@uva.es" type="cite">


          <tt>Hello everyone,<br>

            <br>

            We have a problem in a 3-member OCFS2 cluster used to serve
            a web/PHP application that accesses (reads and/or writes) files
            located in the OCFS2 volume.<br>
            The problem appears only sometimes (apparently during high
            load periods).<br>

            <br>

            SYMPTOMS:<br>

            - access to OCFS2 content becomes slower and slower until it
            stalls<br>

                * a "ls" command that normally takes &lt;=1s takes 30s,

            40s, 1m,...<br>

            - load average of the system grows to 150, 200 or even more<br>

            <br>

            - high iowait values: 70-90%<br>

              <br>

          </tt></blockquote>

        <tt>         This is a hint that the disk is under pressure. Run
          iostat (see man page)<br>
                   when this happens, producing a report every 3 seconds
          or so, and look at<br>
                   the %util column:<br>
                                 %util<br>
                               Percentage of CPU time during which I/O
          requests were issued to the  device  (bandwidth<br>
                               utilization for the device). Device
          saturation occurs when this value is close to 100%.<br>

          <br>

        </tt>
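        <tt>If you capture the iostat output to a file, a small Python
          helper like the one below could summarize %util per device. It is
          a sketch, not a tool: util_samples and summarize are made-up names,
          it assumes "iostat -x" style tables whose header row starts with
          "Device" and contains a "%util" column, and it assumes numbers are
          printed with a decimal point.<br>
        </tt>
        <pre>
```python
from collections import defaultdict


def util_samples(iostat_text):
    """Collect %util readings per device from `iostat -x` style output."""
    samples = defaultdict(list)
    util_col = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            util_col = None          # a blank line ends a device table
        elif fields[0].startswith("Device"):
            # Header row: remember where the %util column sits.
            util_col = fields.index("%util") if "%util" in fields else None
        elif util_col is not None and len(fields) > util_col:
            samples[fields[0]].append(float(fields[util_col]))
    return samples


def summarize(samples, busy=90.0):
    """Per device: (average %util, share of samples at or above `busy`)."""
    out = {}
    for dev, vals in samples.items():
        hot = sum(1 for v in vals if v >= busy)
        out[dev] = (sum(vals) / len(vals), hot / len(vals))
    return out
```
        </pre>
        <tt>A device whose share of samples at or above 90% keeps growing
          during the slow periods is the one to look at first.<br>
        </tt>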

        <blockquote cite="mid:55F68341.5030303@uva.es" type="cite"><tt>  

            * but CPU usage is low<br>

            <br>

            - a lot of messages like these appear in the syslog:<br>

                (httpd,XXXXX,Y):ocfs2_rename:1474 ERROR: status = -13<br>

          </tt></blockquote>

        <tt>    </tt>EACCES: Permission denied. Find the filename and
        check its permissions with ls -l.<br>

        <blockquote cite="mid:55F68341.5030303@uva.es" type="cite"><tt>

              or<br>

                (httpd,XXXXX,Y):ocfs2_unlink:951 ERROR: status = -2<br>

          </tt></blockquote>

        <tt>    </tt>ENOENT: No such file or directory. All we can say is
        that this was an attempt to delete a file that had already been
        deleted from its directory. <br>
                                Working out why requires some knowledge of the
        environment. Is there an application log?<br>
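        <tt>For other status values, a few lines of Python can do the
          lookup. This is a sketch that assumes the "status = -N" in these
          messages is a negated Linux errno, as it is for -13 and -2 here;
          decode_status is just a name I picked.<br>
        </tt>
        <pre>
```python
import errno
import os
import re

# Pulls the number out of "... ERROR: status = -13".
STATUS = re.compile(r"status = -(\d+)")


def decode_status(log_line):
    """Map 'status = -N' in a log line to (errno name, description)."""
    m = STATUS.search(log_line)
    if m is None:
        return None
    code = int(m.group(1))
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


print(decode_status("(httpd,XXXXX,Y):ocfs2_rename:1474 ERROR: status = -13"))
print(decode_status("(httpd,XXXXX,Y):ocfs2_unlink:951 ERROR: status = -2"))
```
        </pre>
        <tt>The description text comes from the C library, so its exact
          wording depends on the system it runs on.<br>
        </tt>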

        <blockquote cite="mid:55F68341.5030303@uva.es" type="cite"><tt>

            <br>

              and the more "worrying":<br>

                 kernel: INFO: task httpd:3488 blocked for more than 120

            seconds.<br>

                 kernel: "echo 0 &gt;

            /proc/sys/kernel/hung_task_timeout_secs" disables this

            message.<br>

                 kernel: httpd           D c6fe5d74     0  3488   1616

            0x00000080    <br>

                 kernel: c6fe5e04 00000082 00000000 c6fe5d74 c6fe5d74

            000041fd c6fe5d88 c0439b18<br>

                 kernel: c0b976c0 c0b976c0 c0b976c0 c0b976c0 ed0f0ac0

            c6fe5de8 c0b976c0 f75ac6c0<br>

                 kernel: f2f0cd60 c0a95060 00000001 c6fe5dbc c0874b8d

            c6fe5de8 f8fd9a86 00000001<br>

                 kernel: Call Trace:<br>

                 kernel: [&lt;c0439b18&gt;] ?

            default_spin_lock_flags+0x8/0x10<br>

                 kernel: [&lt;c0874b8d&gt;] ? _raw_spin_lock+0xd/0x10<br>

                 kernel: [&lt;f8fd9a86&gt;] ?

            ocfs2_dentry_revalidate+0xc6/0x2d0 [ocfs2]<br>

                 kernel: [&lt;f8ff17be&gt;] ?

            ocfs2_permission+0xfe/0x110 [ocfs2]<br>

                 kernel: [&lt;f905b6f0&gt;] ? ocfs2_acl_chmod+0xd0/0xd0

            [ocfs2]<br>

                 kernel: [&lt;c0873105&gt;] schedule+0x35/0x50<br>

                 kernel: [&lt;c0873b2e&gt;]

            __mutex_lock_slowpath+0xbe/0x120<br>

                 ....<br>

            <br>

          </tt></blockquote>

        <tt>The important part of the backtrace is cut off. Where is the
          rest of it? The entries starting with "?"<br>
          are junk. You can attach /var/log/messages to give us a complete
          picture. My guess is that the threads block on the <br>
          mutex for so long because the thread holding the mutex is blocked
          on I/O. <br>
          Run "ps -e -o pid,stat,comm,wchan=WIDE_WCHAN-COLUMN" and look
          for processes in 'D' state (uninterruptible sleep).<br>
          These are usually processes blocked on I/O. <br>

        </tt>
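        <tt>With a load average of 150 or more the ps listing will be
          long, so here is a sketch of how the 'D' rows could be picked out
          in Python. d_state and wait_channels are made-up names, and the
          code assumes the four columns pid, stat, comm, wchan in that order,
          with a header row first.<br>
        </tt>
        <pre>
```python
from collections import Counter


def d_state(ps_text):
    """Return (pid, comm, wchan) for rows whose STAT starts with 'D'.

    Expects the output of: ps -e -o pid,stat,comm,wchan
    """
    rows = []
    for line in ps_text.splitlines()[1:]:      # skip the header row
        fields = line.split()
        if len(fields) >= 4 and fields[1].startswith("D"):
            # comm may contain spaces; wchan is always the last field.
            comm = " ".join(fields[2:-1])
            rows.append((int(fields[0]), comm, fields[-1]))
    return rows


def wait_channels(rows):
    """Count how many D-state processes sleep in each wait channel."""
    return Counter(wchan for _pid, _comm, wchan in rows)
```
        </pre>
        <tt>If most of the 'D' processes share one wait channel, that is
          a good pointer to where they are stuck.<br>
        </tt>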

        <blockquote cite="mid:55F68341.5030303@uva.es" type="cite"><tt>

            <br>

            (UNACCEPTABLE) WORKAROUND:<br>
               stop httpd (really slow)<br>
               stop ocfs2 service (really slow)<br>
               start ocfs2 and httpd<br>

            <br>

            MORE INFO:<br>

            - OS information:<br>
                Oracle Linux 6.4 32bit<br>
                4GB RAM<br>
                uname -a: 2.6.39-400.109.6.el6uek.i686 #1 SMP Wed Aug 28
            09:55:10 PDT 2013 i686 i686 i386 GNU/Linux<br>
                * anyway: we have another 5-node cluster with Oracle
            Linux 7.1 (so a 64bit OS) serving a newer version of the same
            application and the problems are similar, so it appears not
            to be an OS problem but a more specific OCFS2 problem (bug?
            some tuning? other?)<br>

            <br>

            - standard configuration<br>
                * if you want I can show the cluster.conf configuration
            but it is the "expected configuration"<br>

            <br>

            - standard configuration in o2cb:<br>

                Driver for "configfs": Loaded<br>

                Filesystem "configfs": Mounted<br>

                Stack glue driver: Loaded<br>

                Stack plugin "o2cb": Loaded<br>

                Driver for "ocfs2_dlmfs": Loaded<br>

                Filesystem "ocfs2_dlmfs": Mounted<br>

                Checking O2CB cluster "MoodleOCFS2": Online<br>

                  Heartbeat dead threshold: 31<br>

                  Network idle timeout: 30000<br>

                  Network keepalive delay: 2000<br>

                  Network reconnect delay: 2000<br>

                  Heartbeat mode: Local<br>

                Checking O2CB heartbeat: Active<br>

            <br>

            - mount options: _netdev,rw,noatime<br>

                * so other options (commit, data, ...) have their

            default values<br>

            <br>

            <br>

            Any ideas/suggestions?<br>

            <br>

            Regards.<br>

            <br>

          </tt>


          <br>


          <br>

          <pre wrap="">_______________________________________________

Ocfs2-users mailing list

<a moz-do-not-send="true" class="moz-txt-link-abbreviated" href="mailto:Ocfs2-users@oss.oracle.com">Ocfs2-users@oss.oracle.com</a>

<a moz-do-not-send="true" class="moz-txt-link-freetext" href="https://oss.oracle.com/mailman/listinfo/ocfs2-users">https://oss.oracle.com/mailman/listinfo/ocfs2-users</a></pre>

        </blockquote>

        <br>

      </blockquote>

      <br>

    </blockquote>

    <br>

  </body>

</html>