<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

  <head>

    <meta content="text/html; charset=ISO-8859-1"

      http-equiv="Content-Type">

  </head>

  <body bgcolor="#ffffff" text="#000000">

    I believe this is a Pacemaker issue. There was a time when it required a<br>

    qdisk to keep working as a single node in a two-node cluster after<br>

    one node died. If the Pacemaker people don't jump in, you may want to<br>

    try your luck on the linux-cluster mailing list.<br>

    <br>

    On 04/01/2011 11:44 AM, Mike Reid wrote:

    <blockquote cite="mid:C9BB7915.FCF3%25mbreid@thepei.com" type="cite">

      <title>Node Recovery locks I/O in two-node OCFS2 cluster (DRBD

        8.3.8 / Ubuntu 10.10)</title>

      <font face="Calibri, Verdana, Helvetica, Arial"><span

          style="font-size: 11pt;">I am running a two-node web cluster

          on OCFS2 via DRBD Primary/Primary (v8.3.8) and Pacemaker.

          Everything seems to be working great, except during testing

          of hard-boot scenarios.<br>

          <br>

          Whenever I hard-boot one of the nodes, the other node is

          successfully fenced and marked &#8220;Outdated&#8221;:<br>

          <br>

        </span></font>

      <ul>

        <li><span style="font-size: 11pt;"><font face="Courier, Courier

              New">&lt;resource minor="0" cs="WFConnection"

              ro1="Primary" ro2="<b>Unknown</b>" ds1="UpToDate" ds2="<b>Outdated</b>"

              /&gt;<br>

            </font></span></li>

      </ul>

      <span style="font-size: 11pt;"><font face="Courier, Courier New"><br>

        </font><font face="Calibri, Verdana, Helvetica, Arial">However,

          this locks up I/O on the still active node and prevents any

          operations within the cluster :(<br>

          I have even forced DRBD into StandAlone mode while in this

          state, but that does not resolve the I/O lock either.<br>

        </font><font face="Courier, Courier New"><br>

        </font></span>

      <ul>

        <li><span style="font-size: 11pt;"><font face="Courier, Courier

              New">&lt;resource minor="0" cs="<b>StandAlone</b>" ro1="<b>Primary</b>"

              ro2="Unknown" ds1="<b>UpToDate</b>" ds2="Outdated" /&gt;<br>

            </font></span></li>

      </ul>

      <span style="font-size: 11pt;"><font face="Courier, Courier New"><br>

        </font><font face="Calibri, Verdana, Helvetica, Arial">The only

          way I&#8217;ve been able to successfully regain I/O within the

          cluster is to bring back up the other node. While monitoring

          the logs, it seems that it is OCFS2 that&#8217;s establishing the

          lock/unlock and <i>not</i> DRBD at all.<br>

        </font></span>

      <blockquote><span style="font-size: 11pt;"><font face="Calibri,

            Verdana, Helvetica, Arial"><br>

            <br>

            Apr &nbsp;1 12:07:19 ubu10a kernel: [ 1352.739777]

            (ocfs2rec,3643,0):ocfs2_replay_journal:1605 Recovering node

            1124116672 from slot 1 on device (147,0)<br>

            Apr &nbsp;1 12:07:19 ubu10a kernel: [ 1352.900874]

            (ocfs2rec,3643,0):ocfs2_begin_quota_recovery:407 Beginning

            quota recovery in slot 1<br>

            Apr &nbsp;1 12:07:19 ubu10a kernel: [ 1352.902509]

            (ocfs2_wq,1213,0):ocfs2_finish_quota_recovery:598 Finishing

            quota recovery in slot 1<br>

            <br>

            Apr &nbsp;1 12:07:20 ubu10a kernel: [ 1354.423915] block drbd0:

            Handshake successful: Agreed network protocol version 94<br>

            Apr &nbsp;1 12:07:20 ubu10a kernel: [ 1354.433074] block drbd0:

            Peer authenticated using 20 bytes of 'sha1' HMAC<br>

            Apr &nbsp;1 12:07:20 ubu10a kernel: [ 1354.433083] block drbd0:

            conn( WFConnection -&gt; WFReportParams )<br>

            Apr &nbsp;1 12:07:20 ubu10a kernel: [ 1354.433097] block drbd0:

            Starting asender thread (from drbd0_receiver [2145])<br>

            Apr &nbsp;1 12:07:20 ubu10a kernel: [ 1354.433562] block drbd0:

            data-integrity-alg: &lt;not-used&gt;<br>

            Apr &nbsp;1 12:07:20 ubu10a kernel: [ 1354.434090] block drbd0:

            drbd_sync_handshake:<br>

            Apr &nbsp;1 12:07:20 ubu10a kernel: [ 1354.434094] block drbd0:

            self

            FBA98A2F89E05B83:EE17466F4DEC2F8B:6A4CD8FDD0562FA1:EC7831379B78B997

            bits:4 flags:0<br>

            Apr &nbsp;1 12:07:20 ubu10a kernel: [ 1354.434097] block drbd0:

            peer

            EE17466F4DEC2F8A:0000000000000000:6A4CD8FDD0562FA0:EC7831379B78B997

            bits:2048 flags:2<br>

            Apr &nbsp;1 12:07:20 ubu10a kernel: [ 1354.434099] block drbd0:

            uuid_compare()=1 by rule 70<br>

            Apr &nbsp;1 12:07:20 ubu10a kernel: [ 1354.434104] block drbd0:

            peer( Unknown -&gt; Secondary ) conn( WFReportParams -&gt;

            WFBitMapS )<br>

            Apr &nbsp;1 12:07:21 ubu10a kernel: [ 1354.601353] block drbd0:

            conn( WFBitMapS -&gt; SyncSource ) pdsk( Outdated -&gt;

            Inconsistent )<br>

            Apr &nbsp;1 12:07:21 ubu10a kernel: [ 1354.601367] block drbd0:

            Began resync as SyncSource (will sync 8192 KB [2048 bits

            set]).<br>

            Apr &nbsp;1 12:07:21 ubu10a kernel: [ 1355.401912] block drbd0:

            Resync done (total 1 sec; paused 0 sec; 8192 K/sec)<br>

            Apr &nbsp;1 12:07:21 ubu10a kernel: [ 1355.401923] block drbd0:

            conn( SyncSource -&gt; Connected ) pdsk( Inconsistent -&gt;

            UpToDate )<br>

            Apr &nbsp;1 12:07:22 ubu10a kernel: [ 1355.612601] block drbd0:

            peer( Secondary -&gt; Primary )<br>

            <br>

            <br>

          </font></span></blockquote>

      <span style="font-size: 11pt;"><font face="Calibri, Verdana,

          Helvetica, Arial">Therefore, my question is whether there is an

          option in OCFS2 to remove or prevent this lock, especially

          since it&#8217;s inside a DRBD configuration? I&#8217;m still new to

          OCFS2, so I am definitely open to any criticism regarding my

          setup/approach, or any recommendations related to keeping the

          cluster active when another node is shut down during testing.</font></span>

      <pre wrap="">


_______________________________________________

Ocfs2-users mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Ocfs2-users@oss.oracle.com">Ocfs2-users@oss.oracle.com</a>

<a class="moz-txt-link-freetext" href="http://oss.oracle.com/mailman/listinfo/ocfs2-users">http://oss.oracle.com/mailman/listinfo/ocfs2-users</a></pre>

    </blockquote>
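    <br>
    If it helps while testing, the two DRBD status elements quoted above can<br>
    be checked from a script. This is only a rough sketch in Python: the<br>
    summarize() helper and the "local_usable" rule are my own assumptions,<br>
    not part of DRBD or OCFS2. In both quoted states the local side is still<br>
    Primary/UpToDate, which is at least consistent with the stall coming<br>
    from a layer above DRBD.<br>

```python
import xml.etree.ElementTree as ET

# Status elements copied from the quoted message.
SAMPLES = [
    '<resource minor="0" cs="WFConnection" ro1="Primary" ro2="Unknown" '
    'ds1="UpToDate" ds2="Outdated" />',
    '<resource minor="0" cs="StandAlone" ro1="Primary" ro2="Unknown" '
    'ds1="UpToDate" ds2="Outdated" />',
]


def summarize(line):
    """Return a small dict describing one <resource .../> status element.

    Hypothetical helper for illustration; it only reads the attributes
    that appear in the quoted output (minor, cs, ro1, ro2, ds1, ds2).
    """
    attrs = ET.fromstring(line).attrib
    return {
        "minor": int(attrs["minor"]),
        "connection": attrs["cs"],
        "connected": attrs["cs"] == "Connected",
        "local": (attrs["ro1"], attrs["ds1"]),
        "peer": (attrs["ro2"], attrs["ds2"]),
        # Assumed rule for this sketch: the local device looks usable
        # when this node is Primary and its own disk is UpToDate.
        "local_usable": attrs["ro1"] == "Primary"
        and attrs["ds1"] == "UpToDate",
    }


if __name__ == "__main__":
    for sample in SAMPLES:
        print(summarize(sample))
```

    If "local_usable" is true while I/O on the mount still hangs, that<br>
    points at the cluster stack or the filesystem as the next place to look.<br>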

    <br>

  </body>

</html>