<html>
  <head>
    <meta content="text/html; charset=windows-1252"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Hi Guy,<br>
    <br>
    <div class="moz-cite-prefix">On 01/21/2016 09:46 AM, Guy 2212112
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAPtxBQgV5JDB5EZwj_2now3EjHAhX6joP7DPUGa=XoZpChaP-g@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div>
                <div>
                  <div>
                    <div>
                      <div>
                        <div>
                          <div>
                            <div>
                              <div>
                                <div>Hi,<br>
                                </div>
                                First, I'm well aware that OCFS2 is not
                                a distributed file system but a shared,
                                clustered file system. This is the main
                                reason I want to use it: to access
                                the same filesystem from multiple nodes.<br>
                              </div>
                              I've checked the latest kernel 4.4 release,
                              which includes the "errors=continue" option,
                              and also installed (manually) the patch
                              described in this thread - "[PATCH V2]
                              ocfs2: call ocfs2_abort when journal
                              abort".<br>
                              <br>
                            </div>
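                            (For reference, the option is used roughly as below; the device path and mount point are placeholders, not the actual ones from my setup:)<br>
                            <pre>
# Rough sketch: mount an ocfs2 volume with errors=continue (kernel 4.4+),
# so an I/O/journal error on this one volume returns EIO to callers instead
# of panicking the node or forcing a read-only remount.
# /dev/sdX and the mount point are placeholders.
mount -t ocfs2 -o errors=continue /dev/sdX /mnt/ocfs2-mountpoint
</pre>
                            <br>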
                            Unfortunately, the issues I've described
                            were not solved.<br>
                            <br>
                          </div>
                          Also, I understand that OCFS2 relies on
                          SAN availability and does not replicate the
                          data to other locations (as a distributed
                          file system would), so I don't expect to be able to
                          access the data when a disk/volume is not
                          accessible (for example, because of a hardware
                          failure).<br>
                          <br>
                        </div>
                        In other filesystems, clustered or even local,
                        when a disk/volume fails, this and only this
                        disk/volume becomes inaccessible; all the
                        other filesystems continue to function and can be
                        accessed, and overall system stability is
                        definitely not compromised.<br>
                      </div>
                      <br>
                      Of course, I can understand that if this specific
                      disk/volume contains the operating system, its failure
                      will probably cause a panic/reboot, or if the
                      disk/volume is used by the cluster for heartbeat,
                      it may affect the whole cluster - if it's the
                      only way the nodes in the cluster use to
                      communicate with each other.<br>
                    </div>
                    <br>
                  </div>
                  The configuration I use relies on global heartbeat on
                  three different dedicated disks, and the "simulated
                  error" is on an additional, fourth disk that doesn't
                  carry a heartbeat.<br>
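                  <br>
                  (For illustration, with the o2cb tools a three-disk global heartbeat layout is set up roughly along these lines; the cluster name and device paths below are placeholders:)<br>
                  <pre>
# Sketch of a global-heartbeat layout with three dedicated heartbeat devices
# (o2cb stack); "mycluster" and the /dev/sd* names are placeholders.
o2cb add-heartbeat mycluster /dev/sdb
o2cb add-heartbeat mycluster /dev/sdc
o2cb add-heartbeat mycluster /dev/sdd
o2cb heartbeat-mode mycluster global
# The fourth disk (the one whose failure is simulated) is not registered
# as a heartbeat device.
</pre>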
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </blockquote>
    By design, this should have worked fine; even if one
    or more heartbeat disks fail, the system should survive as long as
    more than n/2 heartbeat disks are good (where n stands for the number of
    global heartbeat disks, which is &lt;= the number of fs disks).<br>
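    As a rough illustration of that majority rule (a toy sketch, not ocfs2 code):<br>
    <pre>
# Toy illustration: with n global heartbeat disks, a node should stay up
# while the number of heartbeat disks it can still reach is greater than n/2.
n=3          # global heartbeat disks configured
healthy=2    # heartbeat disks still reachable after one disk failure
if [ $((2 * healthy)) -gt "$n" ]; then
    echo "heartbeat majority held - node should not fence itself"
else
    echo "heartbeat majority lost - node fences itself"
fi
</pre>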
    <br>
    So, this looks like a bug and needs to be looked into. I have logged a bz
    to track it:<br>
    <br>
    <a class="moz-txt-link-freetext" href="https://oss.oracle.com/bugzilla/show_bug.cgi?id=1362">https://oss.oracle.com/bugzilla/show_bug.cgi?id=1362</a><br>
    <br>
    (I modified your description, as I was running into some trouble with the
    bz application.)<br>
    <br>
    <blockquote
cite="mid:CAPtxBQgV5JDB5EZwj_2now3EjHAhX6joP7DPUGa=XoZpChaP-g@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>
          <div>
            <div>
              <div><br>
                Errors may occur on storage arrays, and if I'm connecting
                my OCFS2 cluster to 4 storage arrays with 10
                disks/volumes each, I don't expect the whole OCFS2
                cluster to fail when only one array is down. I still
                expect the other 30 disks from the 3
                remaining arrays to continue working.<br>
              </div>
              Of course, I will not have any access to the failed array
              disks.<br>
              <br>
            </div>
            I hope this describes the situation better.<br>
            <br>
          </div>
          Thanks,<br>
          <br>
        </div>
        Guy <br>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On Wed, Jan 20, 2016 at 10:51 AM,
          Junxiao Bi <span dir="ltr">&lt;<a moz-do-not-send="true"
              href="mailto:junxiao.bi@oracle.com" target="_blank">junxiao.bi@oracle.com</a>&gt;</span>
          wrote:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Guy,<br>
            <br>
            ocfs2 is a shared-disk fs; there is no way to do replication like a
            dfs, and no volume manager is integrated into ocfs2. Ocfs2 depends
            on the underlying storage stack to handle disk failure, so you can
            configure multipath, RAID, or the storage itself to handle the
            removed-disk case. If the I/O error is still reported to ocfs2,
            then there is no way to work around it; ocfs2 will be set
            read-only or even panic to avoid fs corruption. This is the same
            behavior as a local fs.<br>
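            <br>
            (For example, a dm-multipath setting along these lines - an assumption on my side, not something ocfs2 itself requires - keeps I/O queued while all paths to a LUN are down, so the error is not pushed up to the fs:)<br>
            <pre>
# Sketch only: queue I/O instead of failing it when every path to a LUN is
# lost, so transient path failures are not reported to the filesystem.
cat &gt;&gt; /etc/multipath.conf &lt;&lt;'EOF'
defaults {
    no_path_retry queue
}
EOF
# Reload the multipathd service afterwards so the new setting takes effect.
</pre>
            <br>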
            If the I/O error is not reported to ocfs2, then there is a fix I
            just posted to ocfs2-devel to avoid the node panic; please try the
            patch series [ocfs2: o2hb: not fence self if storage down]. Note
            this is only useful for the o2cb stack. Nodes will hang on I/O and
            wait for the storage to come online again.<br>
            <br>
            For the endless loop you met in "Appendix A1", it is a bug
            and fixed by<br>
            "[PATCH V2] ocfs2: call ocfs2_abort when journal abort", you
            can get it<br>
            from ocfs2-devel. This patch will set fs readonly or panic
            node since io<br>
            error have been reported to ocfs2.<br>
            <br>
            Thanks,<br>
            Junxiao.<br>
            <br>
            On 01/20/2016 03:19 AM, Guy 1234 wrote:<br>
            &gt; Dear OCFS2 guys,<br>
            &gt;<br>
            &gt;<br>
            &gt;<br>
            &gt; My name is Guy, and I'm testing ocfs2 due to its
            features as a clustered<br>
            &gt; filesystem that I need.<br>
            &gt;<br>
            &gt; As part of the stability and reliability tests I’ve performed, I've encountered an issue with ocfs2 (format + mount + remove disk...), and I wanted to make sure it is a real issue and not just a misconfiguration.<br>
            &gt;<br>
            &gt;<br>
            &gt;<br>
            &gt; The main concern is that the stability of the whole system is compromised when a single disk/volume fails. It looks like OCFS2 is not handling the error correctly but gets stuck in an endless loop that interferes with the work of the server.<br>
            &gt;<br>
            &gt;<br>
            &gt;<br>
            &gt; I’ve tested two cluster configurations – (1) Corosync/Pacemaker and (2) o2cb – which react similarly.<br>
            &gt;<br>
            &gt; The process and log entries follow:<br>
            &gt;<br>
            &gt;<br>
            &gt; Additional configurations that were tested are also listed below.<br>
            &gt;<br>
            &gt;<br>
            &gt; Node 1:<br>
            &gt;<br>
            &gt; =======<br>
            &gt;<br>
            &gt; 1. service corosync start<br>
            &gt;<br>
            &gt; 2. service dlm start<br>
            &gt;<br>
            &gt; 3. mkfs.ocfs2 -v -Jblock64 -b 4096 --fs-feature-level=max-features --cluster-stack=pcmk --cluster-name=cluster-name -N 2 /dev/&lt;path to device&gt;<br>
            &gt;<br>
            &gt; 4. mount -o<br>
            &gt;
            rw,noatime,nodiratime,data=writeback,heartbeat=none,cluster_stack=pcmk<br>
            &gt; /dev/&lt;path to device&gt; /mnt/ocfs2-mountpoint<br>
            &gt;<br>
            &gt;<br>
            &gt;<br>
            &gt; Node 2:<br>
            &gt;<br>
            &gt; =======<br>
            &gt;<br>
            &gt; 5. service corosync start<br>
            &gt;<br>
            &gt; 6. service dlm start<br>
            &gt;<br>
            &gt; 7. mount -o<br>
            &gt;
            rw,noatime,nodiratime,data=writeback,heartbeat=none,cluster_stack=pcmk<br>
            &gt; /dev/&lt;path to device&gt; /mnt/ocfs2-mountpoint<br>
            &gt;<br>
            &gt;<br>
            &gt;<br>
            &gt; So far all is working well, including reading and
            writing.<br>
            &gt;<br>
            &gt; Next<br>
            &gt;<br>
            &gt; 8. I’ve physically pulled out the disk at /dev/&lt;path to device&gt; to simulate a hardware failure (which may occur…); in real life the disk is (hardware- or software-) protected. Nonetheless, I’m testing a hardware failure in which one of the OCFS2 file systems in my server fails.<br>
            &gt;<br>
            &gt; Following this, messages were observed in the system log (see below), and<br>
            &gt;<br>
            &gt; ==&gt;  9. a kernel panic(!) ... on one of the nodes or on both, or a reboot of one of the nodes or both.<br>
            &gt;<br>
            &gt;<br>
            &gt; Is there any configuration or set of parameters that will enable the system to continue working, disabling access to the failed disk without compromising system stability and without causing the kernel to panic?!<br>
            &gt;<br>
            &gt;<br>
            &gt;<br>
            &gt; From my point of view it looks basic – when a hardware failure occurs:<br>
            &gt;<br>
            &gt; 1. All remaining hardware should continue working<br>
            &gt;<br>
            &gt; 2. The failed disk/volume should be inaccessible – but
            not compromise<br>
            &gt; the whole system availability (Kernel panic).<br>
            &gt;<br>
            &gt; 3. OCFS2 “understands” there’s a failed disk and stops trying to access it.<br>
            &gt;<br>
            &gt; 4. All disk commands such as mount/umount, df etc. should continue working.<br>
            &gt;<br>
            &gt; 5. When a new/replacement drive is connected to the system, it can be accessed.<br>
            &gt;<br>
            &gt; My settings:<br>
            &gt;<br>
            &gt; ubuntu 14.04<br>
            &gt;<br>
            &gt; linux:  3.16.0-46-generic<br>
            &gt;<br>
            &gt; mkfs.ocfs2 1.8.4 (downloaded from git)<br>
            &gt;<br>
            &gt;<br>
            &gt;<br>
            &gt;<br>
            &gt;<br>
            &gt; Some other scenarios which also were tested:<br>
            &gt;<br>
            &gt; 1. Remove the max-features in the mkfs (i.e. mkfs.ocfs2
            -v -Jblock64 -b<br>
            &gt; 4096 --cluster-stack=pcmk --cluster-name=cluster-name
            -N 2 /dev/&lt;path to<br>
            &gt; device&gt;)<br>
            &gt;<br>
            &gt; This improved some of the cases, with no kernel panic, but the stability of the system was still compromised; the syslog indicates that something unrecoverable is going on (see below - Appendix A1). Furthermore, the system hangs when trying to do a software reboot.<br>
            &gt;<br>
            &gt; 2. Also tried with the o2cb stack, with similar
            outcomes.<br>
            &gt;<br>
            &gt; 3. The configuration was also tested with (1,2 and 3)
            Local and Global<br>
            &gt; heartbeat(s) that were NOT on the simulated failed
            disk, but on other<br>
            &gt; physical disks.<br>
            &gt;<br>
            &gt; 4. Also tested:<br>
            &gt;<br>
            &gt; Ubuntu 15.10<br>
            &gt;<br>
            &gt; Kernel: 4.2.0-23-generic<br>
            &gt;<br>
            &gt; mkfs.ocfs2 1.8.4 (git clone git://<a
              moz-do-not-send="true"
              href="http://oss.oracle.com/git/ocfs2-tools.git"
              rel="noreferrer" target="_blank">oss.oracle.com/git/ocfs2-tools.git</a><br>
            &gt; &lt;<a moz-do-not-send="true"
              href="http://oss.oracle.com/git/ocfs2-tools.git"
              rel="noreferrer" target="_blank">http://oss.oracle.com/git/ocfs2-tools.git</a>&gt;)<br>
            &gt;<br>
            &gt;<br>
            &gt;<br>
            &gt;<br>
            &gt;<br>
            &gt; ==============<br>
            &gt;<br>
            &gt; Appendix A1:<br>
            &gt;<br>
            &gt; ==============<br>
            &gt;<br>
            &gt; from syslog:<br>
            &gt;<br>
            &gt; [ 1676.608123]
            (ocfs2cmt,5316,14):ocfs2_commit_thread:2195 ERROR: status<br>
            &gt; = -5, journal is already aborted.<br>
            &gt;<br>
            &gt; [ 1677.611827]
            (ocfs2cmt,5316,14):ocfs2_commit_cache:324 ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1678.616634]
            (ocfs2cmt,5316,15):ocfs2_commit_cache:324 ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1679.621419]
            (ocfs2cmt,5316,15):ocfs2_commit_cache:324 ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1680.626175]
            (ocfs2cmt,5316,15):ocfs2_commit_cache:324 ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1681.630981] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
            ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1682.107356] INFO: task kworker/u64:0:6 blocked for
            more than 120 seconds.<br>
            &gt;<br>
            &gt; [ 1682.108440]       Not tainted 3.16.0-46-generic
            #62~14.04.1<br>
            &gt;<br>
            &gt; [ 1682.109388] "echo 0 &gt;
            /proc/sys/kernel/hung_task_timeout_secs"<br>
            &gt; disables this message.<br>
            &gt;<br>
            &gt; [ 1682.110381] kworker/u64:0   D ffff88103fcb30c0   
             0     6      2<br>
            &gt; 0x00000000<br>
            &gt;<br>
            &gt; [ 1682.110401] Workqueue: fw_event0
            _firmware_event_work [mpt3sas]<br>
            &gt;<br>
            &gt; [ 1682.110405]  ffff88102910b8a0 0000000000000046
            ffff88102977b2f0<br>
            &gt; 00000000000130c0<br>
            &gt;<br>
            &gt; [ 1682.110411]  ffff88102910bfd8 00000000000130c0
            ffff88102928c750<br>
            &gt; ffff88201db284b0<br>
            &gt;<br>
            &gt; [ 1682.110415]  ffff88201db28000 ffff881028cef000
            ffff88201db28138<br>
            &gt; ffff88201db28268<br>
            &gt;<br>
            &gt; [ 1682.110419] Call Trace:<br>
            &gt;<br>
            &gt; [ 1682.110427]  [&lt;ffffffff8176a8b9&gt;]
            schedule+0x29/0x70<br>
            &gt;<br>
            &gt; [ 1682.110458]  [&lt;ffffffffc08d6c11&gt;]
            ocfs2_clear_inode+0x3b1/0xa30 [ocfs2]<br>
            &gt;<br>
            &gt; [ 1682.110464]  [&lt;ffffffff810b4de0&gt;] ?
            prepare_to_wait_event+0x100/0x100<br>
            &gt;<br>
            &gt; [ 1682.110487]  [&lt;ffffffffc08d8c7e&gt;]
            ocfs2_evict_inode+0x6e/0x730 [ocfs2]<br>
            &gt;<br>
            &gt; [ 1682.110493]  [&lt;ffffffff811eee04&gt;]
            evict+0xb4/0x180<br>
            &gt;<br>
            &gt; [ 1682.110498]  [&lt;ffffffff811eef09&gt;]
            dispose_list+0x39/0x50<br>
            &gt;<br>
            &gt; [ 1682.110501]  [&lt;ffffffff811efdb4&gt;]
            invalidate_inodes+0x134/0x150<br>
            &gt;<br>
            &gt; [ 1682.110506]  [&lt;ffffffff8120a64a&gt;]
            __invalidate_device+0x3a/0x60<br>
            &gt;<br>
            &gt; [ 1682.110510]  [&lt;ffffffff81367e81&gt;]
            invalidate_partition+0x31/0x50<br>
            &gt;<br>
            &gt; [ 1682.110513]  [&lt;ffffffff81368f45&gt;]
            del_gendisk+0xf5/0x290<br>
            &gt;<br>
            &gt; [ 1682.110519]  [&lt;ffffffff815177a1&gt;]
            sd_remove+0x61/0xc0<br>
            &gt;<br>
            &gt; [ 1682.110524]  [&lt;ffffffff814baf7f&gt;]
            __device_release_driver+0x7f/0xf0<br>
            &gt;<br>
            &gt; [ 1682.110529]  [&lt;ffffffff814bb013&gt;]
            device_release_driver+0x23/0x30<br>
            &gt;<br>
            &gt; [ 1682.110534]  [&lt;ffffffff814ba918&gt;]
            bus_remove_device+0x108/0x180<br>
            &gt;<br>
            &gt; [ 1682.110538]  [&lt;ffffffff814b7169&gt;]
            device_del+0x129/0x1c0<br>
            &gt;<br>
            &gt; [ 1682.110543]  [&lt;ffffffff815123a5&gt;]
            __scsi_remove_device+0xd5/0xe0<br>
            &gt;<br>
            &gt; [ 1682.110547]  [&lt;ffffffff815123d6&gt;]
            scsi_remove_device+0x26/0x40<br>
            &gt;<br>
            &gt; [ 1682.110551]  [&lt;ffffffff81512590&gt;]
            scsi_remove_target+0x170/0x230<br>
            &gt;<br>
            &gt; [ 1682.110561]  [&lt;ffffffffc03551e5&gt;]
            sas_rphy_remove+0x65/0x80<br>
            &gt; [scsi_transport_sas]<br>
            &gt;<br>
            &gt; [ 1682.110570]  [&lt;ffffffffc035707d&gt;]
            sas_port_delete+0x2d/0x170<br>
            &gt; [scsi_transport_sas]<br>
            &gt;<br>
            &gt; [ 1682.110575]  [&lt;ffffffff8124a6f9&gt;] ?
            sysfs_remove_link+0x19/0x30<br>
            &gt;<br>
            &gt; [ 1682.110588]  [&lt;ffffffffc03f1599&gt;]<br>
            &gt; mpt3sas_transport_port_remove+0x1c9/0x1e0 [mpt3sas]<br>
            &gt;<br>
            &gt; [ 1682.110598]  [&lt;ffffffffc03e60b5&gt;]
            _scsih_remove_device+0x55/0x80<br>
            &gt; [mpt3sas]<br>
            &gt;<br>
            &gt; [ 1682.110610]  [&lt;ffffffffc03e6159&gt;]<br>
            &gt; _scsih_device_remove_by_handle.part.21+0x79/0xa0
            [mpt3sas]<br>
            &gt;<br>
            &gt; [ 1682.110619]  [&lt;ffffffffc03eca97&gt;]
            _firmware_event_work+0x1337/0x1690<br>
            &gt; [mpt3sas]<br>
            &gt;<br>
            &gt; [ 1682.110626]  [&lt;ffffffff8101c315&gt;] ?
            native_sched_clock+0x35/0x90<br>
            &gt;<br>
            &gt; [ 1682.110630]  [&lt;ffffffff8101c379&gt;] ?
            sched_clock+0x9/0x10<br>
            &gt;<br>
            &gt; [ 1682.110636]  [&lt;ffffffff81011574&gt;] ?
            __switch_to+0xe4/0x580<br>
            &gt;<br>
            &gt; [ 1682.110640]  [&lt;ffffffff81087bc9&gt;] ?
            pwq_activate_delayed_work+0x39/0x80<br>
            &gt;<br>
            &gt; [ 1682.110644]  [&lt;ffffffff8108a302&gt;]
            process_one_work+0x182/0x450<br>
            &gt;<br>
            &gt; [ 1682.110648]  [&lt;ffffffff8108aa71&gt;]
            worker_thread+0x121/0x570<br>
            &gt;<br>
            &gt; [ 1682.110652]  [&lt;ffffffff8108a950&gt;] ?
            rescuer_thread+0x380/0x380<br>
            &gt;<br>
            &gt; [ 1682.110657]  [&lt;ffffffff81091309&gt;]
            kthread+0xc9/0xe0<br>
            &gt;<br>
            &gt; [ 1682.110662]  [&lt;ffffffff81091240&gt;] ?
            kthread_create_on_node+0x1c0/0x1c0<br>
            &gt;<br>
            &gt; [ 1682.110667]  [&lt;ffffffff8176e818&gt;]
            ret_from_fork+0x58/0x90<br>
            &gt;<br>
            &gt; [ 1682.110672]  [&lt;ffffffff81091240&gt;] ?
            kthread_create_on_node+0x1c0/0x1c0<br>
            &gt;<br>
            &gt; [ 1682.635761] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
            ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1683.640549] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
            ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1684.645336] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
            ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1685.650114] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
            ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1686.654911] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
            ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1687.659684] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
            ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1688.664466] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
            ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1689.669252] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
            ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1690.674026] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
            ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1691.678810] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
            ERROR: status = -5<br>
            &gt;<br>
            &gt; [ 1691.679920]
            (ocfs2cmt,5316,9):ocfs2_commit_thread:2195 ERROR: status<br>
            &gt; = -5, journal is already aborted.<br>
            &gt;<br>
            &gt;<br>
            &gt;<br>
            &gt; Thanks in advance,<br>
            &gt;<br>
            &gt; Guy<br>
            &gt;<br>
            &gt;<br>
            &gt;<br>
            &gt; _______________________________________________<br>
            &gt; Ocfs2-devel mailing list<br>
            &gt; <a moz-do-not-send="true"
              href="mailto:Ocfs2-devel@oss.oracle.com">Ocfs2-devel@oss.oracle.com</a><br>
            &gt; <a moz-do-not-send="true"
              href="https://oss.oracle.com/mailman/listinfo/ocfs2-devel"
              rel="noreferrer" target="_blank">https://oss.oracle.com/mailman/listinfo/ocfs2-devel</a><br>
            &gt;<br>
            <br>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
Ocfs2-devel mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Ocfs2-devel@oss.oracle.com">Ocfs2-devel@oss.oracle.com</a>
<a class="moz-txt-link-freetext" href="https://oss.oracle.com/mailman/listinfo/ocfs2-devel">https://oss.oracle.com/mailman/listinfo/ocfs2-devel</a></pre>
    </blockquote>
    <br>
  </body>
</html>