<html>
<head>
<meta content="text/html; charset=windows-1252"
http-equiv="Content-Type">
</head>
<body text="#000000" bgcolor="#FFFFFF">
Hi Guy,<br>
<br>
<div class="moz-cite-prefix">On 01/21/2016 09:46 AM, Guy 2212112
wrote:<br>
</div>
<blockquote
cite="mid:CAPtxBQgV5JDB5EZwj_2now3EjHAhX6joP7DPUGa=XoZpChaP-g@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>
<div>Hi,<br>
</div>
First, I'm well aware that OCFS2 is not
a distributed file system, but a shared,
clustered file system. This is the main
reason I want to use it - accessing
the same filesystem from multiple nodes.<br>
</div>
I've checked the latest kernel 4.4 release,
which includes the "errors=continue" option,
and also installed (manually) the patch
described in this thread - "[PATCH V2]
ocfs2: call ocfs2_abort when journal
abort".<br>
<br>
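For reference, a minimal sketch of how a mount with that option could
look (the device path and mount point below are placeholders, not the
exact ones from my setup):<br>
<pre>mount -o errors=continue /dev/sdX /mnt/ocfs2-mountpoint</pre>
<br>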
</div>
Unfortunately, the issues I've described
were not solved.<br>
<br>
</div>
Also, I understand that OCFS2 relies on the
availability of the SAN and does not replicate
the data to other locations (like a distributed
file system does), so I don't expect to be able to
access the data when a disk/volume is not
accessible (for example because of a hardware
failure).<br>
<br>
</div>
In other filesystems, clustered or even local,
when a disk/volume fails, only that disk/volume
becomes inaccessible - all the other filesystems
continue to function and can be accessed, and the
stability of the whole system is definitely not
compromised.<br>
</div>
<br>
Of course, I can understand that if this specific
disk/volume contains the operating system it will
probably cause a panic/reboot, or if the
disk/volume is used by the cluster for heartbeat,
it may affect the whole cluster - if it's the
only way the nodes in the cluster have to
communicate with each other.<br>
</div>
<br>
</div>
The configuration I use relies on global heartbeat on
three different dedicated disks, and the "simulated
error" is on an additional, fourth disk that doesn't
carry a heartbeat.<br>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
By design, this should have worked fine, and by design, even if one
or more hb disks fail, the systems should have survived as long as
more than n/2 hb disks are good (where n stands for the number of
global hb disks, which is <= the number of fs disks).<br>
<br>
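For example, with three global hb disks a node should survive the loss
of any single one of them: two good disks out of three still satisfy
the more-than-n/2 rule, while losing two of them would fence the node.
A rough sketch of how such a setup can be declared with the o2cb tool
from ocfs2-tools 1.8 (cluster, node and device names below are just
placeholders, and each heartbeat device is an ocfs2 volume formatted
beforehand):<br>
<pre>o2cb add-cluster mycluster
o2cb add-node --ip 192.168.1.1 mycluster node1
o2cb add-node --ip 192.168.1.2 mycluster node2
# three dedicated heartbeat devices, none of them the disk under test
o2cb add-heartbeat mycluster /dev/sdb
o2cb add-heartbeat mycluster /dev/sdc
o2cb add-heartbeat mycluster /dev/sdd
o2cb heartbeat-mode mycluster global</pre>
<br>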
So this looks like a bug and needs to be looked into. I have logged a
bz to track it:<br>
<br>
<a class="moz-txt-link-freetext" href="https://oss.oracle.com/bugzilla/show_bug.cgi?id=1362">https://oss.oracle.com/bugzilla/show_bug.cgi?id=1362</a><br>
<br>
(I modified your description as I was running into some trouble with
the bz application.)<br>
<br>
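As a side note, for reproducing this it should not be necessary to
physically pull the drive each time; the failure can usually be
simulated from software as well. A minimal sketch, assuming the disk
under test shows up as /dev/sdX:<br>
<pre># take the SCSI device offline; further I/O to it fails immediately
echo offline > /sys/block/sdX/device/state
# or remove the device from the system entirely, like a hot unplug
echo 1 > /sys/block/sdX/device/delete</pre>
<br>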
<blockquote
cite="mid:CAPtxBQgV5JDB5EZwj_2now3EjHAhX6joP7DPUGa=XoZpChaP-g@mail.gmail.com"
type="cite">
<div dir="ltr">
<div>
<div>
<div>
<div><br>
Errors may occur on storage arrays, and if I connect my
OCFS2 cluster to 4 storage arrays with 10 disks/volumes
each, I don't expect the whole OCFS2 cluster to fail
when only one array is down. I still expect the other 30
disks from the 3 remaining arrays to continue working.<br>
</div>
Of course, I will not have any access to the disks of the
failed array.<br>
<br>
</div>
I hope this describes the situation better.<br>
<br>
</div>
Thanks,<br>
<br>
</div>
Guy <br>
</div>
<div class="gmail_extra"><br>
<div class="gmail_quote">On Wed, Jan 20, 2016 at 10:51 AM,
Junxiao Bi <span dir="ltr"><<a moz-do-not-send="true"
href="mailto:junxiao.bi@oracle.com" target="_blank">junxiao.bi@oracle.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Guy,<br>
<br>
ocfs2 is a shared-disk fs; there is no way to do replication
like in a dfs,<br>
and no volume manager is integrated in ocfs2. Ocfs2 depends on
the underlying<br>
storage stack to handle disk failures, so you can configure
multipath,<br>
raid or the storage itself to handle the removed-disk case. If
the io error is still<br>
reported to ocfs2, then there is no way to work around it; ocfs2
will be set<br>
read-only or even panic to avoid fs corruption. This is the
same<br>
behavior as with a local fs.<br>
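As an illustration only, a minimal multipath.conf fragment that queues
io while all paths are down instead of returning errors (so the error
is never propagated up to ocfs2) could look like this; whether
queueing or failing is the right choice depends on your setup:<br>
<pre># /etc/multipath.conf (fragment)
defaults {
    # queue I/O while every path is lost instead of failing it,
    # so ocfs2 does not see the io error while the storage is down
    no_path_retry queue
}</pre>
<br>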
If the io error is not reported to ocfs2, then there is a fix I
just posted to<br>
ocfs2-devel to avoid the node panic; please try the patch series
[ocfs2:<br>
o2hb: not fence self if storage down]. Note this is only
useful for the o2cb<br>
stack. Nodes will hang on io and wait for the storage to come
online again.<br>
<br>
For the endless loop you met in "Appendix A1", it is a bug
and is fixed by<br>
"[PATCH V2] ocfs2: call ocfs2_abort when journal abort"; you
can get it<br>
from ocfs2-devel. This patch will set the fs readonly or panic
the node, since the io<br>
error has been reported to ocfs2.<br>
<br>
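If it helps, one rough way to pick up and apply those patches from the
list is (the file names below are placeholders for the mails saved
from ocfs2-devel):<br>
<pre># inside the kernel source tree you build from
git am ocfs2-o2hb-not-fence-self-if-storage-down.mbox
git am ocfs2-call-ocfs2_abort-when-journal-abort.mbox</pre>
<br>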
Thanks,<br>
Junxiao.<br>
<br>
On 01/20/2016 03:19 AM, Guy 1234 wrote:<br>
> Dear OCFS2 guys,<br>
><br>
><br>
><br>
> My name is Guy, and I'm testing ocfs2 because I need its features as a<br>
> clustered filesystem.<br>
><br>
> As part of the stability and reliability tests I’ve performed, I've<br>
> encountered an issue with ocfs2 (format + mount + remove disk...) that<br>
> I want to make sure is a real issue and not just a misconfiguration.<br>
><br>
><br>
><br>
> The main concern is that the stability of the whole system is<br>
> compromised when a single disk/volume fails. It looks like OCFS2 is<br>
> not handling the error correctly but gets stuck in an endless loop that<br>
> interferes with the work of the server.<br>
><br>
><br>
><br>
> I’ve tested two cluster configurations – (1) Corosync/Pacemaker and<br>
> (2) o2cb – which react similarly.<br>
><br>
> The process and the log entries follow:<br>
><br>
><br>
> Additional configurations that were tested are also listed below.<br>
><br>
><br>
> Node 1:<br>
><br>
> =======<br>
><br>
> 1. service corosync start<br>
><br>
> 2. service dlm start<br>
><br>
> 3. mkfs.ocfs2 -v -Jblock64 -b 4096 --fs-feature-level=max-features<br>
> --cluster-stack=pcmk --cluster-name=cluster-name -N 2 /dev/<path to device><br>
><br>
> 4. mount -o<br>
>
rw,noatime,nodiratime,data=writeback,heartbeat=none,cluster_stack=pcmk<br>
> /dev/<path to device> /mnt/ocfs2-mountpoint<br>
><br>
><br>
><br>
> Node 2:<br>
><br>
> =======<br>
><br>
> 5. service corosync start<br>
><br>
> 6. service dlm start<br>
><br>
> 7. mount -o<br>
>
rw,noatime,nodiratime,data=writeback,heartbeat=none,cluster_stack=pcmk<br>
> /dev/<path to device> /mnt/ocfs2-mountpoint<br>
><br>
><br>
><br>
> So far all is working well, including reading and
writing.<br>
><br>
> Next<br>
><br>
> 8. I’ve physically pulled out the disk at /dev/<path to device> to<br>
> simulate a hardware failure (which may occur…); in real life the disk is<br>
> protected (in hardware or software). Nonetheless, I’m testing a hardware<br>
> failure in which one of the OCFS2 file systems in my server fails.<br>
><br>
> Following that - messages observed in the system log (see below), and<br>
><br>
> ==> 9. a kernel panic(!) ... on one of the nodes or on both, or a reboot on<br>
> one of the nodes or both.<br>
><br>
><br>
> Is there any configuration or set of parameters that will enable the<br>
> system to continue working, disabling access to the failed disk<br>
> without compromising system stability and without causing the kernel to<br>
> panic?!<br>
><br>
><br>
><br>
> From my point of view this looks basic – when a hardware failure occurs:<br>
><br>
> 1. All remaining hardware should continue working.<br>
><br>
> 2. The failed disk/volume should be inaccessible – but should not compromise<br>
> the availability of the whole system (kernel panic).<br>
><br>
> 3. OCFS2 “understands” that there is a failed disk and stops trying to access it.<br>
><br>
> 4. All disk commands such as mount/umount, df etc. should continue working.<br>
><br>
> 5. When a new/replacement drive is connected to the system, it can be<br>
> accessed.<br>
><br>
> My settings:<br>
><br>
> ubuntu 14.04<br>
><br>
> linux: 3.16.0-46-generic<br>
><br>
> mkfs.ocfs2 1.8.4 (downloaded from git)<br>
><br>
><br>
><br>
><br>
><br>
> Some other scenarios that were also tested:<br>
><br>
> 1. Removing max-features from the mkfs (i.e. mkfs.ocfs2 -v -Jblock64 -b<br>
> 4096 --cluster-stack=pcmk --cluster-name=cluster-name -N 2 /dev/<path to<br>
> device>)<br>
><br>
> This helped in some of the cases, with no kernel panic, but the<br>
> stability of the system was still compromised; the syslog indicates that<br>
> something unrecoverable is going on (see below - Appendix A1).<br>
> Furthermore, the system hangs when trying a software reboot.<br>
><br>
> 2. The same was also tried with the o2cb stack, with similar outcomes.<br>
><br>
> 3. The configuration was also tested with (1, 2 and 3) local and global<br>
> heartbeat(s) that were NOT on the simulated failed disk, but on other<br>
> physical disks.<br>
><br>
> 4. Also tested:<br>
><br>
> Ubuntu 15.15<br>
><br>
> Kernel: 4.2.0-23-generic<br>
><br>
> mkfs.ocfs2 1.8.4 (git clone git://<a
moz-do-not-send="true"
href="http://oss.oracle.com/git/ocfs2-tools.git"
rel="noreferrer" target="_blank">oss.oracle.com/git/ocfs2-tools.git</a><br>
> <<a moz-do-not-send="true"
href="http://oss.oracle.com/git/ocfs2-tools.git"
rel="noreferrer" target="_blank">http://oss.oracle.com/git/ocfs2-tools.git</a>>)<br>
><br>
><br>
><br>
><br>
><br>
> ==============<br>
><br>
> Appendix A1:<br>
><br>
> ==============<br>
><br>
> from syslog:<br>
><br>
> [ 1676.608123]
(ocfs2cmt,5316,14):ocfs2_commit_thread:2195 ERROR: status<br>
> = -5, journal is already aborted.<br>
><br>
> [ 1677.611827]
(ocfs2cmt,5316,14):ocfs2_commit_cache:324 ERROR: status = -5<br>
><br>
> [ 1678.616634]
(ocfs2cmt,5316,15):ocfs2_commit_cache:324 ERROR: status = -5<br>
><br>
> [ 1679.621419]
(ocfs2cmt,5316,15):ocfs2_commit_cache:324 ERROR: status = -5<br>
><br>
> [ 1680.626175]
(ocfs2cmt,5316,15):ocfs2_commit_cache:324 ERROR: status = -5<br>
><br>
> [ 1681.630981] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
ERROR: status = -5<br>
><br>
> [ 1682.107356] INFO: task kworker/u64:0:6 blocked for
more than 120 seconds.<br>
><br>
> [ 1682.108440] Not tainted 3.16.0-46-generic
#62~14.04.1<br>
><br>
> [ 1682.109388] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs"<br>
> disables this message.<br>
><br>
> [ 1682.110381] kworker/u64:0 D ffff88103fcb30c0
0 6 2<br>
> 0x00000000<br>
><br>
> [ 1682.110401] Workqueue: fw_event0
_firmware_event_work [mpt3sas]<br>
><br>
> [ 1682.110405] ffff88102910b8a0 0000000000000046
ffff88102977b2f0<br>
> 00000000000130c0<br>
><br>
> [ 1682.110411] ffff88102910bfd8 00000000000130c0
ffff88102928c750<br>
> ffff88201db284b0<br>
><br>
> [ 1682.110415] ffff88201db28000 ffff881028cef000
ffff88201db28138<br>
> ffff88201db28268<br>
><br>
> [ 1682.110419] Call Trace:<br>
><br>
> [ 1682.110427] [<ffffffff8176a8b9>]
schedule+0x29/0x70<br>
><br>
> [ 1682.110458] [<ffffffffc08d6c11>]
ocfs2_clear_inode+0x3b1/0xa30 [ocfs2]<br>
><br>
> [ 1682.110464] [<ffffffff810b4de0>] ?
prepare_to_wait_event+0x100/0x100<br>
><br>
> [ 1682.110487] [<ffffffffc08d8c7e>]
ocfs2_evict_inode+0x6e/0x730 [ocfs2]<br>
><br>
> [ 1682.110493] [<ffffffff811eee04>]
evict+0xb4/0x180<br>
><br>
> [ 1682.110498] [<ffffffff811eef09>]
dispose_list+0x39/0x50<br>
><br>
> [ 1682.110501] [<ffffffff811efdb4>]
invalidate_inodes+0x134/0x150<br>
><br>
> [ 1682.110506] [<ffffffff8120a64a>]
__invalidate_device+0x3a/0x60<br>
><br>
> [ 1682.110510] [<ffffffff81367e81>]
invalidate_partition+0x31/0x50<br>
><br>
> [ 1682.110513] [<ffffffff81368f45>]
del_gendisk+0xf5/0x290<br>
><br>
> [ 1682.110519] [<ffffffff815177a1>]
sd_remove+0x61/0xc0<br>
><br>
> [ 1682.110524] [<ffffffff814baf7f>]
__device_release_driver+0x7f/0xf0<br>
><br>
> [ 1682.110529] [<ffffffff814bb013>]
device_release_driver+0x23/0x30<br>
><br>
> [ 1682.110534] [<ffffffff814ba918>]
bus_remove_device+0x108/0x180<br>
><br>
> [ 1682.110538] [<ffffffff814b7169>]
device_del+0x129/0x1c0<br>
><br>
> [ 1682.110543] [<ffffffff815123a5>]
__scsi_remove_device+0xd5/0xe0<br>
><br>
> [ 1682.110547] [<ffffffff815123d6>]
scsi_remove_device+0x26/0x40<br>
><br>
> [ 1682.110551] [<ffffffff81512590>]
scsi_remove_target+0x170/0x230<br>
><br>
> [ 1682.110561] [<ffffffffc03551e5>]
sas_rphy_remove+0x65/0x80<br>
> [scsi_transport_sas]<br>
><br>
> [ 1682.110570] [<ffffffffc035707d>]
sas_port_delete+0x2d/0x170<br>
> [scsi_transport_sas]<br>
><br>
> [ 1682.110575] [<ffffffff8124a6f9>] ?
sysfs_remove_link+0x19/0x30<br>
><br>
> [ 1682.110588] [<ffffffffc03f1599>]<br>
> mpt3sas_transport_port_remove+0x1c9/0x1e0 [mpt3sas]<br>
><br>
> [ 1682.110598] [<ffffffffc03e60b5>]
_scsih_remove_device+0x55/0x80<br>
> [mpt3sas]<br>
><br>
> [ 1682.110610] [<ffffffffc03e6159>]<br>
> _scsih_device_remove_by_handle.part.21+0x79/0xa0
[mpt3sas]<br>
><br>
> [ 1682.110619] [<ffffffffc03eca97>]
_firmware_event_work+0x1337/0x1690<br>
> [mpt3sas]<br>
><br>
> [ 1682.110626] [<ffffffff8101c315>] ?
native_sched_clock+0x35/0x90<br>
><br>
> [ 1682.110630] [<ffffffff8101c379>] ?
sched_clock+0x9/0x10<br>
><br>
> [ 1682.110636] [<ffffffff81011574>] ?
__switch_to+0xe4/0x580<br>
><br>
> [ 1682.110640] [<ffffffff81087bc9>] ?
pwq_activate_delayed_work+0x39/0x80<br>
><br>
> [ 1682.110644] [<ffffffff8108a302>]
process_one_work+0x182/0x450<br>
><br>
> [ 1682.110648] [<ffffffff8108aa71>]
worker_thread+0x121/0x570<br>
><br>
> [ 1682.110652] [<ffffffff8108a950>] ?
rescuer_thread+0x380/0x380<br>
><br>
> [ 1682.110657] [<ffffffff81091309>]
kthread+0xc9/0xe0<br>
><br>
> [ 1682.110662] [<ffffffff81091240>] ?
kthread_create_on_node+0x1c0/0x1c0<br>
><br>
> [ 1682.110667] [<ffffffff8176e818>]
ret_from_fork+0x58/0x90<br>
><br>
> [ 1682.110672] [<ffffffff81091240>] ?
kthread_create_on_node+0x1c0/0x1c0<br>
><br>
> [ 1682.635761] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
ERROR: status = -5<br>
><br>
> [ 1683.640549] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
ERROR: status = -5<br>
><br>
> [ 1684.645336] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
ERROR: status = -5<br>
><br>
> [ 1685.650114] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
ERROR: status = -5<br>
><br>
> [ 1686.654911] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
ERROR: status = -5<br>
><br>
> [ 1687.659684] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
ERROR: status = -5<br>
><br>
> [ 1688.664466] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
ERROR: status = -5<br>
><br>
> [ 1689.669252] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
ERROR: status = -5<br>
><br>
> [ 1690.674026] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
ERROR: status = -5<br>
><br>
> [ 1691.678810] (ocfs2cmt,5316,9):ocfs2_commit_cache:324
ERROR: status = -5<br>
><br>
> [ 1691.679920]
(ocfs2cmt,5316,9):ocfs2_commit_thread:2195 ERROR: status<br>
> = -5, journal is already aborted.<br>
><br>
><br>
><br>
> Thanks in advance,<br>
><br>
> Guy<br>
><br>
><br>
><br>
> _______________________________________________<br>
> Ocfs2-devel mailing list<br>
> <a moz-do-not-send="true"
href="mailto:Ocfs2-devel@oss.oracle.com">Ocfs2-devel@oss.oracle.com</a><br>
> <a moz-do-not-send="true"
href="https://oss.oracle.com/mailman/listinfo/ocfs2-devel"
rel="noreferrer" target="_blank">https://oss.oracle.com/mailman/listinfo/ocfs2-devel</a><br>
><br>
<br>
</blockquote>
</div>
<br>
</div>
<br>
<fieldset class="mimeAttachmentHeader"></fieldset>
<br>
<pre wrap="">_______________________________________________
Ocfs2-devel mailing list
<a class="moz-txt-link-abbreviated" href="mailto:Ocfs2-devel@oss.oracle.com">Ocfs2-devel@oss.oracle.com</a>
<a class="moz-txt-link-freetext" href="https://oss.oracle.com/mailman/listinfo/ocfs2-devel">https://oss.oracle.com/mailman/listinfo/ocfs2-devel</a></pre>
</blockquote>
<br>
</body>
</html>