<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi,<br>
<br>
<div class="moz-cite-prefix">On 10/22/15 21:00, gjprabu wrote:<br>
</div>
<blockquote
cite="mid:1508fa2e901.12bbf98f31129.7910549554411585115@zohocorp.com"
type="cite">
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
<div style="font-size:10pt;">
<div>Hi Eric,<br>
</div>
<div><br>
</div>
<div><span class="font" style="font-family: arial, helvetica,
sans-serif, sans-serif;">Thanks for your reply, Still we are
facing same issue. we found this dmesg logs and this is
known logs because our self made down node1 and made up this
is showing in logs and other then we didn't found error
message. </span><span class="highlight"
style="background-color: rgb(255, 255, 255)"><span
class="size" style="font-size:13.3333px"><span
class="font" style="font-family: arial, helvetica,
sans-serif, sans-serif;">Even we do have problem while
unmounting. umount process goes to "D" stat and </span></span></span><span
class="font" style="font-family: arial, helvetica,
sans-serif, sans-serif;"> fsck through fsck.ocfs2: I/O
error. If required to run any other command pls let me
know. </span><br>
</div>
<div><br>
</div>
</div>
</blockquote>
1. System log across boots<br>
#journalctl --list-boots<br>
If there is just one boot record, see "man journald.conf" for how to
configure the journal to keep system logs across boots,<br>
so that you can use "journalctl -b xxx" to view the log of any
specific boot.<br>
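A minimal sketch of enabling persistent journal logging, assuming a
systemd distribution (exact steps may vary; see journald.conf(5)):<br>
<pre>
# mkdir -p /var/log/journal            # with Storage=auto (the default), this directory enables persistence
# systemctl restart systemd-journald   # or set Storage=persistent in /etc/systemd/journald.conf and restart
# journalctl --list-boots              # should now accumulate one entry per boot
# journalctl -b -1                     # e.g. view the previous boot's log
</pre>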
<br>
I can't tell from your description what steps exactly lead to that
error message. It would be better to sort out your problems starting
from a clean state.<br>
<br>
2. The umount issue may be caused by the cluster being in a bad
state, with communication between nodes hung up.<br>
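As a quick sanity check of node communication, you could verify from
each node that every peer is reachable on the o2net port from your
cluster.conf (7777 here), e.g.:<br>
<pre>
# nc -z 192.168.113.42 7777 && echo reachable   # repeat for each peer ip_address in cluster.conf
</pre>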
<br>
3. Please run fsck.ocfs2 against the device instead of the mount
point.<br>
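For example (assuming the filesystem sits on /dev/rbd0; substitute
your actual RBD device), with the volume unmounted on all nodes:<br>
<pre>
# fsck.ocfs2 -fy /dev/rbd0             # force a full check on the block device, not the mount point
</pre>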
<br>
4. Did you set up OCFS2 on top of the Ceph RBD with the cluster in a
known-good state? It's better to test the cluster more<br>
thoroughly before putting real work on it.<br>
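For instance, before putting load on it you could check that the
cluster stack is up and that the volume is visible from every node
(a sketch; script paths may differ on your distribution):<br>
<pre>
# /etc/init.d/o2cb status              # is the o2cb stack online and heartbeating?
# mounted.ocfs2 -f                     # which cluster nodes currently have the volume mounted?
</pre>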
<br>
<br>
Thanks,<br>
Eric <br>
<blockquote
cite="mid:1508fa2e901.12bbf98f31129.7910549554411585115@zohocorp.com"
type="cite">
<div style="font-size:10pt;">
<div><b>ocfs2 version</b></div>
<div>debugfs.ocfs2 1.8.0<br>
</div>
<div><br>
</div>
<div><b># cat /etc/sysconfig/o2cb</b><br>
</div>
<div>#<br>
</div>
<div># This is a configuration file for automatic startup of the
O2CB<br>
</div>
<div># driver. It is generated by running /etc/init.d/o2cb
configure.<br>
</div>
<div># On Debian based systems the preferred method is running<br>
</div>
<div># 'dpkg-reconfigure ocfs2-tools'.<br>
</div>
<div>#<br>
</div>
<div><br>
</div>
<div># O2CB_STACK: The name of the cluster stack backing O2CB.<br>
</div>
<div>O2CB_STACK=o2cb<br>
</div>
<div><br>
</div>
<div># O2CB_BOOTCLUSTER: If not empty, the name of a cluster to
start.<br>
</div>
<div>O2CB_BOOTCLUSTER=ocfs2<br>
</div>
<div><br>
</div>
<div># O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is
considered dead.<br>
</div>
<div>O2CB_HEARTBEAT_THRESHOLD=31<br>
</div>
<div><br>
</div>
<div># O2CB_IDLE_TIMEOUT_MS: Time in ms before a network
connection is considered dead.<br>
</div>
<div>O2CB_IDLE_TIMEOUT_MS=30000<br>
</div>
<div><br>
</div>
<div># O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a
keepalive packet is sent<br>
</div>
<div>O2CB_KEEPALIVE_DELAY_MS=2000<br>
</div>
<div><br>
</div>
<div># O2CB_RECONNECT_DELAY_MS: Min time in ms between
connection attempts<br>
</div>
<div>O2CB_RECONNECT_DELAY_MS=2000<br>
</div>
<div><br>
</div>
<div><b># fsck.ocfs2 -fy /home/build/downloads/</b><br>
</div>
<div>fsck.ocfs2 1.8.0<br>
</div>
<div>fsck.ocfs2: I/O error on channel while opening
"/zoho/build/downloads/"<br>
</div>
<div><br>
</div>
<div><u><b>dmesg logs</b></u></div>
<div> <br>
</div>
<div>[ 4229.886284] o2dlm: Joining domain
A895BC216BE641A8A7E20AA89D57E051 ( 5 ) 1 nodes<br>
</div>
<div>[ 4251.437451] o2dlm: Node 3 joins domain
A895BC216BE641A8A7E20AA89D57E051 ( 3 5 ) 2 nodes<br>
</div>
<div>[ 4267.836392] o2dlm: Node 1 joins domain
A895BC216BE641A8A7E20AA89D57E051 ( 1 3 5 ) 3 nodes<br>
</div>
<div>[ 4292.755589] o2dlm: Node 2 joins domain
A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 5 ) 4 nodes<br>
</div>
<div>[ 4306.262165] o2dlm: Node 4 joins domain
A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 5 ) 5 nodes<br>
</div>
<div>[316476.505401]
(kworker/u192:0,95923,0):dlm_do_assert_master:1717 ERROR:
Error -112 when sending message 502 (key 0xc3460ae7) to node 1<br>
</div>
<div>[316476.505470] o2cb: o2dlm has evicted node 1 from domain
A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316480.437231] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1<br>
</div>
<div>[316480.442389] o2cb: o2dlm has evicted node 1 from domain
A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316480.442412]
(kworker/u192:0,95923,20):dlm_begin_reco_handler:2765
A895BC216BE641A8A7E20AA89D57E051: dead_node previously set to
1, node 3 changing it to 1<br>
</div>
<div>[316480.541237] o2dlm: Node 3 (he) is the Recovery Master
for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316480.541241] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316485.542733] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1<br>
</div>
<div>[316485.542740] o2dlm: Node 3 (he) is the Recovery Master
for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316485.542742] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316490.544535] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1<br>
</div>
<div>[316490.544538] o2dlm: Node 3 (he) is the Recovery Master
for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316490.544539] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316495.546356] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1<br>
</div>
<div>[316495.546362] o2dlm: Node 3 (he) is the Recovery Master
for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316495.546364] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316500.548135] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1<br>
</div>
<div>[316500.548139] o2dlm: Node 3 (he) is the Recovery Master
for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316500.548140] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316505.549947] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1<br>
</div>
<div>[316505.549951] o2dlm: Node 3 (he) is the Recovery Master
for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316505.549952] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316510.551734] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1<br>
</div>
<div>[316510.551739] o2dlm: Node 3 (he) is the Recovery Master
for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316510.551740] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316515.553543] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1<br>
</div>
<div>[316515.553547] o2dlm: Node 3 (he) is the Recovery Master
for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316515.553548] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316520.555337] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1<br>
</div>
<div>[316520.555341] o2dlm: Node 3 (he) is the Recovery Master
for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316520.555343] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316525.557131] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1<br>
</div>
<div>[316525.557136] o2dlm: Node 3 (he) is the Recovery Master
for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316525.557153] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316530.558952] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1<br>
</div>
<div>[316530.558955] o2dlm: Node 3 (he) is the Recovery Master
for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316530.558957] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316535.560781] o2dlm: Begin recovery on domain
A895BC216BE641A8A7E20AA89D57E051 for node 1<br>
</div>
<div>[316535.560789] o2dlm: Node 3 (he) is the Recovery Master
for the dead node 1 in domain A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[316535.560792] o2dlm: End recovery on domain
A895BC216BE641A8A7E20AA89D57E051<br>
</div>
<div>[319419.525609] o2dlm: Node 1 joins domain
A895BC216BE641A8A7E20AA89D57E051 ( 1 2 3 4 5 ) 5 nodes<br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><b>ps -auxxxxx | grep umount</b><br>
</div>
<div>root 32083 21.8 0.0 125620 2828 pts/14 D+ 19:37
0:18 umount /home/build/repository<br>
</div>
<div>root 32196 0.0 0.0 112652 2264 pts/8 S+ 19:38
0:00 grep --color=auto umount<br>
</div>
<div><br>
</div>
<div><br>
</div>
<div><b>cat /proc/32083/stack</b> <br>
</div>
<div>[<ffffffff8132ad7d>]
o2net_send_message_vec+0x71d/0xb00<br>
</div>
<div>[<ffffffff81352148>]
dlm_send_remote_unlock_request.isra.2+0x128/0x410<br>
</div>
<div>[<ffffffff813527db>] dlmunlock_common+0x3ab/0x9e0<br>
</div>
<div>[<ffffffff81353088>] dlmunlock+0x278/0x800<br>
</div>
<div>[<ffffffff8131f765>] o2cb_dlm_unlock+0x35/0x50<br>
</div>
<div>[<ffffffff8131ecfe>] ocfs2_dlm_unlock+0x1e/0x30<br>
</div>
<div>[<ffffffff812a8776>]
ocfs2_drop_lock.isra.29.part.30+0x1f6/0x700<br>
</div>
<div>[<ffffffff812ae40d>]
ocfs2_simple_drop_lockres+0x2d/0x40<br>
</div>
<div>[<ffffffff8129b43c>] ocfs2_dentry_lock_put+0x5c/0x80<br>
</div>
<div>[<ffffffff8129b4a2>] ocfs2_dentry_iput+0x42/0x1d0<br>
</div>
<div>[<ffffffff81204dc2>] __dentry_kill+0x102/0x1f0<br>
</div>
<div>[<ffffffff81205294>] shrink_dentry_list+0xe4/0x2a0<br>
</div>
<div>[<ffffffff81205aa8>] shrink_dcache_parent+0x38/0x90<br>
</div>
<div>[<ffffffff81205b16>] do_one_tree+0x16/0x50<br>
</div>
<div>[<ffffffff81206e9f>]
shrink_dcache_for_umount+0x2f/0x90<br>
</div>
<div>[<ffffffff811efb15>]
generic_shutdown_super+0x25/0x100<br>
</div>
<div>[<ffffffff811eff57>] kill_block_super+0x27/0x70<br>
</div>
<div>[<ffffffff811f02a9>]
deactivate_locked_super+0x49/0x60<br>
</div>
<div>[<ffffffff811f089e>] deactivate_super+0x4e/0x70<br>
</div>
<div>[<ffffffff8120da83>] cleanup_mnt+0x43/0x90<br>
</div>
<div>[<ffffffff8120db22>] __cleanup_mnt+0x12/0x20<br>
</div>
<div>[<ffffffff81093ba4>] task_work_run+0xc4/0xe0<br>
</div>
<div>[<ffffffff81013c67>] do_notify_resume+0x97/0xb0<br>
</div>
<div>[<ffffffff817d2ee7>] int_signal+0x12/0x17<br>
</div>
<div>[<ffffffffffffffff>] 0xffffffffffffffff<br>
</div>
<div><br>
</div>
<div id="">
<div><span class="colour" style="color:rgb(0, 0, 0)">Regards</span><br>
</div>
<div>Prabu</div>
<div style="color: rgb(255, 0, 0);"><br>
</div>
<div><span class="size" style="font-size:16px"><span
class="colour" style="color:rgb(192, 192, 192)"><span
class="font" style="font-family:arial, helvetica,
sans-serif"><span class="font"
style="font-family:'courier new', courier,
monospace"><span class="size" style="font-size:24px"><span
class="colour" style="color:rgb(0, 0, 255)"></span></span></span><span></span></span></span></span><br>
</div>
</div>
<div><br>
</div>
<div class="zmail_extra">
<div id="1">
<div><br>
</div>
<div> ---- On Wed, 21 Oct 2015 08:32:15 +0530 <b>Eric Ren
<a class="moz-txt-link-rfc2396E" href="mailto:zren@suse.com"><zren@suse.com></a></b> wrote ----<br>
</div>
</div>
<div><br>
</div>
<blockquote style="border-left: 1px solid #cccccc;
padding-left: 6px; margin:0 0 0 5px">
<div>
<div>Hi Prabu,<br>
</div>
<div> <br>
</div>
<div> I guess others, like me, are not familiar with this
setup that combines Ceph RBD and OCFS2.<br>
</div>
<div> <br>
</div>
<div> We'd really like to help you, but I think the ocfs2
developers cannot tell what happened<br>
</div>
<div> to ocfs2 from your description. <br>
</div>
<div> <br>
</div>
<div> So, I'm wondering if you can reproduce it and tell us
the steps. Once developers can reproduce it,<br>
</div>
<div> it's likely to be resolved ;-) BTW, any dmesg log about
ocfs2, especially the initial error message and stack<br>
</div>
<div> back trace, will be helpful!<br>
</div>
<div> <br>
</div>
<div> Thanks,<br>
</div>
<div> Eric<br>
</div>
<div> <br>
</div>
<div>On 10/20/15 17:29, gjprabu wrote:<br>
</div>
<div id="zmail_block"><br>
</div>
</div>
<blockquote>
<div style="font-size: 10.0pt;">
<div>
<div style="color: rgb(255,0,0);"><span class="colour"
style="color:rgb(0, 0, 0)">Hi </span><br>
</div>
<div style="color: rgb(255,0,0);"><br>
</div>
<div style="color: rgb(255,0,0);"><span class="colour"
style="color:rgb(0, 0, 0)"> We are looking
forward to your input on this.</span><br>
</div>
<div style="color: rgb(255,0,0);"><br>
</div>
<div style="color: rgb(255,0,0);"><span class="colour"
style="color:rgb(0, 0, 0)">Regads</span><br>
</div>
<div style="color: rgb(255,0,0);"><span class="colour"
style="color:rgb(0, 0, 0)">Prabu</span><br>
</div>
<div style="color: rgb(255,0,0);"><br>
</div>
</div>
<div class="zmail_extra">
<div>
<div>--- On Fri, 09 Oct 2015 12:08:19 +0530 <b>gjprabu
<a moz-do-not-send="true"
href="mailto:gjprabu@zohocorp.com"
target="_blank"><gjprabu@zohocorp.com></a></b>
wrote ----<br>
</div>
</div>
<div><br>
</div>
<blockquote style="border-left: 1.0px solid
rgb(204,204,204);padding-left: 6.0px;margin: 0 0 0
5.0px;">
<div>
<div style="font-size: 10.0pt;">
<div class="zmail_extra">
<div>
<div><br>
</div>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
<div><br>
</div>
</div>
</div>
<blockquote style="border-left: 1.0px solid
rgb(204,204,204);padding-left: 6.0px;margin: 0 0 0
5.0px;">
<div>
<div style="font-size: 10.0pt;">
<div>Hi All,<br>
</div>
<div><br>
</div>
<div> Could anybody please help me
with this issue?<br>
</div>
<div><br>
</div>
<div>
<div><span class="colour"
style="color:rgb(0, 0, 0)">Regards</span><br>
</div>
<div>Prabu<br>
</div>
<div style="color: rgb(255,0,0);"><br>
</div>
<div><span class="size"
style="font-size:16px"><span
class="colour" style="color:rgb(192,
192, 192)"><span class="font"
style="font-family:arial, helvetica,
sans-serif"><span class="size"
style="font-size:24px"><span
class="colour"
style="color:rgb(0, 0, 255)"></span></span><span></span></span></span></span><br>
</div>
</div>
<div><br>
</div>
<div class="zmail_extra">
<div>
<div><br>
</div>
<div>---- On Thu, 08 Oct 2015 12:33:57
+0530 <b>gjprabu <<a
moz-do-not-send="true"
href="mailto:gjprabu@zohocorp.com"
target="_blank">gjprabu@zohocorp.com</a>></b>
wrote ----<br>
</div>
</div>
<div><br>
</div>
<div><br>
</div>
</div>
<div><br>
</div>
</div>
</div>
<blockquote style="border-left: 1.0px solid
rgb(204,204,204);padding-left: 6.0px;margin: 0 0
0 5.0px;">
<div>
<div style="font-size: 10.0pt;">
<div>Hi All, <br>
</div>
<div><br>
</div>
<div> We have servers with OCFS2
mounted on top of Ceph RBD. We are facing
I/O errors while moving data within the
same disk (copying does not show any
problem). As a temporary fix we remount
the partition and the issue is resolved,
but after some time the problem
reproduces. If anybody has faced the same
issue, please help us.<br>
</div>
<div><br>
</div>
<div>Note: We have 5 nodes in total; two
nodes are working fine, while the other
nodes show input/output errors like below.<br>
</div>
<div><br>
</div>
<div>ls -althr <br>
</div>
<div>ls: cannot access LITE_3_0_M4_1_TEST:
Input/output error <br>
</div>
<div>ls: cannot access LITE_3_0_M4_1_OLD:
Input/output error <br>
</div>
<div>total 0 <br>
</div>
<div>d????????? ? ? ? ? ?
LITE_3_0_M4_1_TEST <br>
</div>
<div>d????????? ? ? ? ? ? LITE_3_0_M4_1_OLD
<br>
</div>
<div><br>
</div>
<div>cluster:<br>
</div>
<div> node_count=5<br>
</div>
<div> heartbeat_mode = local<br>
</div>
<div> name=ocfs2<br>
</div>
<div><br>
</div>
<div>node:<br>
</div>
<div> ip_port = 7777<br>
</div>
<div> ip_address = 192.168.113.42<br>
</div>
<div> number = 1<br>
</div>
<div> name = integ-hm9<br>
</div>
<div> cluster = ocfs2<br>
</div>
<div><br>
</div>
<div>node:<br>
</div>
<div> ip_port = 7777<br>
</div>
<div> ip_address = 192.168.112.115<br>
</div>
<div> number = 2<br>
</div>
<div> name = integ-hm2<br>
</div>
<div> cluster = ocfs2<br>
</div>
<div><br>
</div>
<div>node:<br>
</div>
<div> ip_port = 7777<br>
</div>
<div> ip_address = 192.168.113.43<br>
</div>
<div> number = 3<br>
</div>
<div> name = integ-ci-1<br>
</div>
<div> cluster = ocfs2<br>
</div>
<div>node:<br>
</div>
<div> ip_port = 7777<br>
</div>
<div> ip_address = 192.168.112.217<br>
</div>
<div> number = 4<br>
</div>
<div> name = integ-hm8<br>
</div>
<div> cluster = ocfs2<br>
</div>
<div>node:<br>
</div>
<div> ip_port = 7777<br>
</div>
<div> ip_address = 192.168.112.192<br>
</div>
<div> number = 5<br>
</div>
<div> name = integ-hm5<br>
</div>
<div> cluster = ocfs2<br>
</div>
<div><br>
</div>
<div><br>
</div>
<div>
<div><span class="colour"
style="color:rgb(0, 0, 0)">Regards</span><br>
</div>
<div>Prabu<br>
</div>
<div style="color: rgb(255,0,0);"><br>
</div>
<div><span class="size"
style="font-size:16px"><span
class="colour" style="color:rgb(192,
192, 192)"><span class="font"
style="font-family:arial,
helvetica, sans-serif"><span
class="size"
style="font-size:24px"><span
class="colour"
style="color:rgb(0, 0, 255)"></span></span><span></span></span></span></span><br>
</div>
</div>
<div><br>
</div>
</div>
<div>_______________________________________________
<br>
</div>
<div>Ocfs2-users mailing list <br>
</div>
<div><a moz-do-not-send="true"
href="mailto:Ocfs2-users@oss.oracle.com"
target="_blank">Ocfs2-users@oss.oracle.com</a>
<br>
</div>
<div><a moz-do-not-send="true"
href="https://oss.oracle.com/mailman/listinfo/ocfs2-users"
target="_blank">https://oss.oracle.com/mailman/listinfo/ocfs2-users</a><br>
</div>
</div>
</blockquote>
</blockquote>
</blockquote>
</div>
<div><br>
</div>
</div>
<div><br>
</div>
<div> <br>
</div>
<pre>_______________________________________________
Ocfs2-users mailing list
<a moz-do-not-send="true" href="mailto:Ocfs2-users@oss.oracle.com" target="_blank">Ocfs2-users@oss.oracle.com</a> <a moz-do-not-send="true" href="https://oss.oracle.com/mailman/listinfo/ocfs2-users" target="_blank">https://oss.oracle.com/mailman/listinfo/ocfs2-users</a>
</pre>
</blockquote>
</blockquote>
</div>
<div><br>
</div>
</div>
</blockquote>
<br>
</body>
</html>