<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 10pt;
font-family:Tahoma
}
--></style></head>
<body class='hmmessage'><div dir='ltr'>
Hello Herbert,<div><br></div><div>Thanks for your help!</div><div><br></div><div>Here is the process stack I get when a "dd" process is hanging:</div><div><br></div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;"><div><div><font face="Courier New" size="2">[<ffffffff81199f56>] __find_get_block_slow+0xc6/0x150</font></div></div><div><div><font face="Courier New" size="2">[<ffffffffa070d8e9>] ocfs2_metadata_cache_unlock+0x19/0x30 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2">[<ffffffffa070dcd7>] ocfs2_buffer_cached+0xa7/0x1a0 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2">[<ffffffffa070e7ec>] ocfs2_set_buffer_uptodate+0x2c/0x100 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2">[<ffffffffffffffff>] 0xffffffffffffffff</font></div></div></blockquote><div><br></div><div><br></div><div>Regarding the full stack dump, I found relevant lines by comparing full stack dumps while the system was frozen and not frozen. Here are the diff that were identified.</div><div><br></div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;"><div><div><font face="Courier New" size="2">kworker/12:1 S ffff881fc9af65e0 0 90 2 0x00000000</font></div></div><div><div><font face="Courier New" size="2"> ffff881fc9af9e40 0000000000000046 ffff881fc9af9de0 ffffffff81057602</font></div></div><div><div><font face="Courier New" size="2"> 0000000000012200 ffff881fc9af9fd8 ffff881fc9af8010 0000000000012200</font></div></div><div><div><font face="Courier New" size="2"> ffff881fc9af9fd8 0000000000012200 ffff881982b4e680 ffff881fc9af6040</font></div></div><div><div><font face="Courier New" size="2">Call Trace:</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff81057602>] ? complete+0x52/0x60</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff81085887>] ? 
move_linked_works+0x67/0x90</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff81503eaf>] schedule+0x3f/0x60</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff81088a1e>] worker_thread+0x24e/0x3c0</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff810887d0>] ? manage_workers+0x120/0x120</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff8108d546>] kthread+0x96/0xa0</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff8150f304>] kernel_thread_helper+0x4/0x10</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff8108d4b0>] ? kthread_worker_fn+0x1a0/0x1a0</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff8150f300>] ? gs_change+0x13/0x13</font></div></div></blockquote><div><br></div><div><br></div><blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;"><div><div><font face="Courier New" size="2">dd R running task 0 25826 14680 0x00000080</font></div></div><div><div><font face="Courier New" size="2"> ffff881fc4de7388 ffffffff8110c65e ffff881fc4de7388 ffffffff81505e7e</font></div></div><div><div><font face="Courier New" size="2"> ffff881fc4de73d8 ffffffff81199f56 ffff881fc4de6010 0000000000012200</font></div></div><div><div><font face="Courier New" size="2"> ffff881fc4de7fd8 0000000000000000 ffff881fc4de73c8 ffffffff81505e7e</font></div></div><div><div><font face="Courier New" size="2">Call Trace:</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff8110c65e>] ? find_get_page+0x1e/0xa0</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff81505e7e>] ? _raw_spin_lock+0xe/0x20</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff81199f56>] __find_get_block_slow+0xc6/0x150</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff81505e7e>] ? 
_raw_spin_lock+0xe/0x20</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06be462>] ? ocfs2_inode_cache_unlock+0x12/0x20 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa070d8e9>] ocfs2_metadata_cache_unlock+0x19/0x30 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa070dcd7>] ocfs2_buffer_cached+0xa7/0x1a0 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa070e7ec>] ocfs2_set_buffer_uptodate+0x2c/0x100 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06be482>] ? ocfs2_inode_cache_io_unlock+0x12/0x20 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06e9a85>] ? ocfs2_block_group_find_clear_bits+0xf5/0x180 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06ea568>] ? ocfs2_cluster_group_search+0xa8/0x230 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06eac13>] ? ocfs2_read_group_descriptor+0x73/0xb0 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06ebe20>] ? ocfs2_search_chain+0x100/0x730 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06ec7ec>] ? ocfs2_claim_suballoc_bits+0x39c/0x570 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa01b16ff>] ? do_get_write_access+0x35f/0x600 [jbd2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06eca69>] ? __ocfs2_claim_clusters+0xa9/0x340 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa070d8e9>] ? ocfs2_metadata_cache_unlock+0x19/0x30 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06ecd1d>] ? ocfs2_claim_clusters+0x1d/0x20 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06cafdf>] ? 
ocfs2_local_alloc_new_window+0x6f/0x340 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06cb413>] ? ocfs2_local_alloc_slide_window+0x163/0x5c0 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06cb9c7>] ? ocfs2_reserve_local_alloc_bits+0x157/0x340 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06ca2b8>] ? ocfs2_alloc_should_use_local+0x68/0xd0 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06ee475>] ? ocfs2_reserve_clusters_with_limit+0xb5/0x320 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06ef3a8>] ? ocfs2_reserve_clusters+0x18/0x20 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06efdce>] ? ocfs2_lock_allocators+0x1fe/0x2b0 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa069d7a3>] ? ocfs2_write_begin_nolock+0x913/0x1100 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06be482>] ? ocfs2_inode_cache_io_unlock+0x12/0x20 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa070d959>] ? ocfs2_metadata_cache_io_unlock+0x19/0x30 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa069f628>] ? ocfs2_read_blocks+0x2f8/0x6c0 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06c7460>] ? ocfs2_journal_access_eb+0x20/0x20 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa069e086>] ? ocfs2_write_begin+0xf6/0x220 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff8110be43>] ? generic_perform_write+0xc3/0x1c0</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06b94ad>] ? ocfs2_prepare_inode_for_write+0x10d/0x710 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff8110bf6b>] ? 
generic_file_buffered_write_iter+0x2b/0x60</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06ba677>] ? ocfs2_file_write_iter+0x367/0x9b0 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffffa06bad48>] ? ocfs2_file_aio_write+0x88/0xa0 [ocfs2]</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff8116c7c2>] ? do_sync_write+0xe2/0x120</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff811feae3>] ? security_file_permission+0x23/0x90</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff8116cd58>] ? vfs_write+0xc8/0x190</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff8116cf21>] ? sys_write+0x51/0x90</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff810cc1bb>] ? audit_syscall_exit+0x25b/0x290</font></div></div><div><div><font face="Courier New" size="2"> [<ffffffff8150e1c2>] ? system_call_fastpath+0x16/0x1b</font></div></div></blockquote><div><br></div><div><br></div><div>Thanks,</div><div><br></div><div>Jeff</div><div><br></div><div><br><div><div id="SkyDrivePlaceholder"></div><hr id="stopSpelling">Date: Thu, 25 Oct 2012 18:39:31 -0700<br>From: herbert.van.den.bergh@oracle.com<br>To: jpaterson23@hotmail.com<br>CC: ocfs2-users@oss.oracle.com<br>Subject: Re: [Ocfs2-users] OCFS2 hanging on writes<br><br>
Hello Jeff,<br>
<br>
You might want to check what the writer process is waiting on when
it's frozen. The wchan column of ps might be enough, but if not,
then perhaps a kernel stack trace of the process from
/proc/&lt;pid&gt;/stack or from echo t &gt; /proc/sysrq-trigger.
The latter will show other blocked processes as well, which may be
helpful in determining the cause of the freeze.<br>
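<br>
The two per-process checks above could be scripted roughly as follows. This is only a sketch that assumes a Linux /proc layout: the helper names (read_proc_entry, describe_task) and the proc_root parameter are made up for illustration, and reading the stack entry normally requires root.<br>
<pre>
```python
from pathlib import Path
from typing import Optional


def read_proc_entry(pid: int, entry: str, proc_root: str = "/proc") -> Optional[str]:
    """Return the text of PROC_ROOT/PID/ENTRY, or None if it cannot be read."""
    try:
        return (Path(proc_root) / str(pid) / entry).read_text().strip()
    except OSError:
        # Process gone, entry missing, or insufficient privileges.
        return None


def describe_task(pid: int, proc_root: str = "/proc") -> dict:
    """Collect the wait channel and kernel stack of one process.

    proc_root is a hypothetical parameter added so the helper can be
    pointed at a copy of /proc for testing.
    """
    return {
        "pid": pid,
        "wchan": read_proc_entry(pid, "wchan", proc_root),
        "stack": read_proc_entry(pid, "stack", proc_root),
    }
```
</pre>
Calling describe_task with the PID of the frozen writer returns a dictionary whose "wchan" and "stack" values are None when the corresponding entry cannot be read.<br>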
<br>
Thanks,<br>
Herbert.<br>
<br>
<br>
On 10/25/2012 06:32 PM, Jeff Paterson wrote:
<blockquote cite="mid:SNT127-W644D6054FF24D3CB189245A47E0@phx.gbl">
<style><!--
.ExternalClass .ecxhmmessage P
{padding:0px;}
.ExternalClass body.ecxhmmessage
{font-size:10pt;font-family:Tahoma;}
--></style>
<div dir="ltr">
<font size="2"><span style="white-space:nowrap;color:rgb(34, 34, 34);font-family:arial, sans-serif">Hello,</span><br>
</font>
<div>
<div dir="ltr">
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap"><br>
</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap">I need help with our
OCFS2 (1.8.0) filesystem. We have been having problems with
it for a couple of days. When we write to it, it
hangs.</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap"><br>
</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap">The "hanging pattern" is
easily reproducible. If I write a 1 GB file to the
filesystem, it does the following:</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap"> - write ~200 MB of
data on the disk in 1 second</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap"> - freeze for about
10 seconds</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap"> - write ~200 MB of
data on the disk in 1 second</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap"> - freeze for about
10 seconds</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap"> - write ~200 MB of
data on the disk in 1 second</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap"> - freeze for about
10 seconds</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap"> (and so on)</span></font></div>
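<div><font color="#222222" face="arial, sans-serif" size="2">A minimal Python sketch for reproducing and timing this pattern is shown here. It writes zeros in fixed-size chunks and records every chunk whose write call takes at least a given time. The function name timed_write, its parameters and the default thresholds are illustrative assumptions, and it times buffered writes only (no fsync).</font></div>
<pre>
```python
import time


def timed_write(path, total_bytes, chunk_bytes=4 * 1024 * 1024, stall_seconds=1.0):
    """Write total_bytes of zeros to path, one chunk at a time.

    Returns (bytes_written, stalls), where stalls is a list of
    (offset, seconds) pairs for every chunk whose write took at
    least stall_seconds.
    """
    stalls = []
    chunk = b"\0" * chunk_bytes
    written = 0
    with open(path, "wb") as f:
        while total_bytes > written:
            size = min(chunk_bytes, total_bytes - written)
            start = time.monotonic()
            f.write(chunk[:size])
            f.flush()  # hand the data to the kernel before stopping the clock
            elapsed = time.monotonic() - start
            if elapsed >= stall_seconds:
                stalls.append((written, elapsed))
            written += size
    return written, stalls
```
</pre>
<div><font color="#222222" face="arial, sans-serif" size="2">For the 1 GB case, timed_write would be called with a path on the OCFS2 mount and total_bytes set to 1024 ** 3. The offsets in the returned list show where each freeze started.</font></div>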
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap"><br>
</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap">When the freezes occur:</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap"> - other write
operations (from other processes) on the same node
also freeze</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap"> - write operations
on other nodes are not affected by the freezes on
this node</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap"> </span></font></div>
<div><font size="2"><font color="#222222" face="arial,
sans-serif"><span style="white-space:nowrap">Read
operations (on any cluster node, even the one with
frozen writes) don't seem to be affected by the
freezes. One thing is certain: read operations alone </span></font><span style="white-space:nowrap;color:rgb(34, 34, 34);font-family:arial, sans-serif">don't cause the
filesystem to freeze.</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap"><br>
</span></font></div>
<div><font color="#222222" face="arial, sans-serif" size="2"><span style="white-space:nowrap">
<div>For info, before the problem began to appear we
could sustain 640 MB/s writes without any freeze.</div>
<div><br>
</div>
<div>I tried to mount the filesystem on a single node
to avoid issues that could happen with inter-node
communications and the problem was still there.</div>
<div><br>
</div>
<div><br>
</div>
<div><b><u>Filesystem details</u></b></div>
<div>
<ul>
<li>The filesystem is 18 TB in size and is currently
72% full.</li>
<li>Mount options are the following:
rw,nodev,_netdev,noatime,errors=panic,data=writeback,noacl,nouser_xattr,commit=60,heartbeat=local</li>
<li>All Features: backup-super
strict-journal-super sparse extended-slotmap
inline-data metaecc indexed-dirs refcount
discontig-bg unwritten</li>
</ul>
</div>
<div><br>
</div>
<div><br>
</div>
<div>There is nothing special in the system logs
besides application errors caused by the freezes.</div>
<div><br>
</div>
<div><br>
</div>
<div>Would running fsck.ocfs2 help? How long would it take
for 18 TB?</div>
<div><br>
</div>
<div>Is there a flag I can enable in debugfs.ocfs2 to
get a better idea of what is happening and why it is
freezing like that?</div>
<div><br>
</div>
<div><br>
</div>
<div>Any help would be greatly appreciated.</div>
<div><br>
</div>
<div>Thanks in advance,</div>
<div><br>
</div>
<div>Jeff</div>
</span></font></div>
</div>
</div>
</div>
<br>
<fieldset class="ecxmimeAttachmentHeader"></fieldset>
<br>
<pre>_______________________________________________
Ocfs2-users mailing list
<a class="ecxmoz-txt-link-abbreviated" href="mailto:Ocfs2-users@oss.oracle.com">Ocfs2-users@oss.oracle.com</a>
<a class="ecxmoz-txt-link-freetext" href="https://oss.oracle.com/mailman/listinfo/ocfs2-users" target="_blank">https://oss.oracle.com/mailman/listinfo/ocfs2-users</a></pre>
</blockquote></div></div>                                            </div></body>
</html>