<div dir="ltr"><div class="gmail_extra">hi list,</div><div class="gmail_extra"><br></div><div class="gmail_extra">I have an ocfs2 filesystem set up as a shared filesystem between 12 openstack compute nodes, all running Ubuntu <span style="font-size:12.8px">16.04.3.</span></div><div class="gmail_extra">I have a very big concern about stability.</div><div class="gmail_extra">A month ago I lost a good deal of files. I don&#39;t know the real reason, but things seemed to point to the ocfs2 cluster.</div><div class="gmail_extra">Last week I found many of my compute nodes with the nova service down. The node which went down first has a &quot;stuck&quot; file/directory in the ocfs2 filesystem:</div><div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_extra">root@node-99:/mnt/MSA_FC_Vol1/<wbr>nodes/cb5c94d0-ed4f-457d-88b0-<wbr>17d49eb7006a# ls</div><div class="gmail_extra"><br></div><div class="gmail_extra">The directory in the command above has vHD files in it. Running a simple &#39;ls&#39; command there hangs indefinitely (I&#39;ve left it hung for 5 days now and it never completes). At the end of this email I&#39;ve pasted the end of the dmesg output.</div><div class="gmail_extra"><br></div><div class="gmail_extra">I ran fsck.ocfs2 on the filesystem and it did fix some things, but running &#39;ls&#39; again in that directory still gets stuck, and the nova service still goes down on all nodes.<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">when I restart nova services on all these nodes they come down again after some time. 
when I stop ocfs2 on all these nodes they no longer come down.</div></div><div class="gmail_extra"><br></div><div class="gmail_extra">I have other openstack compute nodes that are identical except they use local storage and do not use ocfs2 and these have always been stable.<br></div><div class="gmail_extra"><br></div><div class="gmail_extra">maybe ocfs2 just isn&#39;t stable on Ubuntu <span style="font-size:12.8px">16.04.3? I am using version </span><span style="font-size:12.8px">1.6.4-3.1</span><br></div><div class="gmail_extra"><span style="font-size:12.8px"><br></span></div><div class="gmail_extra">any advice or comments would be appreciated!!!</div><div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_extra">[Thu Dec 21 20:22:35 2017] INFO: task ls:11052 blocked for more than 120 seconds.</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]       Not tainted 4.4.0-98-generic #121-Ubuntu</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017] &quot;echo 0 &gt; /proc/sys/kernel/hung_task_timeout_secs&quot; disables this message.</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017] ls              D ffff880074c7b8d8     0 11052      1 0x00000004</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  ffff880074c7b8d8 ffff882016f1d400 ffff882038f70e00 ffff880035a8c600</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  ffff880074c7c000 ffff880074c7ba80 ffff880074c7ba78 ffff880035a8c600</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  0000000000000000 ffff880074c7b8f0 ffffffff81840585 7fffffffffffffff</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017] Call Trace:</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff81840585&gt;] schedule+0x35/0x80</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff818436d5&gt;] schedule_timeout+0x1b5/0x270</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff81840fe3&gt;] wait_for_completion+0xb3/0x140</div><div 
class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff810ac630&gt;] ? wake_up_q+0x70/0x70</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffffc0779145&gt;] __ocfs2_cluster_lock.isra.34+0x415/0x750 [ocfs2]</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff810f634b&gt;] ? ktime_get+0x3b/0xb0</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffffc077a20a&gt;] ocfs2_inode_lock_full_nested+0x16a/0x920 [ocfs2]</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffffc0786ee9&gt;] ocfs2_iget+0x499/0x6c0 [ocfs2]</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffffc0770ab8&gt;] ? ocfs2_free_dir_lookup_result+0x28/0x50 [ocfs2]</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffffc077264e&gt;] ? ocfs2_lookup_ino_from_name+0x4e/0x70 [ocfs2]</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffffc0796ff5&gt;] ocfs2_lookup+0x145/0x2f0 [ocfs2]</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff8121a54d&gt;] lookup_real+0x1d/0x60</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff8121be42&gt;] __lookup_hash+0x42/0x60</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff8121d1d6&gt;] walk_component+0x226/0x300</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffffc0774e33&gt;] ? ocfs2_should_refresh_lock_res+0x113/0x160 [ocfs2]</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff8121eb1d&gt;] path_lookupat+0x5d/0x110</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff81220761&gt;] filename_lookup+0xb1/0x180</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffffc07855d3&gt;] ? ocfs2_inode_revalidate+0x93/0x180 [ocfs2]</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff811eebb7&gt;] ? kmem_cache_alloc+0x187/0x1f0</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff81220366&gt;] ? 
getname_flags+0x56/0x1f0</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff81220906&gt;] user_path_at_empty+0x36/0x40</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff81215616&gt;] vfs_fstatat+0x66/0xc0</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff81215bd1&gt;] SYSC_newlstat+0x31/0x60</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff81215d0e&gt;] SyS_newlstat+0xe/0x10</div><div class="gmail_extra">[Thu Dec 21 20:22:35 2017]  [&lt;ffffffff818446b2&gt;] entry_SYSCALL_64_fastpath+0x16/0x71</div><div class="gmail_extra">[Thu Dec 21 20:33:10 2017] perf interrupt took too long (7066 &gt; 5000), lowering kernel.perf_event_max_sample_rate to 25000</div><div class="gmail_extra">[Thu Dec 21 22:05:02 2017] perf interrupt took too long (10271 &gt; 10000), lowering kernel.perf_event_max_sample_rate to 12500</div><div class="gmail_extra">[Fri Dec 22 00:00:01 2017] Process accounting resumed</div><div class="gmail_extra">[Fri Dec 22 00:10:25 2017] perf interrupt took too long (20273 &gt; 20000), lowering kernel.perf_event_max_sample_rate to 6250</div><div class="gmail_extra">[Fri Dec 22 07:19:10 2017] perf interrupt took too long (41761 &gt; 38461), lowering kernel.perf_event_max_sample_rate to 3250</div><div class="gmail_extra">[Fri Dec 22 23:59:58 2017] Process accounting resumed</div><div class="gmail_extra">[Sat Dec 23 07:19:11 2017] perf interrupt took too long (76936 &gt; 71428), lowering kernel.perf_event_max_sample_rate to 1750</div><div class="gmail_extra">[Sat Dec 23 23:59:56 2017] Process accounting resumed</div><div class="gmail_extra"><br></div><div><br></div></div></div>