<div dir="ltr"><div>Hi,</div><div><br></div>What is o2net_wq waiting on that prevents it from processing the unlock reply?<div>Sorry for being a bit slow. Do you have the raw stacks of the deadlock?<div><br></div>


<div>I mean, making that alloc NOFS should not be an issue. But I would like to</div><div>understand whether we are fixing the actual problem or not.<br><div><br></div><div>Sunil</div></div></div></div><div class="gmail_extra">


<br><br><div class="gmail_quote">On Tue, Aug 26, 2014 at 6:57 PM, Xue jiufei <span dir="ltr">&lt;<a href="mailto:xuejiufei@huawei.com" target="_blank">xuejiufei@huawei.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">


Hi, Sunil<br>

On 2014/8/26 1:13, Sunil Mushran wrote:<br>

<div><div class="h5">&gt; On Sun, Aug 24, 2014 at 11:05 PM, Joseph Qi &lt;<a href="mailto:joseph.qi@huawei.com">joseph.qi@huawei.com</a>&gt; wrote:<br>


&gt;<br>

&gt;     On 2014/8/25 13:45, Sunil Mushran wrote:<br>

&gt;     &gt; Please could you expand on that.<br>

&gt;     &gt;<br>

&gt;     In our scenario, one node can mount multiple volumes across the<br>

&gt;     cluster.<br>

&gt;     For instance, N1 has mounted ocfs2 volumes volume1, volume2 and<br>

&gt;     volume3, and volume3 may be unmounted and remounted while the other<br>

&gt;     volumes are in use.<br>

&gt;<br>

&gt;<br>

&gt; I meant, expand on the deadlock. Say we are mounting a new volume and that triggers an inode cleanup. That inode being cleaned up will have to be from one of the mounted volumes. How can this lead to a deadlock?<br>

&gt;<br>

&gt; Two variations:<br>

&gt; a) Node death leading to recovery during the mount.<br>

&gt; b) Mount atop a mount.<br>

&gt;<br>

&gt; But I still cannot see a deadlock in either scenario.<br>

</div></div>The deadlock scenario is the same as the one I described in my first mail:<br>

o2net_wq<br>

-&gt; dlm_query_region_handler<br>

-&gt; kmalloc (not enough free memory)<br>

-&gt; memory reclaim triggers cleanup of ocfs2 inodes<br>

-&gt; ocfs2_drop_lock<br>

-&gt; calls o2net_send_message to send the unlock message<br>

-&gt; wait_event(nsw.ns_wq, o2net_nsw_completed(nn, &amp;nsw))<br>

   to wait for the reply from the master<br>

-&gt; the TCP layer receives the reply and calls o2net_data_ready<br>

-&gt; o2net_data_ready queues sc_rx_work, but o2net_wq cannot run it because it is still blocked in the wait_event above<br>

This is the deadlock: o2net_wq is waiting on itself to handle the<br>

unlock reply and complete the nsw.<br>

<br>

Thanks.<br>

Xuejiufei<br>

<br>

</blockquote></div><br></div>