[Ocfs2-devel] A deadlock when system does not have sufficient memory

Sunil Mushran sunil.mushran at gmail.com
Wed Aug 27 18:16:23 PDT 2014


Hi,

What is o2net_wq waiting on that is preventing it from processing the reply
for unlock?
Sorry for being a bit slow. Do you have the raw stacks of the deadlock?

I mean, making that alloc NOFS should not be an issue. But I would like to
understand whether we are fixing the actual problem or not.

Sunil


On Tue, Aug 26, 2014 at 6:57 PM, Xue jiufei <xuejiufei at huawei.com> wrote:

> Hi, Sunil
> On 2014/8/26 1:13, Sunil Mushran wrote:
> > On Sun, Aug 24, 2014 at 11:05 PM, Joseph Qi <joseph.qi at huawei.com> wrote:
> >
> >     On 2014/8/25 13:45, Sunil Mushran wrote:
> >     > Please could you expand on that.
> >     >
> >     In our scenario, one node can mount multiple volumes across the
> >     cluster.
> >     For instance, N1 has mounted ocfs2 volumes say volume1, volume2,
> >     volume3. And volume3 may do umount/mount during runtime of other
> >     volumes.
> >
> >
> > I meant expand on the deadlock. Say we are mounting a new volume and
> > that triggers an inode cleanup. That inode being cleaned up will have to
> > be from one of the mounted volumes. How can this lead to a deadlock?
> >
> > Two variations:
> > a) Node death leading to recovery during the mount.
> > b) Mount atop a mount.
> >
> > But I still cannot see a deadlock in either scenario.
> The deadlock is the same one I described in my first mail:
> o2net_wq
> -> dlm_query_region_handler
> -> kmalloc (insufficient memory)
> -> triggers ocfs2 inode cleanup
> -> ocfs2_drop_lock
> -> calls o2net_send_message to send the unlock message
> -> wait_event(nsw.ns_wq, o2net_nsw_completed(nn, &nsw))
>    to wait for the reply from the master
> -> the TCP layer receives the reply and calls o2net_data_ready
> -> sc_rx_work is queued, but o2net_wq cannot handle this work
> This is the deadlock: o2net_wq is waiting for itself to handle the
> unlock reply and complete the nsw.
>
> Thanks.
> Xuejiufei
>
>
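
The call chain quoted above can be modelled in user space. The following is
only a rough sketch, not kernel code: it assumes o2net_wq handles one work
item at a time, as the trace implies, and every name in it (model_o2net_wq,
reclaim_may_enter_fs, rx_work, query_region_handler) is hypothetical. The
real wait_event() has no timeout; one is used here only so that the model
returns instead of hanging. The reclaim_may_enter_fs=False case stands in
for making the allocation NOFS, so that reclaim does not call back into the
filesystem.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def model_o2net_wq(reclaim_may_enter_fs, timeout=0.5):
    """Return True if the handler finishes, False if it would deadlock."""
    # One worker thread: a stand-in for a workqueue that runs one item at a time.
    wq = ThreadPoolExecutor(max_workers=1)
    # Stand-in for the completion that the unlock sender waits on (the nsw).
    reply_done = threading.Event()

    def rx_work():
        # Stand-in for sc_rx_work: processing the reply completes the wait.
        reply_done.set()

    def query_region_handler():
        # Stand-in for the handler whose allocation runs short of memory.
        if reclaim_may_enter_fs:
            # Reclaim drops a lock and sends an unlock message. The reply
            # arrives and its processing is queued on the same workqueue,
            # behind the handler that is currently running.
            wq.submit(rx_work)
            # The handler now waits for the reply. The only worker is busy
            # running this very function, so rx_work cannot start.
            return reply_done.wait(timeout)
        # NOFS-style allocation: reclaim never re-enters the filesystem,
        # so no unlock message is sent and there is nothing to wait for.
        return True

    finished = wq.submit(query_region_handler).result()
    wq.shutdown(wait=True)
    return finished
```

Calling model_o2net_wq(True) times out and returns False, which corresponds
to the reported hang. Calling model_o2net_wq(False) returns True. The model
says nothing about whether NOFS addresses every path into the same wait.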