[Ocfs2-devel] A deadlock when system do not has sufficient memory

Xue jiufei xuejiufei at huawei.com
Tue Aug 26 18:57:38 PDT 2014


Hi, Sunil
On 2014/8/26 1:13, Sunil Mushran wrote:
> On Sun, Aug 24, 2014 at 11:05 PM, Joseph Qi <joseph.qi at huawei.com <mailto:joseph.qi at huawei.com>> wrote:
> 
>     On 2014/8/25 13:45, Sunil Mushran wrote:
>     > Please could you expand on that.
>     >
>     In our scenario, one node can mount multiple volumes across the
>     cluster.
>     For instance, N1 has mounted ocfs2 volumes say volume1, volume2,
>     volume3. And volume3 may do umount/mount during runtime of other
>     volumes.
> 
> 
> I meant expand on the deadlock. Say we are mounting a new volume and that triggers an inode cleanup. That inode being cleaned up will have to be from one of the mounted volumes. How can this lead to a deadlock?
> 
> Two variations:
> a) Node death leading to recovery during the mount.
> b) Mount atop a mount.
> 
> But I still cannot see a deadlock in either scenario.
The deadlock situation is the same as the one I described in my first mail:
o2net_wq
-> dlm_query_region_handler
-> kmalloc() finds no sufficient free memory
-> memory reclaim triggers ocfs2 inode cleanup
-> ocfs2_drop_lock
-> calls o2net_send_message to send the unlock message
-> wait_event(nsw.ns_wq, o2net_nsw_completed(nn, &nsw))
   to wait for the reply from the master
-> the tcp layer receives the reply and calls o2net_data_ready
-> sc_rx_work is queued, but o2net_wq cannot handle this work
So it deadlocks: o2net_wq is waiting on itself to handle the
unlock reply and complete the nsw.

Thanks.
Xuejiufei
