[Ocfs2-devel] [Question] deadlock on chmod when running discontiguous block group multiple node testing

Junxiao Bi junxiao.bi at oracle.com
Tue Oct 11 23:47:46 PDT 2016


On 10/12/2016 10:36 AM, Eric Ren wrote:
> Hi,
> 
> When backporting those patches, I find that they are already in our
> product kernel, maybe
> via "stable kernel" policy, although our product kernel is 4.4 while the
> patches were merged
> into 4.6.
> 
> It seems to be another deadlock, one that happens when running `chmod -R 777
> /mnt/ocfs2`
> on multiple nodes at the same time.
Yes, but I just finished running the full ocfs2 test suite on linux
next-20161006 and didn't find any issue.

Thanks,
Junxiao.

> 
> Thanks,
> Eric
> On 10/12/2016 09:23 AM, Eric Ren wrote:
>> Hi Junxiao,
>>
>>> Hi Eric,
>>>
>>> On 10/11/2016 10:42 AM, Eric Ren wrote:
>>>> Hi Junxiao,
>>>>
>>>> As the subject, the testing hung there on a kernel without your
>>>> patches:
>>>>
>>>> "ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang"
>>>> and
>>>> "ocfs2: fix posix_acl_create deadlock"
>>>>
>>>> The stack trace is:
>>>> ```
>>>> ocfs2cts1:~ # pstree -pl 24133
>>>> discontig_runne(24133)───activate_discon(21156)───mpirun(15146)─┬─fillup_contig_b(15149)───sudo(15231)───chmod(15232)
>>>>
>>>>
>>>> ocfs2cts1:~ # pgrep -a chmod
>>>> 15232 /bin/chmod -R 777 /mnt/ocfs2
>>>>
>>>> ocfs2cts1:~ # cat /proc/15232/stack
>>>> [<ffffffffa05377ef>] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>>>> [<ffffffffa053856d>] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>>>> [<ffffffffa0538dbb>] ocfs2_inode_lock_atime+0xcb/0x170 [ocfs2]
>>>> [<ffffffffa0531e61>] ocfs2_readdir+0x41/0x1b0 [ocfs2]
>>>> [<ffffffff8120d03c>] iterate_dir+0x9c/0x110
>>>> [<ffffffff8120d453>] SyS_getdents+0x83/0xf0
>>>> [<ffffffff815e126e>] entry_SYSCALL_64_fastpath+0x12/0x6d
>>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>> ```
>>>>
>>>> Do you think this issue can be fixed by your patches?
>>> It looks like they can't. Those two patches fix a recursive locking
>>> deadlock, but the call trace above shows no recursive lock.
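>>> To illustrate, the recursive case those two patches address looks
>>> roughly like this (a simplified sketch, not the exact code path;
>>> the node labels and ordering are my reconstruction):
>>>
>>> ```
>>> node A: ocfs2_inode_lock(inode)        /* takes the cluster lock */
>>> node B: requests a lock on the same inode
>>>         /* A's held lock is now queued for downconvert */
>>> node A: ocfs2_inode_lock(inode) again  /* e.g. from the ACL path
>>>                                           inside the same syscall */
>>>         -> blocks: the second request waits for the downconvert,
>>>            but the downconvert waits for A to drop the lock it
>>>            already holds -> deadlock on node A itself
>>> ```
>>>
>>> In your trace, chmod is blocked on its first and only
>>> ocfs2_inode_lock in that call chain, so this pattern doesn't apply.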
>> Sorry, the call trace from the other node was missing. Here it is:
>>
>> ocfs2cts2:~ # pstree -lp
>> sshd(4292)─┬─sshd(4745)───sshd(4753)───bash(4754)───orted(4781)───fillup_contig_b(4782)───sudo(4864)───chmod(4865)
>>
>>
>> ocfs2cts2:~ # cat /proc/4865/stack
>> [<ffffffffa053e7ef>] __ocfs2_cluster_lock.isra.39+0x1bf/0x620 [ocfs2]
>> [<ffffffffa053f56d>] ocfs2_inode_lock_full_nested+0x12d/0x840 [ocfs2]
>> [<ffffffffa059c860>] ocfs2_iop_get_acl+0x40/0xf0 [ocfs2]
>> [<ffffffff812044e6>] generic_permission+0x166/0x1c0
>> [<ffffffffa0542aca>] ocfs2_permission+0xaa/0xd0 [ocfs2]
>> [<ffffffff81204596>] __inode_permission+0x56/0xb0
>> [<ffffffff812068fa>] link_path_walk+0x29a/0x560
>> [<ffffffff81206cbf>] path_lookupat+0x7f/0x110
>> [<ffffffff8120929c>] filename_lookup+0x9c/0x150
>> [<ffffffff811f96c3>] SyS_fchmodat+0x33/0x90
>> [<ffffffff815e126e>] entry_SYSCALL_64_fastpath+0x12/0x6d
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> Thanks,
>> Eric
>>
>>
>>> Thanks,
>>> Junxiao.
>>>> I will try your patches later, but I am a little worried that the
>>>> reproduction rate may not be 100%,
>>>> so I'm asking you to confirm ;-)
>>>>
>>>> Eric
>>
>> _______________________________________________
>> Ocfs2-devel mailing list
>> Ocfs2-devel at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-devel
> 
> 
