[Ocfs2-devel] [PATCH] fix dead lock caused by ocfs2_defrag_extent

Thu Aug 30 01:24:56 PDT 2018

Hi Changwei,

On 08/30/2018 03:52 PM, Changwei Ge wrote:
> Hi Larry,
> 
> 
> On 2018/8/30 14:26, Larry Chen wrote:
>> Hi Changwei,
>>
>> Maybe we need more investigation.
>>
>> The following is your fix:
>> lock truncate log inode
>>      __ocfs2_flush_truncate_log()
>>      ocfs2_lock_allocators_move_extents()
>>          unlock truncate log inode
>>
>> The locks will then be taken in the following order:
>> lock(truncate_inode)
>>      lock(sub allocator)
>>      lock(local_alloc) or lock(global_bitmap)
> 
> I don't think we have to worry much about mixing the order of cluster locks
> and inode mutexes, since a node that already holds the cluster lock will
> succeed immediately instead of waiting for itself.
> 
>>
>> I'm not sure whether there is another code path that tries to take the same
>> locks in a different order, which may cause deadlocks.

Yeah, I use "lock" to mean both inode_lock and ocfs2_inode_lock.
Since so many lock types and inodes are involved, I cannot
guarantee that there is no logic error.
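
To make the ordering concern concrete, here is a toy user-space sketch of a
pairwise lock-order check. It is not ocfs2 or kernel code. The enum names
only mirror the locks listed above, and record_path() is a hypothetical
helper made up for illustration. It flags a path that takes two locks in the
opposite order to an earlier path (the classic ABBA pattern). It does not
detect longer cycles.

```c
/* Toy lock-order checker; illustrative only, not kernel code. */
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the locks named in this thread. */
enum { TRUNCATE_LOG, SUB_ALLOC, GLOBAL_BITMAP, NLOCKS };

/* before[a][b] is true once some recorded path took a and later b. */
static bool before[NLOCKS][NLOCKS];

/*
 * Record the acquisition order of one code path.
 * Returns false if any pair on this path inverts a pair seen on an
 * earlier path, i.e. two paths could deadlock against each other.
 */
static bool record_path(const int *order, int n)
{
    bool ok = true;

    /* Validate every lock id before it is used as an array index. */
    for (int i = 0; i < n; i++)
        assert(order[i] >= 0 && order[i] < NLOCKS);

    for (int i = 0; i < n; i++) {
        for (int j = i + 1; j < n; j++) {
            if (before[order[j]][order[i]])
                ok = false;                     /* inverted pair found */
            before[order[i]][order[j]] = true;  /* remember this order */
        }
    }
    return ok;
}
```

Recording the order listed above (truncate log, sub allocator, global
bitmap) succeeds. A later, purely hypothetical path that takes the global
bitmap first and the truncate log second would make record_path() return
false.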

>>
>> Indeed, ocfs2 involves too many locks; I would like to reduce the
>> deadlock risk as much as possible.
>>
>> Another way is to add a new argument to __ocfs2_flush_truncate_log that
>> indicates whether the global bitmap needs to be locked.
> 
> Sounds like a feasible way :)

Haha, I also prefer this way :)
I'll send another patch and run the test cases in my environment.
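
A rough user-space model of the extra-argument idea follows. It is only a
sketch, not the actual patch. Every name in it (model_lock,
model_flush_truncate_log, model_defrag_extent, lock_bitmap) is hypothetical.
It also assumes that the deadlock comes from taking the global bitmap lock a
second time on the same path, which this message does not spell out.

```c
/* User-space model only; not ocfs2 code. All names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

/* A non-recursive lock: taking it twice on one path would deadlock. */
struct model_lock {
    int held; /* 1 while held, 0 otherwise */
};

/* Returns false where a real non-recursive lock would block on itself. */
static bool model_lock_acquire(struct model_lock *l)
{
    if (l->held)
        return false;
    l->held = 1;
    return true;
}

static void model_lock_release(struct model_lock *l)
{
    assert(l->held);
    l->held = 0;
}

/*
 * Model of a truncate-log flush that touches the global bitmap.
 * lock_bitmap == true:  take and drop the bitmap lock here.
 * lock_bitmap == false: the caller must already hold it.
 * Returns 0 on success, -1 where real code would deadlock.
 */
static int model_flush_truncate_log(struct model_lock *global_bitmap,
                                    bool lock_bitmap)
{
    if (lock_bitmap) {
        if (!model_lock_acquire(global_bitmap))
            return -1;
    } else {
        assert(global_bitmap->held);
    }

    /* ... the real flush work would happen here ... */

    if (lock_bitmap)
        model_lock_release(global_bitmap);
    return 0;
}

/* Model of a caller that already holds the bitmap lock when it flushes. */
static int model_defrag_extent(struct model_lock *global_bitmap,
                               bool flush_locks_bitmap)
{
    int ret;

    if (!model_lock_acquire(global_bitmap))
        return -1;
    ret = model_flush_truncate_log(global_bitmap, flush_locks_bitmap);
    model_lock_release(global_bitmap);
    return ret;
}
```

In this model, model_defrag_extent(&gb, true) returns -1 because the flush
tries to take a lock its caller already holds, while
model_defrag_extent(&gb, false) returns 0. Callers that do not hold the
bitmap lock would keep passing true.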

Thanks,
Larry