[Ocfs2-users] OCFS2 filesystem hangs with "dirty" locks on internal files

Charlie Smurthwaite charlie at atech.media
Mon Nov 11 07:27:38 PST 2019


Hi Gang,

Sorry for the delay in responding. The kernel is 5.2.8
(5.2.8-050208-generic) and the distro is Ubuntu 18.04.3 Server.

Unfortunately, since I worked around the problem in production by
reducing LocalAlloc from 16 MB to 8 MB, I have not seen it occur again,
and I did not keep a historical stack trace. I suspect it could be
reproduced by running with a large LocalAlloc on a fragmented filesystem.

I now believe that this is a well-known problem caused by nodes being
unable to find contiguous free blocks for local preallocation; however,
it is not clear to me what the correct solution is. I fear that in time
I will run out of 8 MB contiguous regions, in the same way as I did with
16 MB.
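To make the contiguity argument concrete, here is a toy model in Python. It assumes, as described above, that a node's local-alloc window has to come from a single contiguous free run in the global bitmap. The 4 KB cluster size, the bitmap contents and the helper names (`clusters_for`, `longest_free_run`, `window_fits`) are all hypothetical, and this is not how the OCFS2 kernel code actually searches its bitmap.

```python
from typing import Sequence


def clusters_for(window_mb: int, cluster_kb: int) -> int:
    """Clusters spanned by a local-alloc window of window_mb megabytes."""
    return window_mb * 1024 // cluster_kb


def longest_free_run(bitmap: Sequence[int]) -> int:
    """Length of the longest run of free (0) entries in an allocation bitmap."""
    best = run = 0
    for bit in bitmap:
        run = run + 1 if bit == 0 else 0
        best = max(best, run)
    return best


def window_fits(bitmap: Sequence[int], window_mb: int, cluster_kb: int) -> bool:
    """True if one single free run is long enough to hold the whole window."""
    return longest_free_run(bitmap) >= clusters_for(window_mb, cluster_kb)


if __name__ == "__main__":
    CLUSTER_KB = 4  # hypothetical cluster size
    # Hypothetical fragmented bitmap: 6000 free clusters split by one used one.
    bitmap = [0] * 3000 + [1] + [0] * 3000
    for mb in (16, 8):
        print(f"{mb} MB window ({clusters_for(mb, CLUSTER_KB)} clusters) fits:",
              window_fits(bitmap, mb, CLUSTER_KB))
```

With roughly 23 MB free in total, the 16 MB window (4096 clusters) still does not fit because no single run is that long, while the 8 MB window (2048 clusters) does. As free space fragments further, the longest run can drop below 2048 clusters as well, which is exactly the concern above.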

Charlie


On 14/10/2019 06:34, Gang He wrote:
> Hi Charlie,
>
> Which Linux kernel version and distribution are you using?
> Do you have the hang process stacks?
> Could you reproduce this hang reliably? If so, please provide the detailed steps.
>
> There is a DLM lock hang detection tool that you can use when the file system is stuck:
> https://github.com/ganghe/o2locktop
>
> Thanks
> Gang
>
>> -----Original Message-----
>> From: ocfs2-users-bounces at oss.oracle.com
>> [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Charlie
>> Smurthwaite
>> Sent: September 26, 2019 2:01
>> To: ocfs2-users at oss.oracle.com
>> Subject: [Ocfs2-users] OCFS2 filesystem hangs with "dirty" locks on internal
>> files
>>
>> Hi,
>>
>> I have been trying for some time to get to the bottom of a problem that is
>> causing an OCFS2 filesystem to hang (increasing numbers of file operations
>> hang until the filesystem becomes unusable) seemingly at random,
>> approximately once per day.
>>
>> I have got as far as dumping the busy locks and dlm lock state on all nodes
>> when this occurs.
>>
>> In summary, it appears that all nodes are waiting on locks for shared internal
>> data files, specifically:
>>
>> debugfs: encode //global_bitmap
>> M000000000000000000000baa25b2b2
>> debugfs: encode //aquota.user
>> M000000000000000000000caa25b2b2
>> debugfs: encode //aquota.group
>> M000000000000000000000daa25b2b2
>>
>> The DLM status of these 3 files is pasted below. It seems that all nodes are
>> waiting for access to the global bitmap (the bottom entry in the DLM output
>> below), but nobody is able to obtain this lock. Is there an obvious cause of this
>> situation?
>>
>> I'd be happy to provide any further information that may help. Sorry if I'm not
>> understanding the situation very well yet.
>>
>> Thanks!
>> Charlie
>>
>>
>>
>> Lockres: M000000000000000000000caa25b2b2 Owner: 3 State: 0x8 Dirty
>> Last Used: 0 ASTs Reserved: 0 Inflight: 0 Migration Pending: No
>> Refs: 12 Locks: 9 On Lists: Dirty
>> Reference Map: 0 1 2 4 5 6 7 8
>> Lock-Queue  Node  Level  Conv  Cookie  Refs  AST  BAST  Pending-Action
>> Granted     0     NL     -1    0:5     2     No   No    None
>> Converting  1     NL     EX    1:10    2     No   No    None
>> Converting  5     NL     EX    5:8     2     No   No    None
>> Converting  6     NL     EX    6:4     2     No   No    None
>> Converting  2     NL     EX    2:7     2     No   No    None
>> Converting  7     NL     EX    7:9     2     No   No    None
>> Converting  8     NL     EX    8:11    2     No   No    None
>> Converting  4     NL     EX    4:6     2     No   No    None
>> Converting  3     NL     EX    3:27    2     No   No    None
>> --
>> Lockres: M000000000000000000000daa25b2b2 Owner: 3 State: 0x8 Dirty
>> Last Used: 0 ASTs Reserved: 0 Inflight: 0 Migration Pending: No
>> Refs: 12 Locks: 9 On Lists: Dirty
>> Reference Map: 0 1 2 4 5 6 7 8
>> Lock-Queue  Node  Level  Conv  Cookie  Refs  AST  BAST  Pending-Action
>> Granted     0     NL     -1    0:8     2     No   No    None
>> Converting  1     NL     EX    1:13    2     No   No    None
>> Converting  5     NL     EX    5:11    2     No   No    None
>> Converting  6     NL     EX    6:7     2     No   No    None
>> Converting  2     NL     EX    2:10    2     No   No    None
>> Converting  7     NL     EX    7:12    2     No   No    None
>> Converting  8     NL     EX    8:14    2     No   No    None
>> Converting  4     NL     EX    4:9     2     No   No    None
>> Converting  3     NL     EX    3:30    2     No   No    None
>> --
>>
>> Lockres: M000000000000000000000baa25b2b2 Owner: 3 State: 0x8 Dirty
>> Last Used: 0 ASTs Reserved: 0 Inflight: 0 Migration Pending: No
>> Refs: 12 Locks: 9 On Lists: Dirty
>> Reference Map: 0 1 2 4 5 6 7 8
>> Lock-Queue  Node  Level  Conv  Cookie  Refs  AST  BAST  Pending-Action
>> Converting  4     NL     EX    4:39    2     No   No    None
>> Converting  8     NL     PR    8:39    2     No   No    None
>> Converting  0     NL     PR    0:30    2     No   No    None
>> Converting  6     NL     PR    6:39    2     No   No    None
>> Converting  1     NL     PR    1:39    2     No   No    None
>> Converting  3     NL     EX    3:33    2     No   No    None
>> Converting  7     NL     EX    7:39    2     No   No    None
>> Converting  2     NL     EX    2:39    2     No   No    None
>> Converting  5     NL     PR    5:39    2     No   No    None
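One way to read the //global_bitmap dump is to check its queues against the DLM mode-compatibility rules. The Python sketch below is a simplified model. It assumes o2dlm's rules for the three modes it uses (NL, PR, EX), and it assumes that a conversion at the head of the convert queue is blocked only by another lock's *current* level, roughly as the o2dlm thread checks it. The real code also handles ASTs, BASTs and queue ordering, and the helper names here are hypothetical.

```python
from typing import List, NamedTuple


# Assumed o2dlm rules: NL is compatible with everything, EX with nothing
# but NL, and PR with NL and PR.
def compatible(existing: str, request: str) -> bool:
    if request == "NL" or existing == "NL":
        return True
    if request == "EX":
        return False
    return existing == "PR"  # request must be PR at this point


class Lock(NamedTuple):
    queue: str  # "Granted" or "Converting"
    node: int
    level: str  # mode currently held
    conv: str   # mode requested ("-1" when not converting)


def parse_rows(text: str) -> List[Lock]:
    """Parse lock-queue rows (without the header line) from the dump."""
    locks = []
    for line in text.strip().splitlines():
        queue, node, level, conv = line.split()[:4]
        locks.append(Lock(queue, int(node), level, conv))
    return locks


def head_is_grantable(locks: List[Lock]) -> bool:
    """Simplified check: is the first converting lock compatible with the
    current level of every other lock on the resource?"""
    converting = [lk for lk in locks if lk.queue == "Converting"]
    if not converting:
        return False
    target = converting[0]
    return all(compatible(lk.level, target.conv)
               for lk in locks if lk is not target)


# Rows for //global_bitmap (M000000000000000000000baa25b2b2) from the dump.
GLOBAL_BITMAP = """
Converting 4 NL EX 4:39 2 No No None
Converting 8 NL PR 8:39 2 No No None
Converting 0 NL PR 0:30 2 No No None
Converting 6 NL PR 6:39 2 No No None
Converting 1 NL PR 1:39 2 No No None
Converting 3 NL EX 3:33 2 No No None
Converting 7 NL EX 7:39 2 No No None
Converting 2 NL EX 2:39 2 No No None
Converting 5 NL PR 5:39 2 No No None
"""

if __name__ == "__main__":
    locks = parse_rows(GLOBAL_BITMAP)
    print("granted:", [lk.node for lk in locks if lk.queue == "Granted"])
    print("head of convert queue grantable:", head_is_grantable(locks))
```

Every lock in that dump is still held at NL, so the model reports no granted locks and a grantable head of the convert queue (node 4, NL to EX). If the model is right, the requests are not waiting on a conflicting holder. That is at least consistent with the resource sitting on the owner's (node 3) dirty list without being processed, although the dump alone does not prove it.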
>>
>>
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> https://oss.oracle.com/mailman/listinfo/ocfs2-users


