[Ocfs2-users] Tracking down hangs

Sunil Mushran sunil.mushran at oracle.com
Thu Jun 3 14:19:50 PDT 2010


It is not a dlm issue.

On 06/03/2010 07:38 AM, Andrew Robert Nicols wrote:
> On Thu, Jun 03, 2010 at 11:12:49AM +0100, Andrew Robert Nicols wrote:
>    
>> What's the best place to start looking for the cause of these hangs? I've
>> attached the dmesg output which includes some call traces for hung threads.
>> I have stat_sysdir output though I suspect it's not so relevant. A
>> scanlocks output doesn't reveal any busy locks that I can see (unless I'm
>> not hitting it at the right time or misreading the output).
>>      
> I've been attempting to trace the issue and have enabled tracing of DLM and
> DLM_THREAD and have hit some interesting results.
>
> On our readonly servers, we ordinarily see the following in the kernel log:
>
> Jun  3 14:59:18 thumper4 kernel: [188257.478242] (6083,3):dlm_mle_release:413 calling mle_release for M00000000000000294ce11700000000, type 0
> Jun  3 14:59:18 thumper4 kernel: [188257.478785] (6083,3):dlm_mle_release:413 calling mle_release for O00000000000000294ce11700000000, type 0
> Jun  3 14:59:18 thumper4 kernel: [188257.479333] (6083,3):dlm_mle_release:413 calling mle_release for N000000001c8f8525, type 0
> Jun  3 14:59:18 thumper4 kernel: [188258.115528] (6083,3):dlm_mle_release:413 calling mle_release for M000000000000002983bf3300000000, type 0
> Jun  3 14:59:18 thumper4 kernel: [188258.116071] (6083,3):dlm_mle_release:413 calling mle_release for O000000000000002983bf3300000000, type 0
> Jun  3 14:59:18 thumper4 kernel: [188258.118559] (6083,3):dlm_mle_release:413 calling mle_release for N000000001c8f8525, type 0
> Jun  3 14:59:20 thumper4 kernel: [188259.606686] (6083,3):dlm_mle_release:413 calling mle_release for M0000000000000029b71e2400000000, type 0
> Jun  3 14:59:20 thumper4 kernel: [188259.607000] (6083,3):dlm_mle_release:413 calling mle_release for O0000000000000029b71e2400000000, type 0
> Jun  3 14:59:20 thumper4 kernel: [188259.607548] (6083,3):dlm_mle_release:413 calling mle_release for N000000001c8f8525, type 0
> Jun  3 14:59:21 thumper4 kernel: [188261.055244] (6083,3):dlm_mle_release:413 calling mle_release for O000000000000001b5d229500000000, type 0
> Jun  3 14:59:21 thumper4 kernel: [188261.055838] (6083,3):dlm_mle_release:413 calling mle_release for M000000000000001b5d229500000000, type 0
> Jun  3 14:59:21 thumper4 kernel: [188261.055943] (6083,3):dlm_mle_release:413 calling mle_release for N000000001b7ddf5c, type 0
>
> At times when we see a hang, we see the following:
>
> Jun  3 14:59:22 thumper4 kernel: [188261.314168] (6083,3):dlm_proxy_ast_handler:306 lvb: none
> Jun  3 14:59:22 thumper4 kernel: [188261.314173] (6083,3):dlm_proxy_ast_handler:308 type=1, blocked_type=5
> Jun  3 14:59:22 thumper4 kernel: [188261.314177] (6083,3):dlm_proxy_ast_handler:336 lockres M0000000000000024cb281700000000
> Jun  3 14:59:22 thumper4 kernel: [188261.314168] (6904,1):dlmconvert_remote:272 type=3, convert_type=-1, busy=0
> Jun  3 14:59:22 thumper4 kernel: [188261.314590] (6083,3):dlm_proxy_ast_handler:306 lvb: none
> Jun  3 14:59:22 thumper4 kernel: [188261.314594] (6083,3):dlm_proxy_ast_handler:308 type=0, blocked_type=0
> Jun  3 14:59:22 thumper4 kernel: [188261.314597] (6083,3):dlm_proxy_ast_handler:336 lockres M0000000000000024cb281700000000
> Jun  3 14:59:22 thumper4 kernel: [188261.314600] (6083,3):dlm_proxy_ast_handler:387 ast: adding to granted list... type=3, convert_type=0
> Jun  3 14:59:22 thumper4 kernel: [188261.322638] (27918,0):dlmconvert_remote:272 type=0, convert_type=-1, busy=0
> Jun  3 14:59:23 thumper4 kernel: [188263.044145] (6083,3):dlm_proxy_ast_handler:306 lvb: get lvb
> Jun  3 14:59:23 thumper4 kernel: [188263.044145] (6083,3):dlm_proxy_ast_handler:308 type=0, blocked_type=0
> Jun  3 14:59:23 thumper4 kernel: [188263.044145] (6083,3):dlm_proxy_ast_handler:336 lockres M0000000000000024cb281700000000
> Jun  3 14:59:23 thumper4 kernel: [188263.044145] (6083,3):dlm_proxy_ast_handler:387 ast: adding to granted list... type=0, convert_type=3
> Jun  3 14:59:23 thumper4 kernel: [188263.044145] (6083,3):dlm_mle_release:413 calling mle_release for M00000000000000299c5c9e00000000, type 0
> Jun  3 14:59:23 thumper4 kernel: [188263.044424] (6083,3):dlm_mle_release:413 calling mle_release for O00000000000000299c5c9e00000000, type 0
> Jun  3 14:59:23 thumper4 kernel: [188263.046232] (6083,3):dlm_mle_release:413 calling mle_release for N000000000ae7c9ad, type 0
> Jun  3 14:59:23 thumper4 kernel: [188263.046884] (6083,3):dlm_mle_release:413 calling mle_release for M0000000000000028df78fd00000000, type 0
> Jun  3 14:59:23 thumper4 kernel: [188263.047426] (6083,3):dlm_mle_release:413 calling mle_release for O0000000000000028df78fd00000000, type 0
> Jun  3 14:59:23 thumper4 kernel: [188263.048029] (6083,3):dlm_mle_release:413 calling mle_release for N000000001bb0685f, type 0
>
> For this particular hang, catting our file took:
> Thu Jun 03 14:59:23 BST 2010 real       0m1.648s
>
> This corresponds pretty well to the entries at:
> Jun  3 14:59:22 thumper4 kernel: [188261.314600] (6083,3):dlm_proxy_ast_handler:387 ast: adding to granted list... type=3, convert_type=0
> Jun  3 14:59:22 thumper4 kernel: [188261.322638] (27918,0):dlmconvert_remote:272 type=0, convert_type=-1, busy=0
> Jun  3 14:59:23 thumper4 kernel: [188263.044145] (6083,3):dlm_proxy_ast_handler:306 lvb: get lvb
> Jun  3 14:59:23 thumper4 kernel: [188263.044145] (6083,3):dlm_proxy_ast_handler:308 type=0, blocked_type=0
>
> We see this every time there's a hang and the times correspond in the same kind
> of way each time.
>
> Is it worth enabling any other trace bits to see if there's something else
> causing this?
>
> Any help with this problem would be much appreciated,
>
> Andrew
>
>    
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://oss.oracle.com/pipermail/ocfs2-users/attachments/20100603/fc7246cc/attachment.html 


More information about the Ocfs2-users mailing list