[Ocfs2-users] Processes hanging when accessing OCFS2

Nick nick at agentpoint.com
Sun Mar 10 16:41:24 PDT 2013


Hi

I'm relatively new to OCFS2 and I am using it in conjunction with DRBD.
I'm hitting issues when the mounted file system is used heavily.
The process doing the reading/writing sometimes hangs and gets killed by 
the kernel.

E.g.
[925741.227267] INFO: task rsync:29141 blocked for more than 120 seconds.
[925741.238300] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[925741.260379] rsync           D ffff88042fcd39c0     0 29141 29139 0x00000000
[925741.260382]  ffff8802ffba98d0 0000000000000082 ffff8804157c9700 ffff8802ffba9fd8
[925741.260386]  ffff8802ffba9fd8 ffff8802ffba9fd8 ffff880419545c00 ffff8804157c9700
[925741.260389]  ffffffffa0532295 ffff8804157c9700 ffff880408e6ff38 ffff880408e6ff40
[925741.260393] Call Trace:
[925741.260409]  [<ffffffffa0532295>] ? ocfs2_read_blocks+0x3e5/0x6a0 [ocfs2]
[925741.260421]  [<ffffffff81680b89>] schedule+0x29/0x70
[925741.260425]  [<ffffffff8168199d>] rwsem_down_failed_common+0xcd/0x170
[925741.260429]  [<ffffffff81681a75>] rwsem_down_read_failed+0x15/0x17
[925741.260432]  [<ffffffff813365f4>] call_rwsem_down_read_failed+0x14/0x30
[925741.260436]  [<ffffffff8167fd64>] ? down_read+0x24/0x2b
[925741.260453]  [<ffffffffa0557e71>] ocfs2_start_trans+0xe1/0x1d0 [ocfs2]
[925741.260475]  [<ffffffffa052ee07>] ocfs2_write_begin_nolock+0x3e7/0x1d50 [ocfs2]
[925741.260492]  [<ffffffffa0532295>] ? ocfs2_read_blocks+0x3e5/0x6a0 [ocfs2]
[925741.260509]  [<ffffffffa0550f90>] ? ocfs2_inode_cache_io_lock+0x20/0x20 [ocfs2]
[925741.260527]  [<ffffffffa0558250>] ? ocfs2_extend_trans+0x200/0x200 [ocfs2]
[925741.260542]  [<ffffffffa053086e>] ocfs2_write_begin+0xfe/0x220 [ocfs2]
[925741.260547]  [<ffffffff81122706>] generic_file_buffered_write+0x116/0x280
[925741.260564]  [<ffffffffa0550c3a>] ocfs2_file_aio_write+0x82a/0x880 [ocfs2]
[925741.260568]  [<ffffffff81181746>] do_sync_write+0xe6/0x120
[925741.260572]  [<ffffffff812b2bac>] ? security_file_permission+0x2c/0xb0
[925741.260575]  [<ffffffff81181d21>] ? rw_verify_area+0x61/0xf0
[925741.260578]  [<ffffffff8118203c>] vfs_write+0xac/0x180
[925741.260580]  [<ffffffff8118236a>] sys_write+0x4a/0x90
[925741.260585]  [<ffffffff81689d29>] system_call_fastpath+0x16/0x1b

and

[1029000.298598] INFO: task df:6338 blocked for more than 120 seconds.
[1029000.309552] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1029000.331360] df              D ffff88042fd139c0     0  6338 1516 0x00000000
[1029000.331365]  ffff880087d69b08 0000000000000082 ffff8803c0cb2e00 ffff880087d69fd8
[1029000.331370]  ffff880087d69fd8 ffff880087d69fd8 ffff8804195d8000 ffff8803c0cb2e00
[1029000.331373]  ffff88042fffda08 7fffffffffffffff ffff880087d69d20 ffff880087d69d28
[1029000.331383] Call Trace:
[1029000.331393]  [<ffffffff81680b89>] schedule+0x29/0x70
[1029000.331397]  [<ffffffff8167f064>] schedule_timeout+0x2a4/0x320
[1029000.331403]  [<ffffffffa049d020>] ? o2dlm_lock_ast_wrapper+0x20/0x20 [ocfs2_stack_o2cb]
[1029000.331407]  [<ffffffff8167f42d>] ? mutex_lock+0x1d/0x50
[1029000.331417]  [<ffffffff8107e5e1>] ? in_group_p+0x31/0x40
[1029000.331425]  [<ffffffff8118d176>] ? generic_permission+0x176/0x260
[1029000.331429]  [<ffffffff816809af>] wait_for_common+0xdf/0x190
[1029000.331433]  [<ffffffff81087c40>] ? try_to_wake_up+0x2a0/0x2a0
[1029000.331437]  [<ffffffff81680b5d>] wait_for_completion+0x1d/0x20
[1029000.331461]  [<ffffffffa050bd49>] __ocfs2_cluster_lock.isra.31+0x219/0x840 [ocfs2]
[1029000.331470]  [<ffffffff8118e813>] ? lookup_fast+0xd3/0x310
[1029000.331473]  [<ffffffff8107e86a>] ? lg_local_unlock+0x1a/0x20
[1029000.331476]  [<ffffffff8118dd65>] ? complete_walk+0xa5/0x130
[1029000.331499]  [<ffffffffa050d671>] ocfs2_inode_lock_full_nested+0x201/0x4e0 [ocfs2]
[1029000.331527]  [<ffffffffa055e5f5>] ocfs2_statfs+0x65/0x320 [ocfs2]
[1029000.331532]  [<ffffffff811b1261>] statfs_by_dentry+0xa1/0x140
[1029000.331535]  [<ffffffff811b131b>] vfs_statfs+0x1b/0xb0
[1029000.331538]  [<ffffffff811b14a7>] user_statfs+0x37/0x50
[1029000.331542]  [<ffffffff811b1540>] sys_statfs+0x20/0x40
[1029000.331547]  [<ffffffff81689d29>] system_call_fastpath+0x16/0x1b

As you can see it isn't one specific thing. A largish rsync copy from a 
remote server and a simple df both triggered it.
The stack traces look pretty different as well which doesn't help.
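
To make traces like these easier to compare, here is a minimal Python sketch that pulls the blocked task and the innermost reliable OCFS2 frame out of hung-task reports. It assumes the input is raw dmesg text with one stack frame per line (not wrapped by a mail client). The names summarize_hung_tasks and first_ocfs2_frame are made up for illustration, and the regular expressions only cover the line formats visible in the two reports above.

```python
import re

# Header line, e.g. "INFO: task rsync:29141 blocked for more than 120 seconds."
HEADER = re.compile(r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds")

# Stack frame, e.g. "[<ffffffffa0557e71>] ocfs2_start_trans+0xe1/0x1d0 [ocfs2]".
# A leading "?" marks a frame the kernel's unwinder considers unreliable.
FRAME = re.compile(
    r"\[<[0-9a-f]+>\]\s+(\?\s+)?([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def summarize_hung_tasks(log_text):
    """Return one dict per hung-task report with its reliable stack frames."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = HEADER.search(line)
        if header:
            current = {
                "task": header.group(1),
                "pid": int(header.group(2)),
                "frames": [],  # (function, module or None), innermost first
            }
            reports.append(current)
            continue
        frame = FRAME.search(line)
        if frame and current is not None:
            unreliable, func, module = frame.groups()
            if not unreliable:  # skip frames marked with "?"
                current["frames"].append((func, module))
    return reports


def first_ocfs2_frame(report):
    """Return the innermost reliable frame from a module named ocfs2*."""
    for func, module in report["frames"]:
        if module and module.startswith("ocfs2"):
            return func
    return None
```

Run against the two reports above, this should point at ocfs2_start_trans for the rsync hang (waiting on a read-write semaphore) and at __ocfs2_cluster_lock for the df hang (waiting for a cluster lock to complete), which is at least a starting point for telling the two cases apart.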

The setup is two identical servers, mirroring disks using DRBD with a 
fairly vanilla OCFS2 setup on top.
I followed the DRBD user guide for configuration. All packages came from 
Ubuntu's repositories.

I've done some googling and haven't found anything too helpful 
unfortunately.
I can't tell what could be causing it. Any help would be greatly 
appreciated.

Thanks
-- 
Nick Stallman
Agentpoint Pty Ltd
The Real Estate Web Developers
Sydney, Australia
nick at agentpoint.com
www.agentpoint.com.au | www.zooproperty.com | www.ginga.com.au | 
www.business2.com.au
