[Ocfs2-users] OCFS2 tuning, fragmentation and localalloc option. Cluster hanging during mix read+write workloads

Goldwyn Rodrigues rgoldwyn at suse.de
Tue Aug 6 14:16:36 PDT 2013


Hi Gavin,

On 08/06/2013 01:59 PM, Gavin Jones wrote:
> Hi Goldwyn,
>
> Apologies for the delayed reply.
>
> The hung Apache process / OCFS issue cropped up again, so I thought
> I'd pass along the contents of /proc/<pid>/stack of a few affected
> processes:
>
> gjones at slipapp02:~> sudo cat /proc/27521/stack
> gjones's password:
> [<ffffffff811663b4>] poll_schedule_timeout+0x44/0x60
> [<ffffffff81166d56>] do_select+0x5a6/0x670
> [<ffffffff81166fbe>] core_sys_select+0x19e/0x2d0
> [<ffffffff811671a5>] sys_select+0xb5/0x110
> [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
> [<00007f394bdd5f23>] 0x7f394bdd5f23
> [<ffffffffffffffff>] 0xffffffffffffffff
> gjones at slipapp02:~> sudo cat /proc/27530/stack
> [<ffffffff81249721>] sys_semtimedop+0x5a1/0x8b0
> [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
> [<00007f394bdddb77>] 0x7f394bdddb77
> [<ffffffffffffffff>] 0xffffffffffffffff
> gjones at slipapp02:~> sudo cat /proc/27462/stack
> [<ffffffff81249721>] sys_semtimedop+0x5a1/0x8b0
> [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
> [<00007f394bdddb77>] 0x7f394bdddb77
> [<ffffffffffffffff>] 0xffffffffffffffff
> gjones at slipapp02:~> sudo cat /proc/27526/stack
> [<ffffffff81249721>] sys_semtimedop+0x5a1/0x8b0
> [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
> [<00007f394bdddb77>] 0x7f394bdddb77
> [<ffffffffffffffff>] 0xffffffffffffffff
>
>
> Additionally, in dmesg I see, for example,
>
> [774981.361149] (/usr/sbin/httpd,8266,3):ocfs2_unlink:951 ERROR: status = -2
> [775896.135467]
> (/usr/sbin/httpd,8435,3):ocfs2_check_dir_for_entry:2119 ERROR: status
> = -17
> [775896.135474] (/usr/sbin/httpd,8435,3):ocfs2_mknod:459 ERROR: status = -17
> [775896.135477] (/usr/sbin/httpd,8435,3):ocfs2_create:629 ERROR: status = -17
> [788406.624126] connection1:0: ping timeout of 5 secs expired, recv
> timeout 5, last rx 4491991450, last ping 4491992701, now 4491993952
> [788406.624138] connection1:0: detected conn error (1011)
> [788406.640132] connection2:0: ping timeout of 5 secs expired, recv
> timeout 5, last rx 4491991451, last ping 4491992702, now 4491993956
> [788406.640142] connection2:0: detected conn error (1011)
> [788406.928134] connection4:0: ping timeout of 5 secs expired, recv
> timeout 5, last rx 4491991524, last ping 4491992775, now 4491994028
> [788406.928150] connection4:0: detected conn error (1011)
> [788406.944147] connection5:0: ping timeout of 5 secs expired, recv
> timeout 5, last rx 4491991528, last ping 4491992779, now 4491994032
> [788406.944165] connection5:0: detected conn error (1011)
> [788408.640123] connection3:0: ping timeout of 5 secs expired, recv
> timeout 5, last rx 4491991954, last ping 4491993205, now 4491994456
> [788408.640134] connection3:0: detected conn error (1011)
> [788409.907968] connection1:0: detected conn error (1020)
> [788409.908280] connection2:0: detected conn error (1020)
> [788409.912683] connection4:0: detected conn error (1020)
> [788409.913152] connection5:0: detected conn error (1020)
> [788411.491818] connection3:0: detected conn error (1020)
>
>
> that repeats for a bit and then I see
>
> [1952161.012214] INFO: task /usr/sbin/httpd:27491 blocked for more
> than 480 seconds.
> [1952161.012219] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [1952161.012221] /usr/sbin/httpd D ffff88081fc52b40 0 27491 27449 0x00000000
> [1952161.012226] ffff88031a85dc50 0000000000000082 ffff880532a92640
> ffff88031a85dfd8
> [1952161.012231] ffff88031a85dfd8 ffff88031a85dfd8 ffff8807f791c300
> ffff880532a92640
> [1952161.012235] ffffffff8115f3ae ffff8802804bdd98 ffff880532a92640
> 00000000ffffffff
> [1952161.012239] Call Trace:
> [1952161.012251] [<ffffffff81538fea>] __mutex_lock_slowpath+0xca/0x140
> [1952161.012257] [<ffffffff81538b0a>] mutex_lock+0x1a/0x40
> [1952161.012262] [<ffffffff81160e80>] do_lookup+0x290/0x340
> [1952161.012269] [<ffffffff81161c7f>] path_lookupat+0x10f/0x700
> [1952161.012274] [<ffffffff8116229c>] do_path_lookup+0x2c/0xc0
> [1952161.012279] [<ffffffff8116372d>] user_path_at_empty+0x5d/0xb0
> [1952161.012283] [<ffffffff81158d9d>] vfs_fstatat+0x2d/0x70
> [1952161.012288] [<ffffffff81158fe2>] sys_newstat+0x12/0x30
> [1952161.012293] [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
> [1952161.012308] [<00007f394bdcfb05>] 0x7f394bdcfb04
> [1952161.012382] INFO: task /usr/sbin/httpd:27560 blocked for more
> than 480 seconds.
> [1952161.012384] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [1952161.012385] /usr/sbin/httpd D ffff88081fd52b40 0 27560 27449 0x00000000
> [1952161.012389] ffff880224023c50 0000000000000086 ffff88024326e580
> ffff880224023fd8
> [1952161.012393] ffff880224023fd8 ffff880224023fd8 ffff8807f79cc800
> ffff88024326e580
> [1952161.012397] ffffffff8115f3ae ffff8802804bdd98 ffff88024326e580
> 00000000ffffffff
> [1952161.012401] Call Trace:
> [1952161.012406] [<ffffffff81538fea>] __mutex_lock_slowpath+0xca/0x140
> [1952161.012410] [<ffffffff81538b0a>] mutex_lock+0x1a/0x40
> [1952161.012415] [<ffffffff81160e80>] do_lookup+0x290/0x340
> [1952161.012420] [<ffffffff81161c7f>] path_lookupat+0x10f/0x700
> [1952161.012425] [<ffffffff8116229c>] do_path_lookup+0x2c/0xc0
> [1952161.012430] [<ffffffff8116372d>] user_path_at_empty+0x5d/0xb0
> [1952161.012434] [<ffffffff81158d9d>] vfs_fstatat+0x2d/0x70
> [1952161.012438] [<ffffffff81158fe2>] sys_newstat+0x12/0x30
> [1952161.012442] [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
> [1952161.012448] [<00007f394bdcfb05>] 0x7f394bdcfb04
> [1953601.012280] INFO: task /usr/sbin/httpd:27506 blocked for more
> than 480 seconds.
> [1953601.012282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [1953601.012284] /usr/sbin/httpd D ffff88081fd52b40 0 27506 27449 0x00000000
> [1953601.012287] ffff8804044cf988 0000000000000082 ffff8802e1bfc340
> ffff8804044cffd8
> [1953601.012291] ffff8804044cffd8 ffff8804044cffd8 ffff8807f79cc800
> ffff8802e1bfc340
> [1953601.012296] 0000000000012b40 ffff8804044cfb58 7fffffffffffffff
> ffff8802e1bfc340
> [1953601.012300] Call Trace:
> [1953601.012305] [<ffffffff815387a2>] schedule_timeout+0x272/0x2f0
> [1953601.012311] [<ffffffff815396e2>] wait_for_common+0xd2/0x180
> [1953601.012345] [<ffffffffa03780a2>]
> __ocfs2_cluster_lock.isra.34+0x1f2/0x6d0 [ocfs2]
> [1953601.012395] [<ffffffffa0379578>]
> ocfs2_inode_lock_full_nested+0x168/0x9d0 [ocfs2]
> [1953601.012436] [<ffffffffa037edd2>] ocfs2_permission+0x32/0xf0 [ocfs2]
> [1953601.012466] [<ffffffff8115f503>] inode_permission+0x73/0x110
> [1953601.012472] [<ffffffff811612cd>] link_path_walk+0x22d/0x850
> [1953601.012477] [<ffffffff81161bc1>] path_lookupat+0x51/0x700
> [1953601.012482] [<ffffffff8116229c>] do_path_lookup+0x2c/0xc0
> [1953601.012486] [<ffffffff8116372d>] user_path_at_empty+0x5d/0xb0
> [1953601.012490] [<ffffffff81158d9d>] vfs_fstatat+0x2d/0x70
> [1953601.012494] [<ffffffff81158fe2>] sys_newstat+0x12/0x30
> [1953601.012498] [<ffffffff815429bd>] system_call_fastpath+0x1a/0x1f
> [1953601.012504] [<00007f394bdcfb05>] 0x7f394bdcfb04
>


This is the most interesting backtrace. The process is definitely 
waiting for a DLM lock while holding a mutex taken on the fstat() path. 
All the others are waiting for this process to finish.
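
If you want to dig into it from the filesystem side the next time this 
happens, the reactive route from the troubleshooting guide you cited 
should also help. Roughly (the device name below is only a placeholder 
for your ocfs2 volume, and this assumes debugfs is mounted under 
/sys/kernel/debug):

# debugfs.ocfs2 -R "fs_locks -B" /dev/sdX
  (lists only the busy lock resources)
# debugfs.ocfs2 -R "dlm_locks <lockres>" /dev/sdX
  (shows the DLM view of that resource, including the node that owns it)

That should tell you which lock resource the stat() path is stuck on 
and which node is currently holding it.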

>
> Your comment about the network got me thinking; I don't see on the
> hosts via any other means or monitoring that the network interface is
> going down.  However, I was logged into the (EqualLogic) SAN (where
> OCFS2 volumes reside) console yesterday and noticed some iSCSI session
> messages like:
>
> iSCSI session to target ........ was closed. Load balancing request
> was received on the array.
>
> Is it possible / probable that the iSCSI load balancing on the SAN is
> causing OCFS2 problems?  I'm wondering if exceeding the iSCSI ping
> timeout leads to a connection error, which then causes OCFS2 to be
> unsure of cluster state or have some DLM issue, when then leads to
> hung processes.  I know it seems like I'm shooting in the dark, but, I
> don't have much else to go on.

Do you have iSCSI and ocfs2 communication on the same network interface? 
If so, the "connection4:0" messages are from your iSCSI subsystem, and 
your network does seem to have a problem there (though I can't tell what 
from these messages alone).
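
A quick way to check (interface names and paths below are the usual 
defaults, adjust for your setup; this assumes the default o2cb stack):

# iscsiadm -m session -P 1
  (the Interface section shows the iface/IP each iSCSI session uses)
# grep ip_address /etc/ocfs2/cluster.conf
  (the addresses o2net uses for cluster communication)
# ip addr
  (to map both sets of addresses back to physical interfaces)

If both end up on the same NIC or switch path, an iSCSI load-balancing 
move or a saturated link could stall o2net traffic at the same time.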

>
> Still working on permission for the DLM debugging, hope to have it
> turned on and get some messages for you either today or tomorrow.
>
> Thanks,
>
> Gavin W. Jones
> Where 2 Get It, Inc.
>
> On Thu, Jul 18, 2013 at 5:54 PM, Goldwyn Rodrigues <rgoldwyn at suse.de> wrote:
>> On 07/18/2013 11:42 AM, Gavin Jones wrote:
>>>
>>> Hello,
>>>
>>> Sure, I'd be happy to provide such information next time this occurs.
>>>
>>> Can you elaborate, or point me at documentation / procedure regarding
>>> the DLM debug logs and what would be helpful to see?  I have read
>>> "Troubleshooting OCFS2" [1] and the section "Debugging File System
>>> Locks" --is this what you're referring to?
>>
>>
>> No. I was referring to a more proactive approach we use when debugging.
>> # debugfs.ocfs2 -l
>> will provide you with a list of the debug messages you can turn on/off.
>>
>> To turn on messages specific to DLM_GLUE (the layer between ocfs2 and the
>> DLM), issue
>>
>> # debugfs.ocfs2 -l DLM_GLUE allow
>>
>> Please note, this generates a lot of messages.
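>>
>> As a rough sketch (paths are the usual defaults, adjust as needed):
>> the messages end up in the kernel log, so something like
>>
>> # debugfs.ocfs2 -l DLM_GLUE allow
>> # tail -f /var/log/messages     (or watch dmesg)
>> # debugfs.ocfs2 -l DLM_GLUE off
>>
>> should capture them; the last command turns the tracing back off once
>> you have what you need.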
>>
>>
>>>
>>> Not sure if this will provide additional context or just muddy the
>>> waters, but I thought to provide some syslog messages from an affected
>>> server the last time this occurred.
>>>
>>> Jul 14 15:36:55 slipapp07 kernel: [2173588.704093] o2net: Connection
>>> to node slipapp03 (num 2) at 172.16.40.122:7777 has been idle for
>>> 30.97 secs, shutting it down.
>>> Jul 14 15:36:55 slipapp07 kernel: [2173588.704146] o2net: No longer
>>> connected to node slipapp03 (num 2) at 172.16.40.122:7777
>>> Jul 14 15:36:55 slipapp07 kernel: [2173588.704279]
>>> (kworker/u:1,12787,4):dlm_do_assert_master:1665 ERROR: Error -112 when
>>> sending message 502 (key 0xdc8be796) to node 2
>>> Jul 14 15:36:55 slipapp07 kernel: [2173588.704295]
>>> (kworker/u:5,26056,5):dlm_do_master_request:1332 ERROR: link to 2 went
>>> down!
>>> Jul 14 15:36:55 slipapp07 kernel: [2173588.704301]
>>> (kworker/u:5,26056,5):dlm_get_lock_resource:917 ERROR: status = -112
>>> Jul 14 15:37:25 slipapp07 kernel: [2173618.784153] o2net: No
>>> connection established with node 2 after 30.0 seconds, giving up.
>>> <snip>
>>> Jul 14 15:39:14 slipapp07 kernel: [2173727.920793]
>>> (kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -112 when
>>> sending message 502 (key 0xdc8be796) to node 4
>>> Jul 14 15:39:14 slipapp07 kernel: [2173727.920833]
>>> (/usr/sbin/httpd,5023,5):dlm_send_remote_lock_request:336 ERROR:
>>> A08674A831ED4048B5136BD8613B21E0: res N000000000152a8da, Error -112
>>> send CREATE LOCK to node 4
>>> Jul 14 15:39:14 slipapp07 kernel: [2173727.930562]
>>> (kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -107 when
>>> sending message 502 (key 0xdc8be796) to node 4
>>> Jul 14 15:39:14 slipapp07 kernel: [2173727.944998]
>>> (kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -107 when
>>> sending message 502 (key 0xdc8be796) to node 4
>>> Jul 14 15:39:14 slipapp07 kernel: [2173727.951511]
>>> (kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -107 when
>>> sending message 502 (key 0xdc8be796) to node 4
>>> Jul 14 15:39:14 slipapp07 kernel: [2173727.973848]
>>> (kworker/u:2,13894,1):dlm_do_assert_master:1665 ERROR: Error -107 when
>>> sending message 502 (key 0xdc8be796) to node 4
>>> Jul 14 15:39:14 slipapp07 kernel: [2173727.990216]
>>> (kworker/u:2,13894,7):dlm_do_assert_master:1665 ERROR: Error -107 when
>>> sending message 502 (key 0xdc8be796) to node 4
>>> Jul 14 15:39:14 slipapp07 kernel: [2173728.024139]
>>> (/usr/sbin/httpd,5023,5):dlm_send_remote_lock_request:336 ERROR:
>>> A08674A831ED4048B5136BD8613B21E0: res N000000000152a8da, Error -107
>>> send CREATE LOCK to node 4
>>> <snip, many, many more like the above>
>>>
>>> Which I suppose would indicate DLM issues; I have previously tried to
>>> investigate this (via abovementioned guide) but was unable to make
>>> real headway.
>>
>>
>> No, this means your network is the problem, which in turn is affecting your
>> DLM operations. The problem could be anywhere from hardware to the network
>> device to possibly a bug in the networking code. You may want to check if
>> there are other indications that the network interface is down.
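>>
>> For example (the interface name below is just a placeholder), on each
>> node it is worth looking at:
>>
>> # ip -s link show eth0      (rising error/drop counters)
>> # ethtool eth0              ("Link detected" and negotiated speed)
>> # dmesg | grep -i "link is"
>>
>> to see whether a NIC or switch port is flapping.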
>>
>>
>>>
>>> I apologize for the rather basic questions...
>>>
>> No problem. I hope this helps resolve your issues.
>>
>>
>>
>>> Thanks,
>>>
>>> Gavin W. Jones
>>> Where 2 Get It, Inc.
>>>
>>> [1]:  http://docs.oracle.com/cd/E37670_01/E37355/html/ol_tshoot_ocfs2.html
>>>
>>
>> <snipped>
>>
>> --
>> Goldwyn
>
>
>


-- 
Goldwyn


