[Ocfs2-users] Doubts about OCFS2 Performance
Sunil Mushran
sunil.mushran at oracle.com
Wed Jul 28 10:46:06 PDT 2010
Can you attach some info to the bz? Like an iostat run for a minute
or more, and top output showing the active processes. If the slowdown
is being observed by any one process, maybe the output of
strace -p <pid> -T -ttt -o /tmp/out.
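A sketch of how one might gather that data in one pass (the /tmp/ocfs2-diag output directory is an assumption, and the sample counts are shortened here; a real run should cover a minute or more, as requested above):

```shell
#!/bin/sh
# Collect the diagnostics requested above (hypothetical output directory).
OUT=/tmp/ocfs2-diag
mkdir -p "$OUT"

# Per-device I/O statistics; use more samples (e.g. "iostat -x 1 60")
# on the real system so the run covers a minute or more.
command -v iostat >/dev/null 2>&1 && iostat -x 1 2 > "$OUT/iostat.out"

# One batch-mode top snapshot of the active processes.
command -v top >/dev/null 2>&1 && top -b -n 1 > "$OUT/top.out"

# If one process is visibly slow, trace it with wall-clock timestamps
# (-ttt) and per-syscall elapsed time (-T). <pid> must be filled in with
# the slow process's PID, so this line is left commented out:
# strace -p <pid> -T -ttt -o "$OUT/strace.out"

echo "diagnostics written to $OUT"
```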
On 07/28/2010 05:28 AM, Jeronimo Bezerra wrote:
> Guys,
>
> any comments or advice, please?
>
> Jeronimo Bezerra
>
> On 27/07/2010 11:44, Jeronimo Bezerra wrote:
>
>> Thank you, Aaron, for the quick answer. My comments are below:
>>
>> On 27/07/2010 10:57, Aaron Thompson wrote:
>>
>>> This looks like a disk issue - contention, or wait time. It could be
>>> that the time needed to write that 80k message to all users' mailboxes
>>> is throttling your disk connection, or pushing past some file-size
>>> limit that moves the I/O into a larger set of blocks than smaller
>>> messages would use. It looks and sounds like you may be waiting for
>>> the disk to write those messages - I guess it depends on the size of
>>> *all*.
>>>
>> Ok, I guess so too. I intend to increase the block size from 2 KB to
>> 4 KB and split my 2 TB partition into 4-5 partitions of 400 GB to share
>> the load between the storage device's two main controllers. Do you
>> think this is a good improvement, or just more overhead?
>>
>> One doubt: is this contention caused by Debian (and its I/O and OCFS2
>> stack) or by the storage device? I ran some I/O benchmarks on Debian
>> with OCFS2 and reached almost 100 MBps! I know the benchmark profile
>> is different from a mail environment (with a lot of small files),
>> but...
>>
>>
>>> Your load is a function of more than CPU - your I/O wait is in there
>>> somewhere as well. I would suggest iostat; it may give you a better
>>> view of which disk is doing how much work. I believe it is packaged
>>> with a few other utilities as sysstat in Debian (I've been on RHEL for
>>> a while, so make sure you check).
>>>
>> Today the 2 TB partition is spread over 20 FC disks in a RAID 5
>> array. iostat didn't help much:
>>
>> Device:  tps      MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
>> dm-0     4637,25       6,99       2,07        7        2
>> dm-0     1491,18       2,91       0,00        2        0
>> dm-0     1535,51       2,58       0,41        2        0
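One reason default iostat output can look unhelpful here is that it shows only throughput, not latency. The extended view adds await (average request latency in ms) and %util, which are the columns that actually reveal disk contention. A minimal sketch (the sample count and output path are illustrative; on Debian, iostat comes with the sysstat package):

```shell
#!/bin/sh
# Extended per-device statistics in kB at one-second intervals; watch the
# await and %util columns for signs of contention during the slowdown.
if command -v iostat >/dev/null 2>&1; then
    iostat -dxk 1 2 | tee /tmp/iostat-x.out
else
    echo "iostat not found - install the sysstat package" | tee /tmp/iostat-x.out
fi
```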
>>
>> Any other advice? Thanks again
>>
>> Jeronimo
>>
>>
>>> Good Luck.
>>>
>>> @
>>>
>>> Aaron Thompson Applications Administrator / Database Administrator
>>> http://www.uni.edu/~prefect/ University of Northern Iowa
>>>
>>> "All it takes to fly is to hurl yourself at the ground... and miss."
>>> -Douglas Adams
>>>
>>> On 07/27/10 08:32, Jeronimo Bezerra wrote:
>>>
>>>> Hello all,
>>>>
>>>> I need some help to understand one situation about disk/OCFS
>>>> performance. Let-me introduce my environment:
>>>>
>>>> I use OCFS2 in a mail environment with almost 10k users, in a OCFS2
>>>> partition of 2 TB (~1TB in use). A lot of low files, block size of 2Kb.
>>>> It's a Debian Etch Linux, in a IBM Ds4500 Storage with QLA2340.
>>>>
>>>> Since a few weeks ago, I noted a poor performance when I have a mail to
>>>> all users (all-l), mainly when this e-mail has more than 80Kb (yes, I
>>>> know, It shouldn't happen, but here we have friendly fire! ). This
>>>> situation is new, because this environment has almost 3 years. When this
>>>> email 'appears' in my mail postfix queue, after some seconds, my load
>>>> average goes to 100 -> 200 -> 300! Yesterday I paused the delivery of
>>>> these emails in postfix (postsuper -h ALL) and after a one minute, the
>>>> load average went to 2,31! One very strange thing is the mpstat output
>>>> in that moment of high load:
>>>>
>>>> 09:28:17  CPU  %user  %nice   %sys  %iowait  %irq  %soft  %steal   %idle   intr/s
>>>> 09:28:18  all   7,05   0,00   2,59    11,12  0,00   0,22    0,00   79,02  1788,12
>>>> 09:28:18    0  37,62   0,00  15,84    41,58  0,00   3,96    0,00    0,99  1790,10
>>>> 09:28:18    1   2,97   0,00   4,95     5,94  0,00   0,00    0,00   92,08     0,00
>>>> 09:28:18    2   0,00   0,00   1,98     6,93  0,00   0,00    0,00  112,87     0,00
>>>> 09:28:18    3   0,00   0,00   0,99     4,95  0,00   0,00    0,00  158,42     0,00
>>>> 09:28:18    4   0,00   0,00   0,00     0,00  0,00   0,00    0,00  100,99     0,00
>>>> 09:28:18    5   0,99   0,00   1,98    31,68  0,00   0,00    0,00   70,30     0,00
>>>> 09:28:18    6   0,00   0,00   0,00     0,00  0,00   0,00    0,00  185,15     0,00
>>>> 09:28:18    7   0,00   0,00   0,00     0,00  0,00   0,00    0,00   52,48     0,00
>>>> 09:28:18    8  29,70   0,00   7,92    57,43  0,00   0,00    0,00    6,93     0,00
>>>> 09:28:18    9   2,97   0,00   5,94    43,56  0,00   0,00    0,00   50,50     0,00
>>>> 09:28:18   10  47,52   0,00   3,96     1,98  0,00   0,00    0,00   54,46     0,00
>>>> 09:28:18   11   0,00   0,00   0,00     3,96  0,00   0,00    0,00   99,01     0,00
>>>> 09:28:18   12   0,00   0,00   0,00     0,00  0,00   0,00    0,00   99,01     0,00
>>>> 09:28:18   13   3,96   0,00   1,98     0,00  0,00   0,00    0,00   99,01     0,00
>>>> 09:28:18   14   0,00   0,00   0,00     0,00  0,00   0,00    0,00  138,61     0,00
>>>> 09:28:18   15   0,00   0,00   0,00     1,98  0,00   0,00    0,00   99,01     0,00
>>>>
>>>> 09:31:44  CPU  %user  %nice   %sys  %iowait  %irq  %soft  %steal   %idle   intr/s
>>>> 09:31:45  all   1,10   0,00   2,88    11,22  0,00   0,25    0,00   84,55  1811,76
>>>> 09:31:45    0   6,86   0,00  13,73    69,61  0,00   3,92    0,00    5,88  1810,78
>>>> 09:31:45    1   0,98   0,00   2,94     2,94  0,00   0,00    0,00   96,08     0,00
>>>> 09:31:45    2   0,98   0,00   1,96     9,80  0,00   0,00    0,00   90,20     0,00
>>>> 09:31:45    3   0,00   0,00   1,96     1,96  0,00   0,00    0,00   94,12     0,00
>>>> 09:31:45    4   0,98   0,00   0,00     0,00  0,00   0,00    0,00   99,02     0,00
>>>> 09:31:45    5   0,00   0,00   0,98     0,98  0,00   0,00    0,00   97,06     0,00
>>>> 09:31:45    6   0,00   0,00   2,94     4,90  0,00   0,00    0,00   95,10     0,00
>>>> 09:31:45    7   0,00   0,00   1,96     9,80  0,00   0,00    0,00   86,27     0,00
>>>> 09:31:45    8   1,96   0,00   5,88    50,00  0,00   0,00    0,00   41,18     0,00
>>>> 09:31:45    9   1,96   0,00   0,98     0,98  0,00   0,00    0,00   92,16     0,00
>>>> 09:31:45   10   0,98   0,00   2,94     8,82  0,00   0,00    0,00   84,31     0,00
>>>> 09:31:45   11   2,94   0,00   1,96     1,96  0,00   0,00    0,00   94,12     0,00
>>>> 09:31:45   12   0,00   0,00   1,96     0,98  0,00   0,00    0,00   97,06     0,00
>>>> 09:31:45   13   0,00   0,00   1,96     0,98  0,00   0,00    0,00   94,12     0,00
>>>> 09:31:45   14   0,00   0,00   1,96     7,84  0,00   0,00    0,00   95,10     0,00
>>>> 09:31:45   15   0,00   0,00   0,98     7,84  0,00   0,00    0,00   93,14     0,00
>>>>
>>>> I don't understand why only one CPU (of the 16) is at 100%
>>>> utilization at the moment of high load average, or why mpstat shows
>>>> that only CPU 0 takes almost all the interrupts/s. According to htop,
>>>> only CPU 0 is under high utilization, and that seems strange to me.
>>>> At that moment the DS4500 looks normal, showing about 7-8 MB/s of
>>>> utilization from my mail host.
>>>>
>>>> So, what can I do to discover why my server has this bottleneck? Any
>>>> help would be appreciated.
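One possible lead on the single-busy-CPU symptom described above: if all HBA interrupts land on CPU 0, that core can saturate while the others sit idle. A minimal sketch of how one might check (the "qla" grep pattern and the IRQ number 20 are assumptions for illustration, not taken from this system):

```shell
#!/bin/sh
# Show which CPU column accumulates the interrupt counts; on this setup
# the QLogic HBA rows would be the interesting ones (pattern is a guess).
{ grep -i qla /proc/interrupts || head -n 5 /proc/interrupts; } > /tmp/irq-check.out
cat /tmp/irq-check.out

# To spread one IRQ across several CPUs, write a hex CPU mask to its
# smp_affinity file (requires root; 20 is a hypothetical IRQ number,
# and the mask "ff" allows CPUs 0-7), so this line is left commented out:
# echo ff > /proc/irq/20/smp_affinity
```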
>>>>
>>>> Thank you,
>>>>
>>>> Jeronimo Bezerra
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Ocfs2-users mailing list
>>>> Ocfs2-users at oss.oracle.com
>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>
>>>>
>>
>
>