[Ocfs2-users] Extremely poor write performance, but read appears to be okay
Daniel McDonald
wasade at gmail.com
Thu Dec 9 20:56:28 PST 2010
Thank you. I'll investigate as soon as I can get decent access.
Rebooting the nodes seems to have temporarily resolved the issue;
however, things are not completely back to normal yet.
-Daniel
On Thu, Dec 9, 2010 at 8:49 AM, Sunil Mushran <sunil.mushran at oracle.com> wrote:
> http://oss.oracle.com/git/?p=ocfs2-1.4.git;a=commitdiff;h=1f667766cb67ed05b4d706aa82e8ad0b12eaae8b
>
> That specific error has been addressed in the upcoming 1.4.8.
>
> Attach the logs and all other info to a bugzilla.
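>
> If it helps for the report, you can confirm the version actually
> loaded on each node with something along these lines (assuming the
> stock OEL packaging):
>
>     # modinfo ocfs2 | grep -i version
>     # rpm -qa | grep ocfs2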
>
> On 12/08/2010 05:07 PM, Daniel McDonald wrote:
>>
>> Hello,
>>
>> I'm writing from the other side of the world from where my systems
>> are, so details are coming in slowly. We have a 6TB OCFS2 volume
>> shared across 20 or so nodes, all running OEL5.4 with ocfs2-1.4.4.
>> The system has worked fairly well for the last 6-8 months, but
>> something has happened over the last few weeks that has driven write
>> performance nearly to a halt. I'm not sure how to proceed, and a very
>> poor internet connection is hindering me further. I've verified that
>> the disk array is in good health. I'm seeing a few odd kernel log
>> messages; an example follows below. I have not been able to check all
>> nodes due to limited time and slow internet at my present location.
>> Any assistance would be greatly appreciated. I should be able to
>> provide log files in about 12 hours. At the moment, load averages on
>> each node are 0.00 to 0.09.
>>
>> Here is a test write and the associated iostat -xm 5 output.
>> Previously I was obtaining >90 MB/s:
>>
>> $ dd if=/dev/zero of=/home/testdump count=1000 bs=1024k
>>
>> ...and associated iostat output:
>>
>> avg-cpu: %user %nice %system %iowait %steal %idle
>> 0.10 0.00 0.43 12.25 0.00 87.22
>>
>> Device:  rrqm/s wrqm/s    r/s  w/s  rMB/s wMB/s avgrq-sz avgqu-sz await svctm  %util
>> sda        0.00   1.80   0.00 8.40   0.00  0.04     9.71     0.01  0.64  0.05   0.04
>> sda1       0.00   0.00   0.00 0.00   0.00  0.00     0.00     0.00  0.00  0.00   0.00
>> sda2       0.00   0.00   0.00 0.00   0.00  0.00     0.00     0.00  0.00  0.00   0.00
>> sda3       0.00   1.80   0.00 8.40   0.00  0.04     9.71     0.01  0.64  0.05   0.04
>> sdc        0.00   0.00 115.80 0.60   0.46  0.00     8.04     0.99  8.48  8.47  98.54
>> sdc1       0.00   0.00 115.80 0.60   0.46  0.00     8.04     0.99  8.48  8.47  98.54
>>
>> avg-cpu: %user %nice %system %iowait %steal %idle
>> 0.07 0.00 0.55 12.25 0.00 87.13
>>
>> Device:  rrqm/s wrqm/s    r/s  w/s  rMB/s wMB/s avgrq-sz avgqu-sz await svctm  %util
>> sda        0.00   0.40   0.00 0.80   0.00  0.00    12.00     0.00  2.00  1.25   0.10
>> sda1       0.00   0.00   0.00 0.00   0.00  0.00     0.00     0.00  0.00  0.00   0.00
>> sda2       0.00   0.00   0.00 0.00   0.00  0.00     0.00     0.00  0.00  0.00   0.00
>> sda3       0.00   0.40   0.00 0.80   0.00  0.00    12.00     0.00  2.00  1.25   0.10
>> sdc        0.00   0.00 112.80 0.40   0.44  0.00     8.03     0.98  8.68  8.69  98.38
>> sdc1       0.00   0.00 112.80 0.40   0.44  0.00     8.03     0.98  8.68  8.69  98.38
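>>
>> A direct-I/O variant of the same write test should take the page
>> cache out of the picture; something like this, assuming GNU dd
>> (/home/testdump2 is just another scratch file):
>>
>> $ dd if=/dev/zero of=/home/testdump2 count=1000 bs=1024k oflag=direct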
>>
>> Here is a test read and the associated iostat output. I'm
>> intentionally reading from a different test file so as to avoid
>> caching effects:
>>
>> $ dd if=/home/someothertestdump of=/dev/null bs=1024k
>>
>> avg-cpu: %user %nice %system %iowait %steal %idle
>> 0.10 0.00 3.60 10.85 0.00 85.45
>>
>> Device:  rrqm/s wrqm/s    r/s  w/s  rMB/s wMB/s avgrq-sz avgqu-sz await svctm  %util
>> sda        0.00   3.79   0.00 1.40   0.00  0.02    29.71     0.00  1.29  0.43   0.06
>> sda1       0.00   0.00   0.00 0.00   0.00  0.00     0.00     0.00  0.00  0.00   0.00
>> sda2       0.00   0.00   0.00 0.00   0.00  0.00     0.00     0.00  0.00  0.00   0.00
>> sda3       0.00   3.79   0.00 1.40   0.00  0.02    29.71     0.00  1.29  0.43   0.06
>> sdc        7.98   0.20 813.17 1.00 102.50  0.00   257.84     1.92  2.34  1.19  96.71
>> sdc1       7.98   0.20 813.17 1.00 102.50  0.00   257.84     1.92  2.34  1.19  96.67
>>
>> avg-cpu: %user %nice %system %iowait %steal %idle
>> 0.07 0.00 3.67 10.22 0.00 86.03
>>
>> Device:  rrqm/s wrqm/s    r/s  w/s  rMB/s wMB/s avgrq-sz avgqu-sz await svctm  %util
>> sda        0.00   0.20   0.00 0.40   0.00  0.00    12.00     0.00  0.50  0.50   0.02
>> sda1       0.00   0.00   0.00 0.00   0.00  0.00     0.00     0.00  0.00  0.00   0.00
>> sda2       0.00   0.00   0.00 0.00   0.00  0.00     0.00     0.00  0.00  0.00   0.00
>> sda3       0.00   0.20   0.00 0.40   0.00  0.00    12.00     0.00  0.50  0.50   0.02
>> sdc        6.60   0.20 829.00 1.00 104.28  0.00   257.32     1.90  2.31  1.17  97.28
>> sdc1       6.60   0.20 829.00 1.00 104.28  0.00   257.32     1.90  2.31  1.17  97.28
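>>
>> To rule caching out entirely on a repeat run, the page cache could
>> also be dropped first (assuming root; note this flushes all cached
>> data on the node):
>>
>> # sync; echo 3 > /proc/sys/vm/drop_caches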
>>
>> I'm seeing a few weird kernel messages, such as:
>>
>> Dec 7 14:07:50 growler kernel: (dlm_wq,4793,4):dlm_deref_lockres_worker:2344 ERROR: 84B7C6421A6C4280AB87F569035C5368:O0000000000000016296ce900000000: node 14 trying to drop ref but it is already dropped!
>> Dec 7 14:07:50 growler kernel: lockres: O0000000000000016296ce900000000, owner=0, state=0
>> Dec 7 14:07:50 growler kernel: last used: 0, refcnt: 6, on purge list: no
>> Dec 7 14:07:50 growler kernel: on dirty list: no, on reco list: no, migrating pending: no
>> Dec 7 14:07:50 growler kernel: inflight locks: 0, asts reserved: 0
>> Dec 7 14:07:50 growler kernel: refmap nodes: [ 21 ], inflight=0
>> Dec 7 14:07:50 growler kernel: granted queue:
>> Dec 7 14:07:50 growler kernel: type=3, conv=-1, node=21, cookie=21:213370, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
>> Dec 7 14:07:50 growler kernel: converting queue:
>> Dec 7 14:07:50 growler kernel: blocked queue:
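>>
>> For a rough count of how often this message fires on a given node
>> (assuming the default syslog location):
>>
>> $ grep -c dlm_deref_lockres_worker /var/log/messages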
>>
>>
>> Here is df output:
>>
>> root@growler:~$ df
>> Filesystem 1K-blocks Used Available Use% Mounted on
>> /dev/sda3 245695888 29469416 203544360 13% /
>> /dev/sda1 101086 15133 80734 16% /boot
>> tmpfs 33005580 0 33005580 0% /dev/shm
>> /dev/sdc1 5857428444 5234400436 623028008 90% /home
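>>
>> The volume is at 90%, so allocator fragmentation may also be a
>> factor; if it is useful, the superblock and global bitmap can be
>> inspected read-only with something like the following (assuming
>> debugfs.ocfs2 from ocfs2-tools):
>>
>> # debugfs.ocfs2 -R "stats" /dev/sdc1
>> # debugfs.ocfs2 -R "stat //global_bitmap" /dev/sdc1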
>>
>> Thanks
>> -Daniel
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>
>