[Ocfs2-users] High Load Average - New information
Jerônimo Bezerra
jab at ufba.br
Wed Dec 17 17:40:07 PST 2008
Hello Sunil and all,
I haven't upgraded my kernel yet, but server B logged an error that could
help us:
(4205,0):ocfs2_delete_inode:860 ERROR: status = -17
(4205,0):ocfs2_query_inode_wipe:751 ERROR: status = -17
(4205,0):ocfs2_delete_inode:860 ERROR: status = -17
(4205,0):ocfs2_query_inode_wipe:751 ERROR: status = -17
(4205,0):ocfs2_delete_inode:860 ERROR: status = -17
(4240,0):ocfs2_query_inode_wipe:744 ERROR: Inode 150165660 (on-disk 150165660) not orphaned! Disk flags 0x0, inode flags 0x80
(4240,0):ocfs2_delete_inode:860 ERROR: status = -17
(4868,0):ocfs2_query_inode_wipe:744 ERROR: Inode 15173219 (on-disk 15173219) not orphaned! Disk flags 0x0, inode flags 0x80
(4868,0):ocfs2_delete_inode:860 ERROR: status = -17
(4905,0):ocfs2_query_inode_wipe:744 ERROR: Inode 267696909 (on-disk 267696909) not orphaned! Disk flags 0x0, inode flags 0x80
(4905,0):ocfs2_delete_inode:860 ERROR: status = -17
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/ocfs2/journal.h:441
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager
configfs qla2xxx reiserfs dm_snapshot dm_mirror dm_mod loop joydev
serio_raw tsdev psmouse evdev pcspkr shpchp floppy pci_hotplug sg ext3
jbd mbcache ide_cd cdrom usbhid piix sd_mod generic ehci_hcd ide_core
uhci_hcd firmware_class scsi_transport_fc megaraid_mbox scsi_mod
megaraid_mm tg3 thermal processor fan
Pid: 5448, comm: imapd Not tainted 2.6.18-4-amd64 #1
RIP: 0010:[<ffffffff88279360>] [<ffffffff88279360>] :ocfs2:ocfs2_commit_truncate+0x550/0x1537
RSP: 0018:ffff8101396e5c58 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff8102254020c0 RCX: 0000000000000002
RDX: 0000000000f30000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 00000000ffffffff R09: 00000000007cd6d3
R10: ffff81022567a800 R11: ffffffff8828f423 R12: 0000000000000000
R13: ffff81017fb2c000 R14: 0000000007cd6d30 R15: ffff8100ca4c04c8
FS: 00002ac4e7237250(0000) GS:ffffffff80521000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000065c378 CR3: 0000000082e9e000 CR4: 00000000000006e0
Process imapd (pid: 5448, threadinfo ffff8101396e4000, task ffff8100122e20c0)
Stack: ffff81017a8aedc0 ffff81003030a7f0 ffff81022567a800 ffff810227fd3b88
ffff8100ca4c0408 0000000010360648 ffff810000000000 ffff810142ff65b0
0000000000000000 ffff81017fb2c000 ffff81017fb2c0c0 ffff8100a2ae9f00
Call Trace:
[<ffffffff8828d1d6>] :ocfs2:ocfs2_wipe_inode+0x466/0xb23
[<ffffffff882a91bc>] :ocfs2:ocfs2_delete_response_cb+0x0/0x17f
[<ffffffff88290122>] :ocfs2:ocfs2_delete_inode+0x623/0x7b1
[<ffffffff8828faff>] :ocfs2:ocfs2_delete_inode+0x0/0x7b1
[<ffffffff8022d395>] generic_delete_inode+0xc6/0x143
[<ffffffff8828f53a>] :ocfs2:ocfs2_drop_inode+0x117/0x16e
[<ffffffff8023a1b0>] do_unlinkat+0xd5/0x148
[<ffffffff802584d6>] system_call+0x7e/0x83
Code: 0f 0b 68 f6 d6 2a 88 c2 b9 01 66 85 d2 0f 95 c2 66 ff ce 0f
RIP [<ffffffff88279360>] :ocfs2:ocfs2_commit_truncate+0x550/0x1537
RSP <ffff8101396e5c58>
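Kernel functions report failures as negative errno values, so the `status = -17` lines in the log above correspond to errno 17, which is `EEXIST` ("File exists") on Linux. Below is a minimal Python sketch for decoding such lines in bulk. It assumes the `(pid,N):function:line ERROR: status = -N` layout seen in this log; the regex, the `decode_status` helper, and reading the second number in the prefix as a CPU number are illustrative assumptions, not part of ocfs2 or ocfs2-tools.

```python
import errno
import os
import re

# Matches lines like "(4205,0):ocfs2_delete_inode:860 ERROR: status = -17".
# Group names are our own labels; treating the second number as the CPU
# is an assumption about the ocfs2 log prefix, not a documented format.
STATUS_RE = re.compile(
    r"\((?P<pid>\d+),(?P<cpu>\d+)\):(?P<func>\w+):(?P<line>\d+) "
    r"ERROR: status = (?P<status>-?\d+)"
)


def decode_status(log_line):
    """Return (function, errno name, message) for a 'status = -N' line, else None."""
    m = STATUS_RE.search(log_line)
    if m is None:
        return None  # not a "status = ..." line (e.g. the "not orphaned!" lines)
    code = -int(m.group("status"))  # kernel returns negative errno values
    name = errno.errorcode.get(code, "UNKNOWN")
    return m.group("func"), name, os.strerror(code)
```

For example, on Linux `decode_status("(4205,0):ocfs2_delete_inode:860 ERROR: status = -17")` returns `('ocfs2_delete_inode', 'EEXIST', 'File exists')`, while the "not orphaned!" lines return `None` because they carry no status code.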
One more piece of information that may be useful:
a database server running MS Windows is also very slow to write to disk,
just like my server A (the problem started on the same day). Both are on
the same IBM storage subsystem and the same Brocade switch.
Could this help? I know that my kernel is old, but...
Thanks
Jeronimo
Sunil Mushran wrote:
> There is no ocfs2 1.4 for non-enterprise kernels. For all non-enterprise
> distros, ocfs2 is part of the kernel. Read the 1.4 user's guide.
> It explains the development process.
>
> You will have to upgrade both nodes. Make sure they are both
> running the same kernel/ocfs2.
>
> Jeronimo Bezerra wrote:
>> It seems that my only option is to upgrade my kernel package.
>>
>> I only found 2.6.24 in this package: linux-image-2.6-amd64-etchnhalf.
>> I will look into it further.
>>
>> Well, if I intend to upgrade, what do you suggest: upgrading the
>> good server (B) or the problematic server (A)? Is there any chance of
>> a file system crash?
>>
>> My ocfs2-tools: 1.2.1-1.3
>>
>> I didn't find 1.4 in Debian's apt repositories.
>>
>> Thanks,
>>
>> Jeronimo
>>
>> Quoting Sunil Mushran <sunil.mushran at oracle.com>:
>>
>>
>>> Debian etch is 2.6.24 based.
>>>
>>> Jeronimo Bezerra wrote:
>>>
>>>> Hi Sunil, thanks for your answer.
>>>>
>>>> I use packages from Debian's apt, and there is no newer version of the
>>>> kernel package :(. Right now I only want to solve this problem so
>>>> I can turn my server on again. What can I do? Is there
>>>> anything I can do right now?
>>>>
>>>> Another question: can I upgrade my kernel by just overwriting the
>>>> current image? Is there any chance of crashing my ocfs2 file system?
>>>> Can I run two servers with different kernel versions?
>>>>
>>>> Thanks for your attention,
>>>>
>>>> Jeronimo
>>>>
>>>> Sunil Mushran wrote:
>>>>
>>>>
>>>>> 2.6.18 is a very old release. I would recommend upgrading to kernel
>>>>> 2.6.21 or later.
>>>>>
>>>>> Jerônimo Bezerra wrote:
>>>>>
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> I have a setup here with two Debian 4.0 servers running kernel
>>>>>> 2.6.18-4-amd64 and ocfs2-tools 1.2.1-1.3.
>>>>>> Both servers have 16 CPUs (4 x dual core x HT) and 8GB RAM,
>>>>>> and they share storage on an IBM DS4500 through qla2340 HBAs.
>>>>>>
>>>>>> Everything was working fine until yesterday morning, when,
>>>>>> for some unknown reason, the load average on both servers
>>>>>> climbed very high, almost 200. CPU utilization on both was
>>>>>> 16-18%, memory usage was 7GB, and uptime was 22 days. Disk I/O
>>>>>> was at least 3 MB/s. Pings over the crossover interface (heartbeat)
>>>>>> were normal, with no packet loss.
>>>>>>
>>>>>> I use these servers as mail servers, and nobody could connect
>>>>>> to them because of (I think) the high load average.
>>>>>>
>>>>>> Well, I rebooted both servers, and after boot the same thing
>>>>>> happened: within minutes the load average was 150. But there was
>>>>>> one interesting thing:
>>>>>> when I shut down server A, server B worked fine! If I
>>>>>> turned on server A and shut down server B, the load average on A
>>>>>> went high. So, since things went fine with server A shut down, I
>>>>>> kept server A down for 8 hours. In the afternoon I turned it on
>>>>>> again, and, surprise, the load went high on both servers as soon as
>>>>>> OCFS2 started. I had to shut down both servers and turn on only
>>>>>> server B to restore service. At night, I turned on server A to try
>>>>>> to discover what was going on. I left both servers on all
>>>>>> night (server A with no services and server B working
>>>>>> normally), and when I arrived this morning, another
>>>>>> surprise: the load average on server B was 1200(!) and on
>>>>>> server A it was 0 (no services running).
>>>>>>
>>>>>> When I started services on server A and shut down server B, the
>>>>>> load on server A reached 200 within seconds.
>>>>>>
>>>>>> I shut down server A again and then turned on server B.
>>>>>> Now everything is working fine, with a load average of 3 on server B.
>>>>>>
>>>>>> I didn't update the kernel, Debian, the storage, or anything else.
>>>>>> There are no messages in syslog, in dmesg, or on the console. No
>>>>>> process uses more than 2% of CPU or memory. I really don't know
>>>>>> what to do, and I have no clues.
>>>>>>
>>>>>> Could someone please help me?
>>>>>>
>>>>>> Thanks a lot
>>>>>>
>>>>>> Jeronimo
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Ocfs2-users mailing list
>>>>>> Ocfs2-users at oss.oracle.com
>>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>>>
>>>>>>
>>>>
>>>>
>>
>