[Ocfs2-users] High Load Average - New information

Wed Dec 17 17:40:07 PST 2008

Hello Sunil and all,

I haven't upgraded my kernel yet, but I got an error on server B that
might help us:

(4205,0):ocfs2_delete_inode:860 ERROR: status = -17
(4205,0):ocfs2_query_inode_wipe:751 ERROR: status = -17
(4205,0):ocfs2_delete_inode:860 ERROR: status = -17
(4205,0):ocfs2_query_inode_wipe:751 ERROR: status = -17
(4205,0):ocfs2_delete_inode:860 ERROR: status = -17
(4240,0):ocfs2_query_inode_wipe:744 ERROR: Inode 150165660 (on-disk 
150165660) not orphaned! Disk flags  0x0, inode flags 0x80
(4240,0):ocfs2_delete_inode:860 ERROR: status = -17
(4868,0):ocfs2_query_inode_wipe:744 ERROR: Inode 15173219 (on-disk 
15173219) not orphaned! Disk flags  0x0, inode flags 0x80
(4868,0):ocfs2_delete_inode:860 ERROR: status = -17
(4905,0):ocfs2_query_inode_wipe:744 ERROR: Inode 267696909 (on-disk 
267696909) not orphaned! Disk flags  0x0, inode flags 0x80
(4905,0):ocfs2_delete_inode:860 ERROR: status = -17
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at fs/ocfs2/journal.h:441
invalid opcode: 0000 [1] SMP
CPU 0
Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager 
configfs qla2xxx reiserfs dm_snapshot dm_mirror dm_mod loop joydev 
serio_raw tsdev psmouse evdev pcspkr shpchp floppy pci_hotplug sg ext3 
jbd mbcache ide_cd cdrom usbhid piix sd_mod generic ehci_hcd ide_core 
uhci_hcd firmware_class scsi_transport_fc megaraid_mbox scsi_mod 
megaraid_mm tg3 thermal processor fan
Pid: 5448, comm: imapd Not tainted 2.6.18-4-amd64 #1
RIP: 0010:[<ffffffff88279360>]  [<ffffffff88279360>] 
:ocfs2:ocfs2_commit_truncate+0x550/0x1537
RSP: 0018:ffff8101396e5c58  EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff8102254020c0 RCX: 0000000000000002
RDX: 0000000000f30000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 00000000ffffffff R09: 00000000007cd6d3
R10: ffff81022567a800 R11: ffffffff8828f423 R12: 0000000000000000
R13: ffff81017fb2c000 R14: 0000000007cd6d30 R15: ffff8100ca4c04c8
FS:  00002ac4e7237250(0000) GS:ffffffff80521000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000000065c378 CR3: 0000000082e9e000 CR4: 00000000000006e0
Process imapd (pid: 5448, threadinfo ffff8101396e4000, task 
ffff8100122e20c0)
Stack:  ffff81017a8aedc0 ffff81003030a7f0 ffff81022567a800 ffff810227fd3b88
 ffff8100ca4c0408 0000000010360648 ffff810000000000 ffff810142ff65b0
 0000000000000000 ffff81017fb2c000 ffff81017fb2c0c0 ffff8100a2ae9f00
Call Trace:
 [<ffffffff8828d1d6>] :ocfs2:ocfs2_wipe_inode+0x466/0xb23
 [<ffffffff882a91bc>] :ocfs2:ocfs2_delete_response_cb+0x0/0x17f
 [<ffffffff88290122>] :ocfs2:ocfs2_delete_inode+0x623/0x7b1
 [<ffffffff8828faff>] :ocfs2:ocfs2_delete_inode+0x0/0x7b1
 [<ffffffff8022d395>] generic_delete_inode+0xc6/0x143
 [<ffffffff8828f53a>] :ocfs2:ocfs2_drop_inode+0x117/0x16e
 [<ffffffff8023a1b0>] do_unlinkat+0xd5/0x148
 [<ffffffff802584d6>] system_call+0x7e/0x83

Code: 0f 0b 68 f6 d6 2a 88 c2 b9 01 66 85 d2 0f 95 c2 66 ff ce 0f
RIP  [<ffffffff88279360>] :ocfs2:ocfs2_commit_truncate+0x550/0x1537
 RSP <ffff8101396e5c58>
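
For reference, the repeated "status = -17" in the messages above looks like a negated Linux errno value, and errno 17 is EEXIST ("File exists"). A minimal sketch for decoding such lines, assuming the status field follows that kernel convention (the helper name decode_status is hypothetical and not part of any ocfs2 tool):

```python
import errno
import os
import re

# Matches the "status = -N" suffix seen in the ocfs2 messages above.
STATUS_RE = re.compile(r"status\s*=\s*-(\d+)")


def decode_status(line):
    """Return (code, symbolic name, description) for a log line, or None.

    Assumes the number after "status = -" is a standard errno value.
    """
    match = STATUS_RE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


if __name__ == "__main__":
    print(decode_status("(4205,0):ocfs2_delete_inode:860 ERROR: status = -17"))
```

This only names the error code; it does not explain why the inode was reported as not orphaned.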

One more piece of information that may be useful:

a database server running MS Windows is also very slow when writing to
disk, just like my server A (it started on the same day). Both are on the
same IBM storage subsystem and the same Brocade switch.

Could this help? I know that my kernel is old, but...

Thanks

Jeronimo

Sunil Mushran wrote:
> There is no ocfs2 1.4 for non-enterprise kernels. For all non-ent
> distros, ocfs2 is part of the kernel. Read the 1.4 user's guide.
> It explains the development process.
>
> You will have to upgrade both nodes. Make sure they are both
> running the same kernel/ocfs2.
>
> Jeronimo Bezerra wrote:
>> It seems that my only option is to upgrade my kernel package.
>>
>> I can only find 2.6.24 in this package: linux-image-2.6-amd64-etchnhalf.
>> I will look into it further.
>>
>> Well, if I do upgrade, what is your suggestion: upgrade the good
>> server (B) first, or the problematic server (A)? Is there any chance
>> of a file system crash?
>>
>> My ocfs2-tools: 1.2.1-1.3
>>
>> I didn't find 1.4 in Debian apt.
>>
>> Thanks,
>>
>> Jeronimo
>>
>> Quoting Sunil Mushran <sunil.mushran at oracle.com>:
>>
>>  
>>> Debian etch is 2.6.24 based.
>>>
>>> Jeronimo Bezerra wrote:
>>>    
>>>> Hi Sunil, thanks for your answer.
>>>>
>>>> I use packages from Debian apt, and there is no newer version of the
>>>> kernel package :(. Right now I only want to solve this problem so I
>>>> can turn my server on again. What could I do? Is there anything I can
>>>> do at this moment?
>>>>
>>>> Another question: can I upgrade my kernel by just overwriting the
>>>> current image? Is there any chance of crashing my ocfs2 file system?
>>>> Can I have two servers with different kernel versions?
>>>>
>>>> Thanks for your attention,
>>>>
>>>> Jeronimo
>>>>
>>>> Sunil Mushran wrote:
>>>>
>>>>      
>>>>> 2.6.18 is a very old release. I would recommend upgrading to kernel
>>>>> 2.6.21 or later.
>>>>>
>>>>> Jerônimo Bezerra wrote:
>>>>>
>>>>>        
>>>>>> Hello all,
>>>>>>
>>>>>> I have a scenario here with two Debian 4.0 servers, kernel
>>>>>> 2.6.18-4-amd64, and ocfs2-tools 1.2.1-1.3.
>>>>>> These two servers have 16 CPUs (4 x Dual Core x HT) and 8GB RAM,
>>>>>> with shared storage through qla2340 on an IBM DS4500 Storage.
>>>>>>
>>>>>> Everything was working fine until yesterday morning, when for some
>>>>>> unknown reason the load average of both servers became very high,
>>>>>> almost 200. CPU utilization on both was 16-18%, memory usage was
>>>>>> 7GB, and uptime was 22 days. Disk I/O was at least 3 MB/s. Pings to
>>>>>> the crossover interface (heartbeat) were normal, with no packet
>>>>>> loss.
>>>>>>
>>>>>> I use these servers as mail servers, and nobody could connect to
>>>>>> them because of (I think) the high load average.
>>>>>>
>>>>>> Well, I rebooted both servers, and after boot the same thing
>>>>>> happened: within minutes the load average was 150. But one
>>>>>> interesting thing:
>>>>>> when I shut down server A, server B worked fine! If I turned on
>>>>>> server A and shut down server B, the load average on A was high.
>>>>>> So, since things went fine after I shut down server A, I kept
>>>>>> server A down for 8 hours. In the afternoon I turned it on again
>>>>>> and, surprise, there was high load on both servers when OCFS2
>>>>>> started. I had to shut down both servers and turn on just server B
>>>>>> to get things stable again. At night, I turned on server A to try
>>>>>> to discover what was going on. I left both servers on all night
>>>>>> (server A with no services and server B working normally), and
>>>>>> when I arrived this morning, another surprise: the load average of
>>>>>> server B was at 1200(!) and server A at 0 (no services running).
>>>>>>
>>>>>> When I started services on server A and shut down server B, the
>>>>>> load on server A reached 200 in a matter of seconds.
>>>>>>
>>>>>> I shut down server A again and, after that, turned on server B.
>>>>>> Now everything is working fine, with a load average of 3 on server B.
>>>>>>
>>>>>> I didn't update the kernel, Debian, storage or anything else.
>>>>>> There are no messages in syslog, dmesg or on screen. There's no
>>>>>> process using more than 2% of CPU or memory. I really don't know
>>>>>> what to do and I have no clues.
>>>>>>
>>>>>> Please, could someone help me?
>>>>>>
>>>>>> Thanks a lot
>>>>>>
>>>>>> Jeronimo
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Ocfs2-users mailing list
>>>>>> Ocfs2-users at oss.oracle.com
>>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>>>
>>>>>>           
>