[Ocfs2-users] High Load Average - New information

Sunil Mushran sunil.mushran at oracle.com
Wed Dec 17 18:08:14 PST 2008


Not really. It is another issue that could be related to the fact
that the fs is very old.

Jerônimo Bezerra wrote:
> Hello Sunil and all,
>
> I haven't upgraded my kernel yet, but I had an error on server B that could 
> help us:
>
> (4205,0):ocfs2_delete_inode:860 ERROR: status = -17
> (4205,0):ocfs2_query_inode_wipe:751 ERROR: status = -17
> (4205,0):ocfs2_delete_inode:860 ERROR: status = -17
> (4205,0):ocfs2_query_inode_wipe:751 ERROR: status = -17
> (4205,0):ocfs2_delete_inode:860 ERROR: status = -17
> (4240,0):ocfs2_query_inode_wipe:744 ERROR: Inode 150165660 (on-disk 
> 150165660) not orphaned! Disk flags  0x0, inode flags 0x80
> (4240,0):ocfs2_delete_inode:860 ERROR: status = -17
> (4868,0):ocfs2_query_inode_wipe:744 ERROR: Inode 15173219 (on-disk 
> 15173219) not orphaned! Disk flags  0x0, inode flags 0x80
> (4868,0):ocfs2_delete_inode:860 ERROR: status = -17
> (4905,0):ocfs2_query_inode_wipe:744 ERROR: Inode 267696909 (on-disk 
> 267696909) not orphaned! Disk flags  0x0, inode flags 0x80
> (4905,0):ocfs2_delete_inode:860 ERROR: status = -17
> ----------- [cut here ] --------- [please bite here ] ---------
> Kernel BUG at fs/ocfs2/journal.h:441
> invalid opcode: 0000 [1] SMP
> CPU 0
> Modules linked in: ocfs2 ocfs2_dlmfs ocfs2_dlm ocfs2_nodemanager 
> configfs qla2xxx reiserfs dm_snapshot dm_mirror dm_mod loop joydev 
> serio_raw tsdev psmouse evdev pcspkr shpchp floppy pci_hotplug sg ext3 
> jbd mbcache ide_cd cdrom usbhid piix sd_mod generic ehci_hcd ide_core 
> uhci_hcd firmware_class scsi_transport_fc megaraid_mbox scsi_mod 
> megaraid_mm tg3 thermal processor fan
> Pid: 5448, comm: imapd Not tainted 2.6.18-4-amd64 #1
> RIP: 0010:[<ffffffff88279360>]  [<ffffffff88279360>] 
> :ocfs2:ocfs2_commit_truncate+0x550/0x1537
> RSP: 0018:ffff8101396e5c58  EFLAGS: 00010297
> RAX: 0000000000000000 RBX: ffff8102254020c0 RCX: 0000000000000002
> RDX: 0000000000f30000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 0000000000000000 R08: 00000000ffffffff R09: 00000000007cd6d3
> R10: ffff81022567a800 R11: ffffffff8828f423 R12: 0000000000000000
> R13: ffff81017fb2c000 R14: 0000000007cd6d30 R15: ffff8100ca4c04c8
> FS:  00002ac4e7237250(0000) GS:ffffffff80521000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 000000000065c378 CR3: 0000000082e9e000 CR4: 00000000000006e0
> Process imapd (pid: 5448, threadinfo ffff8101396e4000, task 
> ffff8100122e20c0)
> Stack:  ffff81017a8aedc0 ffff81003030a7f0 ffff81022567a800 ffff810227fd3b88
>  ffff8100ca4c0408 0000000010360648 ffff810000000000 ffff810142ff65b0
>  0000000000000000 ffff81017fb2c000 ffff81017fb2c0c0 ffff8100a2ae9f00
> Call Trace:
>  [<ffffffff8828d1d6>] :ocfs2:ocfs2_wipe_inode+0x466/0xb23
>  [<ffffffff882a91bc>] :ocfs2:ocfs2_delete_response_cb+0x0/0x17f
>  [<ffffffff88290122>] :ocfs2:ocfs2_delete_inode+0x623/0x7b1
>  [<ffffffff8828faff>] :ocfs2:ocfs2_delete_inode+0x0/0x7b1
>  [<ffffffff8022d395>] generic_delete_inode+0xc6/0x143
>  [<ffffffff8828f53a>] :ocfs2:ocfs2_drop_inode+0x117/0x16e
>  [<ffffffff8023a1b0>] do_unlinkat+0xd5/0x148
>  [<ffffffff802584d6>] system_call+0x7e/0x83
>
>
> Code: 0f 0b 68 f6 d6 2a 88 c2 b9 01 66 85 d2 0f 95 c2 66 ff ce 0f
> RIP  [<ffffffff88279360>] :ocfs2:ocfs2_commit_truncate+0x550/0x1537
>  RSP <ffff8101396e5c58>
>
>
> Another piece of possibly useful information:
>
> a database server running MS Windows has also become very slow writing to
> disk, like my server A (it started on the same day). Both are on the same
> IBM storage subsystem and the same Brocade switch.
>
> Could this help? I know that my kernel is old, but...
>
> Thanks
>
> Jeronimo
>
>
> Sunil Mushran wrote:
>   
>> There is no ocfs2 1.4 for non-enterprise kernels. For all non-ent
>> distros, ocfs2 is part of the kernel. Read the 1.4 user's guide.
>> It explains the development process.
>>
>> You will have to upgrade both nodes. Make sure they are both
>> running the same kernel/ocfs2.
>>
>> Jeronimo Bezerra wrote:
>>     
>>> It seems that my only option is to upgrade my kernel package.
>>>
>>> The only version I can find is 2.6.24, in this package:
>>> linux-image-2.6-amd64-etchnhalf. I will look into it further.
>>>
>>> If I do upgrade, what is your suggestion: upgrade the good server (B)
>>> or the problematic server (A)? Is there any chance of a file system
>>> crash?
>>>
>>> My ocfs2-tools: 1.2.1-1.3
>>>
>>> I didn't find 1.4 in Debian apt.
>>>
>>> Thanks,
>>>
>>> Jeronimo
>>>
>>> Quoting Sunil Mushran <sunil.mushran at oracle.com>:
>>>
>>>  
>>>       
>>>> Debian etch is 2.6.24 based.
>>>>
>>>> Jeronimo Bezerra wrote:
>>>>    
>>>>         
>>>>> Hi Sunil, thanks for your answer.
>>>>>
>>>>> I use packages from Debian apt, and there is no newer version of the
>>>>> kernel package :(. Right now I only want to solve this problem so I can
>>>>> bring my server back up. Is there anything I can do at this moment?
>>>>>
>>>>> Another question: can I upgrade my kernel by just overwriting the
>>>>> current image? Is there any chance of crashing my ocfs2 file system?
>>>>> Can I have two servers with different kernel versions?
>>>>>
>>>>> Thanks for your attention,
>>>>>
>>>>> Jeronimo
>>>>>
>>>>> Sunil Mushran wrote:
>>>>>
>>>>>      
>>>>>           
>>>>>> 2.6.18 is a very old release. I would recommend upgrading to kernel
>>>>>> 2.6.21 or later.
>>>>>>
>>>>>> Jerônimo Bezerra wrote:
>>>>>>
>>>>>>        
>>>>>>             
>>>>>>> Hello all,
>>>>>>>
>>>>>>> I have a scenario here with two Debian 4.0 servers, kernel
>>>>>>> 2.6.18-4-amd64, and ocfs2-tools 1.2.1-1.3.
>>>>>>> These two servers have 16 CPUs (4 x Dual Core x HT) and 8GB RAM,
>>>>>>> with shared storage via qla2340 on an IBM DS4500 Storage.
>>>>>>>
>>>>>>> Everything was working fine until yesterday morning, when for some
>>>>>>> unknown reason the load average of both servers became too high,
>>>>>>> almost 200. CPU utilization on both was 16-18%, memory in use was
>>>>>>> 7GB, and uptime was 22 days. Disk I/O was at least 3 MB/s. Pings to
>>>>>>> the crossover interface (heartbeat) were normal, with no packet loss.
>>>>>>>
>>>>>>> I use these servers as mail servers, and nobody could connect to
>>>>>>> them because of (I think) the high load average.
>>>>>>>
>>>>>>> Well, I rebooted both servers, and after boot, the same thing: in a
>>>>>>> matter of minutes the load average was 150. But one interesting
>>>>>>> thing: when I shut down server A, server B worked fine! If I turned
>>>>>>> on server A and shut down server B, there was high load average on A.
>>>>>>> So, since things went fine after I shut down server A, I kept server
>>>>>>> A down for 8 hours. In the afternoon I turned it on again and,
>>>>>>> surprise, high load on both servers when OCFS2 started. I had to
>>>>>>> shut down both servers and turn on just server B to get things
>>>>>>> stable again. At night, I turned on server A to try to discover what
>>>>>>> was going on. I left both servers on all night (server A with no
>>>>>>> services and server B working normally), and when I arrived this
>>>>>>> morning, another surprise: the load average of server B was at
>>>>>>> 1200(!) and server A at 0 (no services running).
>>>>>>>
>>>>>>> When I started services on server A and shut down server B, the
>>>>>>> load on server A reached 200 in a matter of seconds.
>>>>>>>
>>>>>>> I shut down server A again and then turned on server B. Now
>>>>>>> everything is working fine, with a load average of 3 on server B.
>>>>>>>
>>>>>>> I didn't update the kernel, Debian, storage or anything else.
>>>>>>> There's no message in syslog, dmesg or on screen. There's no process
>>>>>>> using more than 2% of CPU or memory. I really don't know what to do
>>>>>>> and I have no clues.
>>>>>>>
>>>>>>> Please, could someone help me?
>>>>>>>
>>>>>>> Thanks a lot
>>>>>>>
>>>>>>> Jeronimo
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Ocfs2-users mailing list
>>>>>>> Ocfs2-users at oss.oracle.com
>>>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>>>>
>>>>>>>           
>>>>>>>               
>>>>>
>>>>>      
>>>>>           
>>>   
>>>       
>
>
>   
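
A note on reading the quoted log: the repeated "status = -17" lines look like
negated Linux errno values, and errno 17 on Linux is EEXIST ("File exists").
The sketch below is only an illustration of how such a status could be
decoded, assuming it is a standard errno; decode_status is a hypothetical
helper name and not part of ocfs2 or ocfs2-tools.

```python
import errno
import os


def decode_status(status):
    """Map a kernel-style negative status to (errno name, message).

    Assumption: the status is a negated standard errno value, as in
    "ERROR: status = -17". Unknown codes are reported as "UNKNOWN".
    """
    code = abs(status)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # Decode the status seen in the ocfs2_delete_inode messages.
    name, message = decode_status(-17)
    print(name, "-", message)
```

Knowing the symbolic name can make it easier to search the ocfs2 sources or
the list archives for the same error.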



