[Ocfs2-users] High Load Average

Sunil Mushran sunil.mushran at oracle.com
Tue Dec 16 15:56:20 PST 2008


Debian etch is 2.6.24 based.

Jeronimo Bezerra wrote:
> Hi Sunil, thanks for your answer.
>
> I use packages from Debian apt, and there is no newer version of the
> kernel package :(. For now I only want to solve this problem so I can
> turn my server on again. What could I do? Is there anything I can do
> right now?
>
> Another question: Can I upgrade my kernel by just overwriting the current
> image? Is there any chance of that crashing my ocfs2 file system? Can I
> have two servers with different kernel versions?
>
> Thanks for your attention,
>
> Jeronimo
>
> Sunil Mushran escreveu:
>   
>> 2.6.18 is a very old release. I would recommend upgrading to kernel
>> 2.6.21 or later.
>>
>> Jerônimo Bezerra wrote:
>>     
>>> Hello all,
>>>
>>> I have a scenario here with two Debian 4.0 servers, kernel 
>>> 2.6.18-4-amd64, and ocfs2-tools 1.2.1-1.3.
>>> These two servers have 16 CPUs (4 x Dual Core x HT) and 8GB RAM, with
>>> shared storage on an IBM DS4500 attached via qla2340.
>>>
>>> Everything was working fine until yesterday morning, when for some
>>> unknown reason the load average of both servers became very high,
>>> almost 200. CPU utilization on both was 16-18%, with 7GB of memory
>>> in use and an uptime of 22 days. Disk I/O was at least 3 MB/s. Pings
>>> over the crossover interface (heartbeat) were normal, no packet loss.
>>>
>>> I use these servers as mail servers, and nobody could connect to
>>> them because of (I think) the high load average.
>>>
>>> Well, I rebooted both servers, and after boot the same thing happened:
>>> within minutes the load average was 150. But one interesting thing:
>>> when I shut down server A, server B worked fine! If I turned on
>>> server A and shut down server B, the load average was high on A. Since
>>> things went fine with server A shut down, I kept server A down for
>>> 8 hours. In the afternoon I turned it on again and, surprise, the
>>> load was high on both servers when OCFS2 started. I had to shut down
>>> both servers and turn on just server B to get things stable again. At
>>> night, I turned on server A to try to discover what was going on. I
>>> left both servers on all night (server A with no services and server B
>>> working normally), and when I arrived this morning, another surprise:
>>> the load average of server B was at 1200(!) and server A at 0 (no
>>> services running).
>>>
>>> When I started services on server A and shut down server B, the load
>>> on server A reached 200 in a matter of seconds.
>>>
>>> I shut down server A again and, after that, turned on server B.
>>> Now everything is working fine, with a load average of 3 on server B.
>>>
>>> I didn't update the kernel, Debian, the storage or anything else.
>>> There is no message in syslog, dmesg or on the screen. There is no
>>> process using more than 2% of CPU or memory. I really don't know what
>>> to do and I have no clues.
>>>
>>> Please, could someone help me?
>>>
>>> Thanks a lot
>>>
>>> Jeronimo
>>>
>>>
>>>
>>> _______________________________________________
>>> Ocfs2-users mailing list
>>> Ocfs2-users at oss.oracle.com
>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>   
>>>       
>
>   
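
The symptom in the original report (load average near 200 while CPU
utilization stays at 16-18% and no process uses more than 2% of CPU) is
what one would see if many tasks were stuck in uninterruptible sleep,
since Linux counts tasks in state D toward the load average. Whether that
is what happened here is not established in the thread. The following is
only a minimal sketch, assuming a Linux /proc filesystem, for listing such
tasks on an affected node. The function names parse_stat and blocked_tasks
are hypothetical and are not part of ocfs2-tools or any other package.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx].strip())
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state


def blocked_tasks(proc_root="/proc"):
    """List (pid, comm) for processes in state 'D' (uninterruptible sleep)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited in the meantime, or entry is unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Calling blocked_tasks() while the load is high and printing the result
would show which processes, if any, are waiting in state D. The sketch
only looks at one stat file per process and does not walk the per-thread
entries under /proc/<pid>/task.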
