[Ocfs2-users] High Load Average

Brett Worth brett at worth.id.au
Tue Dec 16 16:42:10 PST 2008


If you're running RHEL5 I'd leave the kernel where it is.  Red Hat
provides kernel updates matched to its RHEL versions.  OCFS2 RPMs for
1.4.1-1 are available for 2.6.18-92.1.18.el5, which is the current
RHEL kernel.
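
A quick way to confirm that a node is on the kernel those RPMs target is
to compare the output of `uname -r` with the release string. A minimal
Python sketch, assuming the release quoted in this post is the target; the
helper name `matches_target_kernel` is made up for illustration, and
flavour suffixes such as `xen` or `PAE` are deliberately not handled:

```python
import platform

# Kernel release named in this post as the one the 1.4.1-1 RPMs target.
TARGET_RELEASE = "2.6.18-92.1.18.el5"


def matches_target_kernel(running, target=TARGET_RELEASE):
    """Return True if the running kernel release equals the target exactly."""
    return running.strip() == target


if __name__ == "__main__":
    running = platform.release()  # same string that `uname -r` prints
    if matches_target_kernel(running):
        print("kernel %s matches the RPM target" % running)
    else:
        print("kernel %s differs from target %s" % (running, TARGET_RELEASE))
```

An exact string comparison is used on purpose: kernel modules are tied to
a specific kernel build, so a near match is not good enough.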

Red Hat will not move from the 2.6.18 kernel until RHEL 6, and provides
patched kernels at the 2.6.18 level which, in theory, should include
all bug fixes without changing the API.

Brett

2008/12/17 Sunil Mushran <sunil.mushran at oracle.com>:
> 2.6.18 is a very old release. I would recommend upgrading to kernel
> 2.6.21 or later.
>
> Jerônimo Bezerra wrote:
>> Hello all,
>>
>> I have a scenario here with two Debian 4.0 servers, kernel
>> 2.6.18-4-amd64, and ocfs2-tools 1.2.1-1.3.
>> These two servers have 16 CPUs (4 x Dual Core x HT) and 8GB RAM, with
>> shared storage with qla2340 on an IBM DS4500 storage array.
>>
>> Everything was working fine until yesterday morning, when for some
>> unknown reason the load average of both servers became very high, almost
>> 200. CPU utilization on both was 16-18%, memory use was 7GB, and uptime
>> was 22 days. Disk I/O was at least 3 MB/s. Pings to the crossover
>> interface (heartbeat) were normal, with no packet loss.
>>
>> I use these servers as mail servers, and nobody could connect to
>> them because of (I think) the high load average.
>>
>> Well, I rebooted both servers, and after boot, the same thing: within
>> minutes the load average was 150. But one interesting thing:
>> when I shut down server A, server B worked fine! If I turned on
>> server A and shut down server B, the load average was high on A. So,
>> since shutting down server A made things go fine, I kept server A down
>> for 8 hours. In the afternoon I turned it on again and, surprise, high
>> load on both servers when OCFS2 started. I had to shut down both servers
>> and turn on just server B to get things stable again. At night, I turned
>> on server A to try to discover what was going on. I left both servers
>> on all night (server A with no services and server B working normally),
>> and when I arrived this morning, another surprise: the load average
>> of server B was 1200(!) and of server A 0 (no services running).
>>
>> When I started services on server A and shut down server B, the load on
>> server A reached 200 within seconds.
>>
>> I again shut down server A and, after that, turned on server B. Now
>> everything is working fine, with a load average of 3 on server B.
>>
>> I didn't update the kernel, Debian, storage or anything else. There's no
>> message in syslog, in dmesg or on screen. There's no process using more
>> than 2% of CPU or memory. I really don't know what to do and I have no
>> clues.
>>
>> Please, could someone help me?
>>
>> Thanks a lot
>>
>> Jeronimo
>>
>>
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>
>
>
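
For what it's worth, the symptom quoted above (load near 200 with CPU at
16-18%) fits how Linux computes load average: it counts tasks in
uninterruptible sleep (state D, typically waiting on I/O or a lock) as
well as runnable ones. That is only a guess at what is happening here,
not a diagnosis. A rough Python sketch for counting task states from
/proc on Linux; the function names are made up for illustration:

```python
import glob
from collections import Counter


def state_from_stat(stat_line):
    """Extract the one-letter task state from a /proc/<pid>/stat line."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split after the LAST ')' before reading the state field.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def count_task_states():
    """Count tasks (threads included) by state letter. Linux only."""
    counts = Counter()
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path, errors="replace") as f:
                counts[state_from_stat(f.read())] += 1
        except (OSError, IndexError):
            continue  # task exited while we were scanning
    return counts


if __name__ == "__main__":
    counts = count_task_states()
    print("runnable (R):", counts["R"])
    print("uninterruptible (D):", counts["D"])
```

If the D count tracks the load average while R stays small, the tasks
are blocked rather than busy, and the next step would be finding out what
they are waiting on.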



-- 
Brett

  /) _ _ _/_/ / / /  _ _//
 /_)/</= / / (_(_/()/< ///


