[Ocfs2-users] High Load Average

Jerônimo Bezerra jab at ufba.br
Tue Dec 16 06:14:15 PST 2008


Hello all,

I have a scenario here with two Debian 4.0 servers running kernel 
2.6.18-4-amd64 and ocfs2-tools 1.2.1-1.3.
Each server has 16 CPUs (4 x Dual Core x HT) and 8GB of RAM, with 
shared storage on an IBM DS4500 attached via qla2340.

Everything was working fine until yesterday morning, when for some 
unknown reason the load average of both servers became very high, almost 
200. CPU utilization on both was 16-18%, memory usage was 7GB, and 
uptime was 22 days. Disk I/O was at least 3 MB/s. Pings over the 
crossover (heartbeat) interface were normal, with no packet loss.

I use these servers as mail servers, and nobody could connect to 
them because of (I think) the high load average.

Well, I rebooted both servers, and after boot the same thing happened: 
within minutes the load average was 150. But one interesting thing: 
when I shut down server A, server B worked fine! If I turned on 
server A and shut down server B, the load average was high on A. So, 
since shutting down server A made things go fine, I kept server A down 
for 8 hours. In the afternoon I turned it on again and, surprise, high 
load on both servers when OCFS2 started. I had to shut down both servers 
and turn on just server B to get things stable again. At night, I turned 
on server A to try to discover what was going on. I left both servers 
turned on all night (server A with no services and server B working 
normally), and when I arrived this morning, another surprise: the load 
average of server B was at 1200(!) and server A at 0 (no services running).

When I started services on server A and shut down server B, the load on 
server A became 200 within seconds.

I again shut down server A, and after that turned on server B. Now 
everything is working fine, with a load average of 3 on server B.

I didn't update the kernel, Debian, the storage or anything else. There 
are no messages in syslog, dmesg or on screen. There's no process using 
more than 2% of CPU or memory. I really don't know what to do and I have 
no clues.
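
A note on the symptom: on Linux the load average counts tasks in 
uninterruptible sleep (state D) as well as runnable ones (state R), so a 
high load with low CPU usage is consistent with many tasks blocked on I/O 
or locks. This is only an assumption about what is happening here, not a 
diagnosis. A minimal sketch of how the states could be counted from 
/proc/<pid>/stat (the helper names parse_state, count_states and 
read_stat_lines are made up for this example):

```python
import glob
from collections import Counter


def parse_state(stat_line):
    """Return the one-letter task state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the LAST ')' in the line.
    """
    after_comm = stat_line[stat_line.rindex(")") + 1:]
    return after_comm.split()[0]


def count_states(stat_lines):
    """Count entries per state (R = runnable, D = uninterruptible, ...)."""
    return Counter(parse_state(line) for line in stat_lines)


def read_stat_lines(pattern="/proc/[0-9]*/stat"):
    """Read the stat line of every process currently visible in /proc.

    Only per-process entries are read; individual threads under
    /proc/<pid>/task are not visited, so this undercounts threaded programs.
    """
    lines = []
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                text = f.read()
        except OSError:
            # The process exited between listing and reading; skip it.
            continue
        if text:
            lines.append(text)
    return lines


if __name__ == "__main__":
    counts = count_states(read_stat_lines())
    print("runnable (R):", counts["R"])
    print("uninterruptible (D):", counts["D"])
```

If most of the load turns out to be in state D while OCFS2 is mounted on 
both nodes, that would suggest tasks waiting on the shared filesystem 
rather than on CPU.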

Please, could someone help me?

Thanks a lot

Jeronimo




