[Ocfs2-users] Weird lock

Nuno Fernandes npf-mlists at eurotux.com
Thu Apr 10 06:36:40 PDT 2008


On Thursday 10 April 2008 11:13:06 Erik Terpstra wrote:
> We have a similar situation, our system hangs several times a day.
> I still can't figure out exactly what's going wrong.
> But on 1 node of our system (where Apache runs + a webservice written in
> Ruby (Mongrel/Camping)), the system load keeps rising until it is not
> responding to anything.
> Also in the process list there are a lot of processes in D state at that
> time.
That's because they are waiting for something the kernel has to provide
(probably a lock). Please run:

echo t > /proc/sysrq-trigger

Check /var/log/messages and paste it here.
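
If the full task dump is too noisy, a rough sketch like the one below may help narrow it down. It lists only the processes in D state together with the kernel function they are sleeping in (the WCHAN column). It assumes a procps-style ps; the function names filter_d_state and list_d_state are made up for this example.

```shell
#!/bin/sh
# Keep the header line plus every row whose STAT column (field 2)
# starts with "D" (uninterruptible sleep). Reads ps output on stdin.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Print D-state processes with the kernel symbol they are waiting in.
# "args" is placed last because it may contain spaces.
list_d_state() {
    ps -eo pid,stat,wchan:32,args | filter_d_state
}
```

Calling list_d_state on the hanging node while the load is climbing should show whether the blocked processes are all sleeping in the same place.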

> The weird thing is that we just discovered that rebooting *another* node
> (we have 4 in total) fixes this situation.
Does the other node also have a high load? Is any program on it running at
100% CPU?

If so, I think the first node is waiting for a lock that the other node
holds. Until that lock is released, all of those processes stay in "D" state,
and since Linux counts uninterruptible processes in the load average, the
load keeps rising.
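
To check the other node for a spinning process, something along these lines might do. It is only a sketch and assumes a procps-style ps; filter_busy and list_busy are names invented here, and the default threshold of 90 is arbitrary. Note that ps reports %CPU averaged over the lifetime of the process, so a process that only recently started spinning may show up better in top.

```shell
#!/bin/sh
# Keep the header line plus every row whose %CPU column (field 2) is at
# or above the given threshold (default 90). Reads ps output on stdin.
filter_busy() {
    awk -v min="${1:-90}" 'NR == 1 || $2 + 0 >= min + 0'
}

# Print processes sorted by CPU usage, highest first, filtered by threshold.
list_busy() {
    ps -eo pid,pcpu,stat,args --sort=-pcpu | filter_busy "$@"
}
```

For example, list_busy 50 on the other node would show anything averaging 50% CPU or more.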

Also do:

echo t > /proc/sysrq-trigger

Check /var/log/messages and paste it here.

Also run:

ps fax

and paste the output here as well.

I have that problem too and I'm still trying to figure it out.

Best regards,
./npf
> Suddenly the system load on the node that initially had the problem
> returns to a normal level and the processes that were in a D state are
> also returning to their normal states.
> Any idea why rebooting another node fixes this situation? And
> what might be the cause of this?
>
> We are running:
>
> Linux test01 2.6.22-14-server #1 SMP Thu Jan 31 23:57:25 UTC 2008 x86_64
> GNU/Linux
>
> [   77.688875] OCFS2 Node Manager 1.3.3
> [   77.703166] OCFS2 DLM 1.3.3
> [   77.710731] OCFS2 DLMFS 1.3.3
> [   77.710816] OCFS2 User DLM kernel interface loaded
> [   85.870956] OCFS2 1.3.3
>
> Kind regards,
>
> Erik.
>
> > Hello,
> >
> > yes.. when this situation happens there is always a process spinning
> > (running at 100% CPU). We can't kill it even with kill -9
>




