[Ocfs2-users] Nodes losing connection

Marc Kowal Marc.Kowal at gmx.de
Tue Apr 5 00:20:22 PDT 2011


  Hi all,

We are currently running a three-node Moodle/Apache cluster with OCFS2
as the upload directory. Everything works fine, but occasionally some
nodes lose their connections.

I get the following error on node 2:

kernel: [555631.411454] o2net: connection to node node-03 (num 2) at xxx.196.20.20:7777 has been idle for 7.0 seconds, shutting it down.

kernel: [555631.411482] (19959,0):o2net_idle_timer:1495 here are some times that might help debug the situation: (tmr 1301847991.990535 now 1301847998.990086 dr 1301847991.990489 adv 1301847991.990536:1301847991.990537 func (d672c340:502) 1301847983.930438:1301847983.930444)
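The timestamps in the o2net_idle_timer line can be compared directly to see how they relate to the 7.0-second idle timeout. Below is a minimal Python sketch that only does that arithmetic. It assumes each stamp is printed as seconds and microseconds separated by a dot, so the part after the dot is parsed as an integer microsecond count rather than a decimal fraction. The helper names to_usec and idle_gaps are hypothetical, and the sketch makes no claim about what each field means inside o2net beyond its label.

```python
import re

# The o2net_idle_timer debug fields from node 2, rejoined onto one line.
LOG = ("(tmr 1301847991.990535 now 1301847998.990086 dr 1301847991.990489 "
       "adv 1301847991.990536:1301847991.990537 func (d672c340:502) "
       "1301847983.930438:1301847983.930444)")


def to_usec(stamp: str) -> int:
    """Convert a 'seconds.microseconds' stamp to integer microseconds.

    Assumption: the part after the dot is a microsecond count, so it is
    parsed as an integer (this also avoids floating-point rounding error).
    """
    sec, usec = stamp.split(".")
    return int(sec) * 1_000_000 + int(usec)


def idle_gaps(line: str) -> dict:
    """Return how many seconds before 'now' the tmr and dr stamps were taken."""
    # findall with two groups yields (label, stamp) pairs, which dict() accepts.
    stamps = dict(re.findall(r"\b(tmr|now|dr) (\d+\.\d+)", line))
    now = to_usec(stamps["now"])
    return {key: (now - to_usec(stamps[key])) / 1_000_000 for key in ("tmr", "dr")}


if __name__ == "__main__":
    print(idle_gaps(LOG))
```

Run against the excerpt above, the tmr-to-now gap comes out just under 7 seconds (6.999551 s), which matches the 7.0-second idle timeout reported in the first message.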

After that, Apache goes down and causes some kernel errors.

And on node 3:

kernel: [555392.301334] o2net: no longer connected to node node-02 (num 1) at xxx.196.20.9:7777

Node 3 then keeps trying to reconnect for hours.

Here, too, Apache goes down, leaving the cluster stuck. I am not able
to stop either the ocfs2 or the o2cb service.

All nodes are running Debian Squeeze (kernel 2.6.32-5-amd64) as VMware
ESX virtual machines.

If you need any further information, please let me know. Thanks in
advance for any help.

Regards,

Marc
