[Ocfs2-users] 2 Node cluster crashing

Mark Maiden markm at globoforce.com
Mon Jul 10 06:02:31 CDT 2006


Hi,

We have a two node cluster running SLES 9 SP2 connecting directly to an 
EMC CX300 for storage.

We are using OCFS2 (OCFS2 DLM 0.99.15-SLES) for the voting disk, etc., and
ASM for the data files.

The system had been running until last Friday, when the whole cluster
went down with the following error messages in the /var/log/messages
files:

rac1:

Jul 7 14:56:23 rac1 kernel: (0,3):o2net_state_change:512 connection to node rac2.globoforce.com num 1 at 198.87.235.246:7777 has been idle for 10 seconds, shutting it down.
Jul 7 14:56:23 rac1 kernel: (10042,0):o2net_set_nn_state:414 no longer connected to node rac2.globoforce.com at 198.87.235.246:7777
Jul 7 14:56:56 rac1 kernel: (14410,3):ocfs2_replay_journal:1123 Recovering node 1 from slot 1 on device (8,65)

rac2:

Jul 7 14:56:24 rac2 kernel: (0,0):o2net_state_change:512 connection to node rac1.globoforce.com num 0 at 198.87.235.244:7777 has been idle for 10 seconds, shutting it down.
Jul 7 14:56:24 rac2 kernel: (10201,0):o2net_set_nn_state:414 no longer connected to node rac1.globoforce.com at 198.87.235.244:7777
Jul 7 14:56:42 rac2 kernel: (10201,0):o2net_check_quorum:1468 ERROR: fencing this node because it is connected to a half-quorum of 1 out of 2 nodes which doesn't include the lowest active node 0
Jul 7 14:56:42 rac2 kernel: (10201,0):o2hb_stop_all_regions:1589 ERROR: stopping heartbeat on all active regions.
Jul 7 14:56:42 rac2 kernel: Kernel panic: ocfs2 is very sorry to be fencing this system by panicing


I opened an SR with Oracle, and they recommended that we upgrade to
SLES 9 SP3 because they don't support the OCFS2 version that we are
running. I asked whether this would fix the problem, but their reply
was very vague.

Can somebody please shed some light on this: is the version of OCFS2
that we are running very buggy, and does it cause problems like this?
And if we upgrade, will that fix the problem, or are we just bringing
ourselves into "Supported-land" so that we can get it fixed from there?

Also (sorry for all the questions :)), when we upgrade, is it just a case
of upgrading the kernel and the OCFS2 RPMs?

Thank you in advance for your help... much appreciated!!
-- 

Mark Maiden
Systems Administrator
Globoforce, Ltd
  6 Beckett Way Parkwest
  Dublin 12
  Ireland
  t: +353 1 625 8812
  f: +353 1 625 8880
  e: markm at globoforce.com
   www.globoforce.com

   http://guidance.gospelcom.net/answer.htm


