[Ocfs2-users] OCFS2 Fencing, then panic

enohi ibekwe enohiaghe at hotmail.com
Tue Apr 10 06:28:24 PDT 2007


More details:

I am attempting to add a node (node 2) to an existing 2-node (node 0 and 
node 1) cluster. All nodes are currently running SLES9 (2.6.5-7.283-bigsmp 
i686) + ocfs 1.2.1-4.2. This is the ocfs package that ships with SLES9. Node 
2 is not part of the RAC cluster yet; I have only installed ocfs on it. I 
can mount the ocfs file system on all nodes, and the ocfs file system is 
accessible from all nodes.

Node 0 is always the node that gets fenced, and it gets fenced very 
frequently. Before I added the kernel.panic parameter, node 0 would get 
fenced, panic and hang. Only a power cycle would make it responsive again.

My issue is the frequency at which node 0 gets fenced: it has happened at 
least once a day in the last 2 days.

This is what happened this morning.

I was remotely connected to node 0 via ssh. Then I suddenly lost the 
connection. I tried to ssh again but node 0 refused the connection.

Checking dmesg on node 1, I found:
ocfs2_dlm: Nodes in domain ("A7AE746FB3D34479A4B04C0535A0A341"): 0 1 2
o2net: connection to node ora1 (num 0) at 10.12.1.34:7777 has been idle for 
10 seconds, shutting it down.
(0,3):o2net_idle_timer:1310 here are some times that might help debug the 
situation: (tmr 1176207822.713473 now 1176207832.712008 dr 1176207822.713466 
adv 1176207822.713475:1176207822.713476 func (1459c2a9:504) 
1176196519.600486:1176196519.600489)
o2net: no longer connected to node ora1 (num 0) at 10.12.1.34:7777

Checking dmesg on node 2, I found:
ocfs2_dlm: Nodes in domain ("A7AE746FB3D34479A4B04C0535A0A341"): 0 1 2
o2net: connection to node ora1 (num 0) at 10.12.1.34:7777 has been idle for 
10 seconds, shutting it down.
(0,0):o2net_idle_timer:1310 here are some times that might help debug the 
situation: (tmr 1176207823.774296 now 1176207833.772712 dr 1176207823.774293 
adv 1176207823.774297:1176207823.774297 func (1459c2a9:504) 
1176196505.704238:1176196505.704240)
o2net: no longer connected to node ora1 (num 0) at 10.12.1.34:7777
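
As a rough check on the two o2net_idle_timer lines above, the sketch below pulls the `tmr` and `now` values out of each line and subtracts them. It assumes, without confirmation from the OCFS2 source, that `tmr` is when the idle timer was last armed and `now` is when it fired; the function name `idle_gap` is made up for this illustration.

```python
import re
from decimal import Decimal

# Timestamp portions of the dmesg lines quoted above (node 1 and node 2).
NODE1_LINE = ("(tmr 1176207822.713473 now 1176207832.712008 "
              "dr 1176207822.713466)")
NODE2_LINE = ("(tmr 1176207823.774296 now 1176207833.772712 "
              "dr 1176207823.774293)")


def idle_gap(line):
    """Return now - tmr in seconds (hypothetical helper, not an OCFS2 tool)."""
    # Collect only the tmr and now fields; Decimal avoids float rounding
    # on these large epoch values.
    fields = dict(re.findall(r"\b(tmr|now) (\d+\.\d+)", line))
    if "tmr" not in fields or "now" not in fields:
        raise ValueError("line has no tmr/now timestamps")
    return Decimal(fields["now"]) - Decimal(fields["tmr"])


for name, line in (("node 1", NODE1_LINE), ("node 2", NODE2_LINE)):
    print(name, idle_gap(line))
```

Under that assumption the gap is 9.998535 seconds as seen from node 1 and 9.998416 seconds as seen from node 2, which matches the "idle for 10 seconds" message on both nodes.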

Since I had reboot-on-panic set on node 0, node 0 restarted. Checking 
/var/log/messages I found:
Apr 10 09:39:50 ora1 kernel: (12,2):o2quo_make_decision:121 ERROR: fencing 
this node because it is only connected to 1 nodes and 2 is needed to make a 
quorum out of 3 heartbeating nodes
Apr 10 09:39:50 ora1 kernel: (12,2):o2hb_stop_all_regions:1909 ERROR: 
stopping heartbeat on all active regions.
Apr 10 09:39:50 ora1 kernel: Kernel panic: ocfs2 is very sorry to be fencing 
this system by panicing
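
The numbers in the o2quo_make_decision message are consistent with a simple majority rule, which can be sketched as below. This is only an illustration of the arithmetic in the log line, not the kernel's code: the function names are hypothetical, and the real o2quo logic may apply extra rules (for example with an even number of nodes) that are not modelled here.

```python
def majority_needed(heartbeating):
    """Smallest node count that is a strict majority of heartbeating nodes."""
    return heartbeating // 2 + 1


def should_fence(connected, heartbeating):
    """True if a node seeing `connected` nodes falls short of a majority.

    Assumption: `connected` is counted the same way as in the log message.
    """
    return connected < majority_needed(heartbeating)


# Values taken from the log line: connected to 1 node, 3 heartbeating nodes.
print(majority_needed(3))
print(should_fence(1, 3))
```

With 3 heartbeating nodes the majority is 2, so a node that counts only 1 connection fences itself, which is what node 0 reported.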



----Original Message Follows----
From: Sunil Mushran <Sunil.Mushran at oracle.com>
To: enohi ibekwe <enohiaghe at hotmail.com>
CC: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] OCFS2 Fencing, then panic
Date: Fri, 06 Apr 2007 09:31:17 -0700

You will have to provide more information. If you
have a netconsole server configured, it would have the details.
Else, I would recommend you configure one to catch the
messages during the fence. We have to see the reason for the fence
to determine the actual problem.

enohi ibekwe wrote:
>Is this also an issue on SLES9?
>
>I see this exact issue on my SLES9 + ocfs 1.2.1-4.2 RAC cluster. I see the 
>error on the same box on the cluster.
>
>
>_______________________________________________
>Ocfs2-users mailing list
>Ocfs2-users at oss.oracle.com
>http://oss.oracle.com/mailman/listinfo/ocfs2-users




