[Ocfs2-users] How to force node [a] to consider node [b] dead?

Karim Alkhayer kkhayer at gmail.com
Mon Jan 26 09:42:26 PST 2009


Hi All,

 

We have O2CB_HEARTBEAT_THRESHOLD set to 601 because the SAN sometimes gets
overloaded, which was causing the nodes to panic.

 

This value has proven to be more stable than 31. However, there are times
when one of the nodes, for instance node [b], crashes for whatever reason.
When the troublesome node starts up again, auto mount is enabled but does not
succeed; "Transport endpoint is not connected" is usually displayed.

 

My opinion is that the mount doesn't succeed because node [a] still thinks
that node [b] is alive.

 

We're talking about a restart that can take around 15 minutes, so, as far as
I can tell, the threshold has passed by then.
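
For reference, here is a rough sketch of how the threshold values above
translate into time. It assumes the formula given in the OCFS2 FAQ, timeout in
seconds = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2; the function and constant names
are only illustrative and are not part of any OCFS2 tool.

```python
# Assumption: per the OCFS2 FAQ, the disk heartbeat timeout in seconds is
# (O2CB_HEARTBEAT_THRESHOLD - 1) * 2. The names below are hypothetical helpers.
HEARTBEAT_INTERVAL_SECONDS = 2  # assumed interval between disk heartbeats


def heartbeat_timeout_seconds(threshold: int) -> int:
    """Return the timeout implied by a given O2CB_HEARTBEAT_THRESHOLD."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECONDS


def threshold_for_timeout(timeout_seconds: int) -> int:
    """Return the O2CB_HEARTBEAT_THRESHOLD needed for a desired timeout."""
    return timeout_seconds // HEARTBEAT_INTERVAL_SECONDS + 1


if __name__ == "__main__":
    # The two values mentioned in this post.
    for threshold in (31, 601):
        seconds = heartbeat_timeout_seconds(threshold)
        print(f"O2CB_HEARTBEAT_THRESHOLD={threshold}: "
              f"{seconds} s (about {seconds / 60:.0f} min)")
```

If that formula applies to this version, 601 would correspond to about 20
minutes rather than something under 15, so the comparison with the restart
time is worth double-checking.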

 

I was wondering if there is a workaround to kick node [b] out of the cluster
so that it can join again. What I've done so far (the incident has happened
once, a month ago) is to restart the cluster services on both machines.
This was a very expensive solution, as all database instances had to go down.
 

OCFS2 1.2.1, SLES9 SP3 2.6.5-7.257-default, RAC 10.1.0.5, 5 DBs

 

Thanks

Karim

  


