[Ocfs2-users] Default Values of heartbeat dead threshold

Sunil Mushran sunil.mushran at oracle.com
Fri Jun 5 18:36:52 PDT 2009


Actually, it is not complex.

o2cb timeouts: If not using multipathing/netbonding, leave the timeouts
as is. If using multipathing, double the disk heartbeat to 120 secs.
If using netbonding, double the network idle to 60 secs. Ensure your
private network has no loops to prevent spanning tree protocol from
interfering. Leave the reconnect/keepalive timeouts as is.
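
As a rough sketch of the arithmetic, assuming the usual o2cb convention
that the disk heartbeat timeout in seconds is (threshold - 1) * 2, and
assuming the O2CB_HEARTBEAT_THRESHOLD and O2CB_IDLE_TIMEOUT_MS names
used in /etc/sysconfig/o2cb (check both against your release), the
doubling works out like this. The function names are only illustrative.

```python
# Sketch only: turn the suggested timeouts into o2cb settings.
# Assumption: disk heartbeat timeout (secs) = (threshold - 1) * 2.

def heartbeat_threshold(timeout_secs: int) -> int:
    """Return the heartbeat dead threshold for a timeout given in seconds."""
    if timeout_secs <= 0 or timeout_secs % 2:
        raise ValueError("timeout must be a positive, even number of seconds")
    return timeout_secs // 2 + 1


def o2cb_settings(multipath: bool, netbonding: bool) -> dict:
    """Double the disk heartbeat for multipathing, the network idle for bonding."""
    disk_secs = 120 if multipath else 60   # 60 secs is the undoubled value
    idle_secs = 60 if netbonding else 30   # 30 secs is the undoubled value
    return {
        "O2CB_HEARTBEAT_THRESHOLD": heartbeat_threshold(disk_secs),
        "O2CB_IDLE_TIMEOUT_MS": idle_secs * 1000,
    }


if __name__ == "__main__":
    for key, value in o2cb_settings(multipath=True, netbonding=True).items():
        print(f"{key}={value}")
```

With multipathing, 120 secs corresponds to a threshold of 61. The
reconnect/keepalive values are deliberately left out because they stay
untouched.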

Next, match the node numbers between the css and o2cb clusters.
This is documented in the 1.4 user's guide. Configure the css cluster
and then edit the o2cb cluster.conf so that the nodes in both stacks
are numbered the same.
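
A quick way to check this could look like the sketch below. It assumes
cluster.conf is made of "node:" and "cluster:" stanzas with indented
"key = value" lines, and that you have already collected the css node
numbers into a dict by hand. The sample text, node names and helper
names are all hypothetical.

```python
# Sketch only: compare node numbers between css and an o2cb cluster.conf.
# Assumption: cluster.conf uses "node:" stanzas with "key = value" lines.

SAMPLE_CONF = """\
node:
\tip_port = 7777
\tip_address = 192.168.0.1
\tnumber = 1
\tname = node1
\tcluster = ocfs2

node:
\tip_port = 7777
\tip_address = 192.168.0.2
\tnumber = 2
\tname = node2
\tcluster = ocfs2

cluster:
\tnode_count = 2
\tname = ocfs2
"""


def o2cb_node_numbers(conf_text: str) -> dict:
    """Map node name -> node number from the text of a cluster.conf."""
    stanzas = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":"):
            stanzas.append((line[:-1], {}))      # start of a new stanza
        elif "=" in line and stanzas:
            key, _, value = line.partition("=")
            stanzas[-1][1][key.strip()] = value.strip()
    return {
        fields["name"]: int(fields["number"])
        for kind, fields in stanzas
        if kind == "node" and "name" in fields and "number" in fields
    }


def mismatched_nodes(css_numbers: dict, o2cb_numbers: dict) -> list:
    """Return (name, css_number, o2cb_number) for every node that differs."""
    names = sorted(set(css_numbers) | set(o2cb_numbers))
    return [
        (name, css_numbers.get(name), o2cb_numbers.get(name))
        for name in names
        if css_numbers.get(name) != o2cb_numbers.get(name)
    ]


if __name__ == "__main__":
    css = {"node1": 1, "node2": 2}  # hypothetical css numbering
    print(mismatched_nodes(css, o2cb_node_numbers(SAMPLE_CONF)))
```

An empty list means the two stacks agree. Anything else shows which
entries in cluster.conf need editing.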

Lastly, do not place crs_home on ocfs2. Keep that on local volumes.
The db home can be on ocfs2.

When a node dies, most of the time is spent in node death detection.
The actual recovery is fairly quick. During node death detection, the
fs does not block any I/O unless it has to. By that I mean, say the I/O
requires the node to take a lock that the "supposed" dead node held.
If that happens, that I/O will be blocked until after the recovery.
But the node will continue to do I/O if it already has all the locks.
We use this to our advantage with the voting disk. The css voting disk
I/Os are non-extending O_DIRECT writes. They will not be blocked during
detection. They are only blocked during the actual recovery, which is
fairly short. The default css timeouts are much larger than the
recovery time.

But no one is saying you have to have the voting disk on ocfs2. It could
be on a separate raw device too. If that is the case, then the closest
timeout that I am aware of is the default 15 mins for the database
controlfile lock. The o2cb timeouts are much shorter than that.

Sunil

Schmitter, Martin wrote:
> Hi Devender,
>
> this is a very complex question.
>
> Timeouts must be set in conjunction with your infrastructure. What type of storage? What OCFS2 Version? Etc. ...
>
> The major problem is to synchronize the timeouts with the CRS timeouts to prevent conflicting decisions. In fact, I am pretty sure you won’t get a default value or suggestion.
>
> In general you have to do a lot of tests!
>
> Good practice for me:
>
> Heartbeat dead threshold =  around 61
> network idle timeout = around 70000
> network keepalive delay in ms 5000
> network reconnect delay in ms 5000
> in a multipath environment with a virtual san.
>
> As I already mentioned, timeouts have to be set in conjunction with your infrastructure and SAN system. This could be totally different for your needs. Do not take OCFS2 with CRS lightly. This is very difficult, so make sure you are using the latest releases.
>
> Everything without warranty! Good Luck
>
> Regards,
>
> Martin 
>
> ________________________________________
> From: ocfs2-users-bounces at oss.oracle.com [ocfs2-users-bounces at oss.oracle.com] on behalf of Devender Narula [devendernarula at yahoo.com]
> Sent: Friday, June 5, 2009 12:59
> To: ocfs2-users at oss.oracle.com
> Subject: [Ocfs2-users] Default Values of heartbeat dead threshold
>
> Hi Guys
>
> I have a two-node RAC cluster running on RHEL 5.0. I just want to know what the Oracle-recommended default values are for the parameters mentioned below.
>
> Thanks for your help
>
> Heartbeat dead threshold
> network idle timeout
> network keepalive delay in ms
> network reconnect delay in ms
> kernel.panic_on_oops
> kernel.panic
>
> Regards,
>
> Devender
>   


