hospital
Copyright (c) 2003 Oracle Corporation

This is a simple cluster heartbeat tool. It's very simple to use. It takes
the following arguments:

  -t  The node timeout, in seconds. If a node that is up has not pinged
      within this many seconds, it is considered to have timed out.

  -c  The number of timeouts (checkpoints) before a node is evicted. So, if
      this is 2, a node that has timed out twice in a row will be changed
      from the up state to the down state.

  -i  How often, in seconds, hospital pings the other members of the
      cluster.

  -p  The port hospital listens on. It defaults to 1070.

  -v  Be verbose. This can be specified more than once to increase the
      verbosity level. One '-v' logs when nodes time out. A second '-v'
      also logs every datagram received from the other nodes. A third '-v'
      logs a ton of useless debugging chatter.

  -o  Where to log the output. The default is /var/log/hospital.

  -D  Run hospital in the foreground. All output goes to stdout/stderr
      instead of to the logfile.

All members of the cluster should use the same set of options. After the
options, list the other nodes in the cluster. For example, in a cluster of
three nodes (10.0.0.1, 10.0.0.2, and 10.0.0.3):

  On 10.0.0.1:
      hospital -t 5 -i 2 -c 2 -vv 10.0.0.2 10.0.0.3

  On 10.0.0.2:
      hospital -t 5 -i 2 -c 2 -vv 10.0.0.1 10.0.0.3

  On 10.0.0.3:
      hospital -t 5 -i 2 -c 2 -vv 10.0.0.1 10.0.0.2

With this configuration, hospital pings the other nodes every 2 seconds
(-i 2). If a node has not been heard from in 5 seconds (-t 5), it has timed
out. If a node times out 2 times in a row (-c 2), it is evicted. The '-vv'
logs node timeouts as well as every datagram received. All output is logged
to /var/log/hospital (the default, because '-o' was not specified).
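Because every node runs with identical options and lists every node except
itself, the per-node command lines can be generated rather than typed by
hand. The script below is a minimal sketch, not part of hospital: the helper
name print_commands and the OPTS/NODES variables are hypothetical, and their
values are simply the example configuration from this README.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with hospital: print the command line to
# run on each cluster member. OPTS and NODES hold the example values from
# this README; replace them with your own cluster's settings.

OPTS="-t 5 -i 2 -c 2 -vv"
NODES="10.0.0.1 10.0.0.2 10.0.0.3"

print_commands() {
    for self in $NODES; do
        peers=""
        # Every node gets the same options; its peer list is every node
        # in the cluster except itself.
        for n in $NODES; do
            if [ "$n" != "$self" ]; then
                peers="$peers $n"
            fi
        done
        echo "On $self: hospital $OPTS$peers"
    done
}

print_commands
```

Generating the lines from a single OPTS value makes it harder for one node
to drift out of step with the others.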