OCFS2/DesignDocs/GlobalHeartbeat

GLOBAL HEARTBEAT ** DEPRECATED **

Sunil Mushran, May 2007

GOALS

Currently the disk heartbeat is on a per-device basis and the disk heartbeat thread is launched on each mount.

The upside of this scheme is ease-of-use, which comes from having fewer things to configure.

The downside is scalability, meaning the ability to mount 50-100 or more ocfs2 volumes. Because each mount has its own hb thread, the number of mounts one can perform is limited by the point at which hb io alone floods the system/disks. Also, mounting a volume takes 5 secs or so, most of which is spent waiting for the hb thread to stabilize.

In the global heartbeat scheme, the user will specify ONE device for hb and all other mounts will attach to it to get node up/down events.
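
The scaling argument can be put as a back-of-the-envelope sketch. The function names are hypothetical, the 5 second figure is the rough per-mount time quoted above, and the mounts are assumed to be performed one after another. The document gives no mount time for the global scheme, so that is not modelled.

```python
def heartbeat_threads(mounts: int, mode: str) -> int:
    """Number of disk heartbeat threads running on one node."""
    if mode == "local":
        return mounts  # one hb thread is launched per mount
    if mode == "global":
        return 1       # a single hb device serves every mount
    raise ValueError(f"unknown heartbeat mode: {mode!r}")


def serial_mount_wait_secs(mounts: int, per_mount_secs: float = 5.0) -> float:
    """Total time spent mounting volumes one after another with local hb."""
    return mounts * per_mount_secs


if __name__ == "__main__":
    for n in (1, 50, 100):
        print(n, heartbeat_threads(n, "local"), heartbeat_threads(n, "global"),
              serial_mount_wait_secs(n))
```

With local heartbeat both quantities grow linearly with the number of volumes, whereas the global thread count stays fixed.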

This is not to say that local heartbeat will/should be removed. After all, ease-of-use is ease-of-use. The goal of this exercise is to give the user the flexibility to choose between global and local hb during configuration. The user will also be able to switch between the two schemes at any point, as long as the cluster is offline while switching.

However, mixing local and global will not be allowed. That is, a node cannot have some volumes configured for local hb and others for global.

USER INTERACTION

The user will have the ability to specify the global heartbeat device during cluster setup.

# cat /etc/ocfs2/cluster.conf
...
node:
        ip_port = 7777
        ip_address = 192.168.0.100
        number = 100
        name = nodes100
        cluster = mycluster

cluster:
        node_count = 100
        heartbeat_region = 80D1ADF43FDE40DD9B56D370462ACE17
        name = mycluster
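
As an illustration of how a tool might read this file, here is a minimal sketch that parses the stanza layout shown above and reports the configured mode. It is not the o2cb_ctl implementation. The function names are hypothetical, and it assumes that a cluster stanza without a heartbeat_region key means local heartbeat.

```python
SAMPLE_CONF = """\
node:
        ip_port = 7777
        ip_address = 192.168.0.100
        number = 100
        name = nodes100
        cluster = mycluster

cluster:
        node_count = 100
        heartbeat_region = 80D1ADF43FDE40DD9B56D370462ACE17
        name = mycluster
"""


def parse_cluster_conf(text):
    """Return a list of (stanza_type, attributes) pairs in file order."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A stanza header such as "node:" or "cluster:".
            current = (line[:-1], {})
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[1][key.strip()] = value.strip()
        # Anything else (for example an elided "..." line) is ignored.
    return stanzas


def configured_heartbeat_mode(stanzas, cluster_name):
    """'global' if the named cluster sets heartbeat_region, else 'local'."""
    for kind, attrs in stanzas:
        if kind == "cluster" and attrs.get("name") == cluster_name:
            return "global" if attrs.get("heartbeat_region") else "local"
    raise KeyError(cluster_name)


if __name__ == "__main__":
    print(configured_heartbeat_mode(parse_cluster_conf(SAMPLE_CONF), "mycluster"))
```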

During cluster startup, o2cb_ctl will populate heartbeat_region in configfs, which indicates that global heartbeat is in use.

# cat /sys/kernel/config/cluster/mycluster/heartbeat/heartbeat_region
80D1ADF43FDE40DD9B56D370462ACE17
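
A user-space tool could decide which mode the running cluster is in by reading this attribute. The following is a sketch only. The helper names are hypothetical, the attribute path is the one shown above, and treating a missing or empty attribute as local heartbeat is an assumption. The configfs root is a parameter because the mount point differs between systems.

```python
from pathlib import Path
from typing import Optional


def read_heartbeat_region(cluster: str,
                          configfs_root: str = "/sys/kernel/config") -> Optional[str]:
    """Return the global hb region for the cluster, or None if unset."""
    attr = (Path(configfs_root) / "cluster" / cluster
            / "heartbeat" / "heartbeat_region")
    try:
        value = attr.read_text().strip()
    except OSError:
        # Attribute missing or unreadable: no global region is known.
        return None
    return value or None


def running_heartbeat_mode(cluster: str,
                           configfs_root: str = "/sys/kernel/config") -> str:
    """'global' when a region is populated, otherwise 'local' (assumed)."""
    return "global" if read_heartbeat_region(cluster, configfs_root) else "local"


if __name__ == "__main__":
    print(running_heartbeat_mode("mycluster"))
```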

As part of bringing the cluster online, o2cb.init will call o2hb_ctl, which will start up the global heartbeat. o2hb_ctl is a new utility that will be called by o2cb.init, mount.ocfs2 and fs to start and stop hb. The utility will detect the mode it is in and then call ocfs2_hb_ctl if needed.

# service o2cb start
Loading module "configfs": OK
Mounting configfs filesystem at /config: OK
Loading module "ocfs2_nodemanager": OK
Loading module "ocfs2_dlm": OK
Loading module "ocfs2_dlmfs": OK
Mounting ocfs2_dlmfs filesystem at /dlm: OK
Registering O2CB cluster mycluster: OK
Starting O2CB Global heartbeat: OK
#
# ps aux | grep o2hb
root     28326  0.0  0.0     0    0 ?        S<   13:32   0:04 [o2hb-80D1ADF43F]

Mounting works as usual, and the mounted devices will indicate that the heartbeat is global.

# mount -t ocfs2 /dev/sdb1 /u0
# mount | grep ocfs2
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/sdb1 on /u0 type ocfs2 (rw,_netdev,heartbeat=global)
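
The heartbeat option in this output also gives a simple way to check the rule that local and global cannot be mixed on a node. The sketch below parses mount output in the format shown above. The function names are hypothetical, and mount points containing spaces are not handled.

```python
SAMPLE_MOUNT_OUTPUT = """\
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/sdb1 on /u0 type ocfs2 (rw,_netdev,heartbeat=global)
"""


def ocfs2_heartbeat_modes(mount_output):
    """Map each ocfs2 mount point to its heartbeat= option (None if absent)."""
    modes = {}
    for line in mount_output.splitlines():
        parts = line.split()
        # Expected layout: <device> on <mountpoint> type <fstype> (<options>)
        if len(parts) < 6 or parts[1] != "on" or parts[3] != "type":
            continue
        mountpoint, fstype, options = parts[2], parts[4], parts[5]
        if fstype != "ocfs2":
            continue  # skips ocfs2_dlmfs and every other filesystem
        mode = None
        for opt in options.strip("()").split(","):
            if opt.startswith("heartbeat="):
                mode = opt.split("=", 1)[1]
        modes[mountpoint] = mode
    return modes


def is_mixed(modes):
    """True if the mounts do not all report the same heartbeat mode."""
    return len(set(modes.values())) > 1


if __name__ == "__main__":
    modes = ocfs2_heartbeat_modes(SAMPLE_MOUNT_OUTPUT)
    print(modes, is_mixed(modes))
```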

STRUCTURES

o2net_check_handshake() will send the heartbeat region so as to ensure all nodes in the cluster are using the same hb device.

struct o2net_handshake {
        __be64  protocol_version;
        __be64  connector_id;
        __be32  o2hb_heartbeat_timeout_ms;
        __be32  o2net_idle_timeout_ms;
        __be32  o2net_keepalive_delay_ms;
        __be32  o2net_reconnect_delay_ms;
        __u8    o2hb_region[16];         /* NEW */
};
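
To illustrate the wire layout and the region check, here is a user-space sketch that packs the same fields in big-endian order and compares regions. It is not the kernel code. It assumes the 16-byte o2hb_region holds the binary form of the 32-hex-digit region shown earlier, which the document does not state, and the helper names and sample field values are purely illustrative.

```python
import struct

# Two __be64, four __be32, then 16 raw bytes: 48 bytes, big-endian, no padding.
HANDSHAKE = struct.Struct(">QQIIII16s")


def pack_handshake(protocol_version, connector_id, hb_timeout_ms,
                   idle_timeout_ms, keepalive_delay_ms, reconnect_delay_ms,
                   region_hex):
    """Build a handshake message carrying the given hb region."""
    region = bytes.fromhex(region_hex)
    if len(region) != 16:
        raise ValueError("region must be 32 hex digits (16 bytes)")
    return HANDSHAKE.pack(protocol_version, connector_id, hb_timeout_ms,
                          idle_timeout_ms, keepalive_delay_ms,
                          reconnect_delay_ms, region)


def same_region(message, local_region_hex):
    """True if the peer's handshake names the same hb region as ours."""
    peer_region = HANDSHAKE.unpack(message)[-1]
    return peer_region == bytes.fromhex(local_region_hex)


if __name__ == "__main__":
    region = "80D1ADF43FDE40DD9B56D370462ACE17"
    # All numeric values below are placeholders, not real defaults.
    msg = pack_handshake(1, 1, 1000, 1000, 1000, 1000, region)
    print(len(msg), same_region(msg, region))
```

A node receiving a handshake whose region differs from its own would then know the peer is heartbeating on a different device.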

TODOS

Tools

Module

STATUS

Most of the development work is complete: one can use the patches to mount volumes using global heartbeat. (The heartbeat needs to be started by hand, as o2cb.init has not been updated.)

Tools Patches

series-tools

hbctl-add_global.patch

mount-allglobal.patch

o2cb_ctl-hbregion.patch

ocfs2-heartbeat.patch

Kernel Patch

o2cb-global_hb.patch

This patch does not apply cleanly on 2.6.23-rc8, but I am attaching it as is because the conflicts are minor.

05/05/2007 - smushran - First draft

09/25/2007 - smushran - Patches attached


2011-12-23 01:01