CMan and the O2CB Control Daemon
The first step in using the cman stack is to configure the cluster with cman and then feed the node configuration to o2cb. This is done by o2cb_controld, following the "control daemon" paradigm the cman stack uses for fs/dlm and GFS2.
A Starting Point
o2cb_controld was created by copying dlm_controld from the cman cluster stack and modifying it as needed. Why dlm_controld? It's a pretty simple daemon that does just two things:
- Tell fs/dlm the configuration of all the nodes in the cluster.
- Handle joining and leaving lockspaces for fs/dlm.
You've no doubt guessed that the first item is exactly what we want o2cb_controld to do: tell o2cb the configuration of the nodes.
A libcman Primer
One of the things I want to do is write down what I've learned while reading all this code: not how to configure the cluster (for that, see http://sources.redhat.com/cluster), but how the tools use the cluster libraries, which aren't really documented.
Let's start with the basics common to all the cman stack libraries (libcman, libgroup, libcpg, etc.). Each one gives you an initialization function. You call it and get a handle. You then ask for a file descriptor, which you add to your poll(2) loop. When the fd becomes readable, you call the library's dispatch function. This function reads the message from the cluster and calls one of the callbacks you have provided. It sounds complicated, but it's basically boilerplate. I said I was copying from dlm_controld, didn't I? In both dlm_controld and o2cb_controld, this boilerplate lives in member_cman.c.
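To make the boilerplate concrete, here is a minimal sketch of the init / get-fd / poll / dispatch pattern. It does not link against libcman: the `fake_*` functions are hypothetical stand-ins (a pipe plays the role of the cluster connection) so the loop structure compiles and runs on its own. In the real daemon their places are taken by libcman calls such as cman_init(), cman_get_fd() and cman_dispatch().

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical callback type; libcman's real callback has its own signature. */
typedef void (*event_cb)(void *priv, int reason);

/* Hypothetical stand-in for a library handle: a pipe replaces the
 * cluster socket so the sketch runs anywhere. */
struct fake_handle {
    int fds[2];   /* fds[0] is what the daemon polls on */
    event_cb cb;
    void *priv;
};

static struct fake_handle *fake_init(event_cb cb, void *priv)
{
    struct fake_handle *h = calloc(1, sizeof(*h));
    if (!h)
        return NULL;
    if (pipe(h->fds) < 0) {
        free(h);
        return NULL;
    }
    h->cb = cb;
    h->priv = priv;
    return h;
}

static int fake_get_fd(struct fake_handle *h) { return h->fds[0]; }

/* Simulate the cluster sending one message. */
static int fake_post(struct fake_handle *h, unsigned char reason)
{
    return write(h->fds[1], &reason, 1) == 1 ? 0 : -1;
}

/* Read one message and hand it to the registered callback. */
static int fake_dispatch(struct fake_handle *h)
{
    unsigned char reason;
    if (read(h->fds[0], &reason, 1) != 1)
        return -1;
    h->cb(h->priv, reason);
    return 0;
}

static void fake_finish(struct fake_handle *h)
{
    close(h->fds[0]);
    close(h->fds[1]);
    free(h);
}

/* Example callback: just counts events. */
static int events_seen;
static void count_events(void *priv, int reason)
{
    (void)priv;
    (void)reason;
    events_seen++;
}

/* The main loop: poll the fd, dispatch when readable. Bounded by
 * max_events only so that the sketch terminates. */
static int run_loop(struct fake_handle *h, int max_events)
{
    int handled = 0;
    struct pollfd pfd = { .fd = fake_get_fd(h), .events = POLLIN };

    while (handled < max_events) {
        if (poll(&pfd, 1, -1) < 0)
            return -1;
        if (pfd.revents & POLLIN) {
            if (fake_dispatch(h) < 0)
                return -1;
            handled++;
        }
    }
    return handled;
}
```

A real daemon would loop forever and usually poll several library fds in the same pollfd array, dispatching whichever one is readable.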
So, what interface does libcman provide to the cluster? A simple one, thankfully: libcman has only one callback. Once the cman library is initialized and notifications are started, the daemon really only has to handle two events:
- STATECHANGE - The cluster has changed somehow
- TRY_SHUTDOWN - The cluster would like to exit
When STATECHANGE happens, the daemon asks libcman for the cluster configuration and acts on any changes. When TRY_SHUTDOWN happens, the daemon is being asked "is it safe to shut the cluster down now?", and it answers yes or no with cman_replyto_shutdown().
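The single callback boils down to a switch on the reason code. This sketch mirrors libcman's CMAN_REASON_STATECHANGE and CMAN_REASON_TRY_SHUTDOWN with hypothetical REASON_* constants, and replaces cman_replyto_shutdown() with a hypothetical reply_to_shutdown() that only records the answer. The shutdown policy (refuse while a hypothetical `busy` flag is set) is an assumption for illustration, not what dlm_controld or o2cb_controld actually decide.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reason codes standing in for libcman's CMAN_REASON_* values. */
enum { REASON_STATECHANGE = 1, REASON_TRY_SHUTDOWN = 2 };

struct daemon_state {
    bool busy;               /* hypothetical: something still depends on o2cb */
    int topology_refreshes;  /* times we re-read the node list */
    int last_shutdown_reply; /* -1 = none yet, 0 = no, 1 = yes */
};

/* Stand-in for cman_replyto_shutdown(): record the yes/no answer. */
static void reply_to_shutdown(struct daemon_state *st, int yes)
{
    st->last_shutdown_reply = yes;
}

/* Stand-in for "query the full topology and diff it". */
static void refresh_topology(struct daemon_state *st)
{
    st->topology_refreshes++;
}

/* The one notification callback the daemon registers. */
static void cluster_event(void *priv, int reason, int arg)
{
    struct daemon_state *st = priv;

    (void)arg;
    switch (reason) {
    case REASON_STATECHANGE:
        refresh_topology(st);
        break;
    case REASON_TRY_SHUTDOWN:
        reply_to_shutdown(st, st->busy ? 0 : 1);
        break;
    default:
        /* Other notifications are ignored in this sketch. */
        break;
    }
}
```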
The dlm_controld program handles state changes by keeping around a current view of the cluster topology. When STATECHANGE happens, the full new topology is queried. It is compared against the old topology, and any changes are passed through to the action code.
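The compare-old-against-new step can be sketched as a simple diff over node IDs. The `struct node` type and the on_add/on_remove hooks are hypothetical simplifications (libcman's cman_node_t also carries a name, an address, and more), and in a real daemon the new array would come from a libcman query such as cman_get_nodes(). This illustrates the idea rather than reproducing dlm_controld's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal node record. */
struct node {
    int nodeid;
    bool member;
};

static bool is_member(const struct node *nodes, size_t n, int nodeid)
{
    for (size_t i = 0; i < n; i++)
        if (nodes[i].nodeid == nodeid && nodes[i].member)
            return true;
    return false;
}

typedef void (*node_action)(int nodeid, void *priv);

/* Nodes that stopped being members are "removed"; nodes that became
 * members are "added". Unchanged nodes produce no action. */
static void diff_topology(const struct node *old, size_t n_old,
                          const struct node *cur, size_t n_cur,
                          node_action on_add, node_action on_remove,
                          void *priv)
{
    for (size_t i = 0; i < n_old; i++)
        if (old[i].member && !is_member(cur, n_cur, old[i].nodeid))
            on_remove(old[i].nodeid, priv);
    for (size_t i = 0; i < n_cur; i++)
        if (cur[i].member && !is_member(old, n_old, cur[i].nodeid))
            on_add(cur[i].nodeid, priv);
}

/* Example actions that just record what would be passed on. */
struct changes {
    int added[8], n_added;
    int removed[8], n_removed;
};

static void record_add(int nodeid, void *priv)
{
    struct changes *c = priv;
    if (c->n_added < 8)
        c->added[c->n_added++] = nodeid;
}

static void record_remove(int nodeid, void *priv)
{
    struct changes *c = priv;
    if (c->n_removed < 8)
        c->removed[c->n_removed++] = nodeid;
}
```

The quadratic scan is fine for cluster-sized node counts. Because only transitions reach the action code, a STATECHANGE that leaves membership unchanged produces no actions at all.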
The O2CB Actions
dlm_controld has a file, action.c, that handles telling fs/dlm about the cluster topology. When a node is added, the action code goes to the configfs hierarchy for fs/dlm and adds the node. The reverse happens when a node leaves the cluster.
We're not using fs/dlm. We're configuring o2cb. So, I completely revamped action.c to call libo2cb functions. When a node comes up, we pass its information to o2cb_add_node(). When it goes down, we call o2cb_remove_node().
There's only one real caveat in action.c right now: cman passes a node's address with port 0, but o2cb needs a real port number. So we default to 7777, the usual o2cb port. We probably want a way to configure this at startup.
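The port caveat amounts to a small conversion step in the action code. This sketch assumes the address arrives as a `struct sockaddr_storage` holding an IPv4 address, and that the o2cb side wants textual values (configfs attributes are written as text). node_addr_strings() and its `configured_port` parameter are hypothetical; the parameter shows one way a startup option could override the 7777 default.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

#define O2CB_DEFAULT_PORT 7777 /* the default described in the text */

/* Turn a node address into IP and port strings. cman reports port 0,
 * so fall back to configured_port, or to 7777 if that is also 0.
 * Returns 0 on success, -1 on failure. This sketch handles IPv4 only. */
static int node_addr_strings(const struct sockaddr_storage *ss,
                             unsigned short configured_port,
                             char *ip, size_t iplen,
                             char *port, size_t portlen)
{
    const struct sockaddr_in *sin;
    unsigned short p;

    if (ss->ss_family != AF_INET)
        return -1;
    sin = (const struct sockaddr_in *)ss;
    if (!inet_ntop(AF_INET, &sin->sin_addr, ip, (socklen_t)iplen))
        return -1;

    p = ntohs(sin->sin_port);
    if (p == 0)
        p = configured_port ? configured_port : O2CB_DEFAULT_PORT;

    /* snprintf returns the length it wanted; catch truncation and errors. */
    if ((size_t)snprintf(port, portlen, "%hu", p) >= portlen)
        return -1;
    return 0;
}
```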
The Rest of the Daemon
dlm_controld also manages lockspaces. These are managed through the group daemon. Thus, dlm_controld has a file group.c to talk to groupd. o2cb_controld has no need for groups, so I've removed this file.
dlm_controld listens for kobject uevents from fs/dlm. This is the other half of managing lockspaces. This code was in main.c, and I've removed it as well.
main.c contains the startup code and the main loop. I removed the lockspace code, the uevent code, and the groupd startup code. It now just starts up libcman.
Already o2cb_controld is able to manage o2cb from a running cman cluster. cman changes are propagated to the o2cb configfs, and ocfs2 can be mounted on top of the configuration. On to the next step!
One caveat: it appears that o2cb_controld gets notified of a node death before ocfs2_controld does. That can't be allowed to happen, because ocfs2_controld will still have the node pinned and o2cb_controld won't be able to delete it. Gotta check on that!
Thinking about it more, we'll probably have to take a semaphore shared with ocfs2_controld and queue up changes while the semaphore is held.
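The queue-while-held idea might look something like the following. Everything here is hypothetical: change_gate, gate_hold(), gate_release() and gate_submit() are invented names, and the "semaphore" is modeled as a hold count inside a single-threaded daemon rather than as any real primitive shared with ocfs2_controld. Changes submitted while the gate is held are queued, then replayed in arrival order once the last holder releases it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum change_kind { CHANGE_ADD, CHANGE_REMOVE };

struct change {
    enum change_kind kind;
    int nodeid;
};

#define MAX_PENDING 64

/* Hypothetical gate: while holders > 0, changes are queued. */
struct change_gate {
    int holders;
    struct change pending[MAX_PENDING];
    size_t n_pending;
    void (*apply)(const struct change *c, void *priv);
    void *priv;
};

/* Apply immediately if nobody holds the gate, otherwise queue.
 * Returns -1 if the fixed-size queue is full. */
static int gate_submit(struct change_gate *g, enum change_kind kind, int nodeid)
{
    struct change c = { kind, nodeid };

    if (g->holders == 0) {
        g->apply(&c, g->priv);
        return 0;
    }
    if (g->n_pending == MAX_PENDING)
        return -1;
    g->pending[g->n_pending++] = c;
    return 0;
}

static void gate_hold(struct change_gate *g) { g->holders++; }

/* Drop one hold; when the last one goes, replay queued changes in order. */
static void gate_release(struct change_gate *g)
{
    if (g->holders > 0 && --g->holders == 0) {
        for (size_t i = 0; i < g->n_pending; i++)
            g->apply(&g->pending[i], g->priv);
        g->n_pending = 0;
    }
}

/* Example apply hook that just records changes. */
struct applied_log {
    struct change log[MAX_PENDING];
    size_t n;
};

static void log_apply(const struct change *c, void *priv)
{
    struct applied_log *l = priv;
    if (l->n < MAX_PENDING)
        l->log[l->n++] = *c;
}
```

Replaying in arrival order matters: a remove followed by a re-add of the same node must reach o2cb in that order.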