
Integrating OCFS2 and CLVM

Many ocfs2 users are clamoring for some form of volume management. Once you start to get into a SAN and larger clusters, having to manually partition disks at static sizes begins to hurt. Volume management, however, needs to be cluster-aware. Otherwise, you could grow, shrink, or modify a logical volume on one node, and other nodes won't know it happened. They'll continue to write to the old volume layout, and corruption will occur.

If ocfs2 is integrated with a clustered volume manager, then volume changes are seen by all nodes, and data is safe. This is what customers want. The only question is: how to get there?

The Players

There are two major clustered volume managers (CLVMs) today. The first one is "CLVM" from Red Hat. It's originally from Sistina, and it uses the Red Hat cluster software, cman. cman is their userspace cluster membership and communication layer. It backs the GFS2 filesystem as well. CLVM works behind LVM2, so that LVM2 commands become automatically clustered when configured to use the CLVM daemon clvmd.

The second is EVMS. Originally an IBM product, EVMS is now popular on SuSE Enterprise systems. With SuSE's High Availability Storage Framework (HASF), it gains the ability to make volume changes in a cluster. However, it fails to lock out changes. That is, it can propagate the volume changes, but it doesn't prevent the problem we described above, where another node is writing to the old volume layout. EVMS-HA is not safe for OCFS2. It sits on top of Heartbeat2 and SuSE's userspace clustering controller, a totally different stack from Red Hat's cman.
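
To make the difference between propagating a change and locking out writers concrete, here is a toy model in Python. It is only a sketch: it is not EVMS, HASF, or CLVM code, and every name in it (Node, relayout_propagate_only, relayout_with_lockout, stale_writes) is made up for illustration. A "layout" is just a map from logical block to physical block, and a write that lands while the change is in progress is replayed at the point each scheme would let it through.

```python
# Toy model only. None of these names come from EVMS, HASF, or CLVM.

class Node:
    def __init__(self, name, layout):
        self.name = name
        # This node's cached volume layout: logical block -> physical block.
        self.layout = dict(layout)

    def write(self, disk, logical_block, data):
        # The write goes wherever this node's cached layout points.
        disk[self.layout[logical_block]] = (self.name, data)


def relayout_propagate_only(nodes, new_layout, in_flight):
    """Push the new layout node by node, with nothing holding writers off.

    in_flight is a list of (node, logical_block, data) writes that happen
    after the first node has switched but before the others have.
    """
    disk = {}
    nodes[0].layout = dict(new_layout)      # initiating node switches first
    for node, block, data in in_flight:     # other nodes still use the old map
        node.write(disk, block, data)
    for node in nodes[1:]:                  # the change arrives too late
        node.layout = dict(new_layout)
    return disk


def relayout_with_lockout(nodes, new_layout, in_flight):
    """Hold off all writers, switch every node, then let the writes run."""
    disk = {}
    for node in nodes:
        node.layout = dict(new_layout)
    for node, block, data in in_flight:     # deferred until the switch is done
        node.write(disk, block, data)
    return disk


def stale_writes(disk, new_layout):
    """Physical blocks that were written but are not part of the new layout."""
    valid = set(new_layout.values())
    return sorted(p for p in disk if p not in valid)


old, new = {0: 100}, {0: 200}

nodes = [Node("a", old), Node("b", old)]
unsafe = relayout_propagate_only(nodes, new, [(nodes[1], 0, "x")])

nodes = [Node("a", old), Node("b", old)]
safe = relayout_with_lockout(nodes, new, [(nodes[1], 0, "x")])
```

In the propagate-only run, node b's write lands on physical block 100, which the new layout no longer uses, so stale_writes(unsafe, new) reports it. In the lock-out run the same write lands on block 200 and nothing is stale. A real volume manager has far more state than this, but the ordering problem is the same.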

ocfs2 has its own cluster stack. It doesn't talk to cman, and it doesn't talk to HASF unless you're running SuSE's modified version. This can cause problems when ocfs2 sees a different set of nodes than the volume manager.
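
A small sketch of what "a different set of nodes" means in practice, again in Python and again purely illustrative. The function name and the node names are hypothetical, and neither o2cb nor cman exposes membership in this form. The point is only that with two independent stacks, each one has its own idea of who is alive, and any node that appears in one view but not the other is a node the two layers will disagree about.

```python
# Illustrative only. The views are plain lists of node names; real stacks
# report membership through their own interfaces.

def membership_disagreement(fs_view, vm_view):
    """Return the nodes that only one of the two cluster stacks believes in."""
    fs_view, vm_view = set(fs_view), set(vm_view)
    return {
        "only_filesystem": sorted(fs_view - vm_view),
        "only_volume_manager": sorted(vm_view - fs_view),
    }


# The filesystem's stack still counts n3 as a member; the volume manager's
# stack has already dropped it.
diff = membership_disagreement(["n1", "n2", "n3"], ["n1", "n2"])
```

Here diff shows n3 under "only_filesystem": the filesystem would still accept I/O from n3, while the volume manager would no longer include it in a layout change.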

The Problem

If we're going to make clustered volume management work, the volume manager and the filesystem have to be talking to the same cluster framework. We already have enough trouble with o2cb and CRS fighting over node death. A third stack would just be a nightmare. What we need is to have a cluster stack that both ocfs2 and the clustered volume manager can use in concert. How we get there is anyone's guess.

The Approaches

I was hoping that the full integration of ocfs2 into SuSE's HASF, including EVMS and the cluster info, would make it a happy thing to port towards EL systems. With EVMS not working in a concurrent manner, that's no joy.

The first thing I looked at was putting o2cb/o2dlm underneath clvmd.

Looking at EVMS and SuSE's patches to o2cb, I got an idea. What if we could use their user_heartbeat module to talk to the cman stack? The idea is to use the same stack as CLVM, but not have to modify much of the ocfs2 kernel code. This is the approach I'm trying out right now.