Bugs to Squash
Random bugs that need squashing. This can be things that broke in cman or in the ocfs2 software.
Failed mount doesn't clean up properly
Four node cluster. I had two nodes mounted, and I went to mount a third. The network parameters on the third node were different, so ocfs2 failed the mount. This is just fine. mount(2) returned to mount.ocfs2(8). But ocfs2_controld never left the group like it should, hanging the group.
So far I can't reproduce this again.
In the previous bug, I eventually reset the node with the hung mount. I expected the "good" nodes to try and fence the "failed" node. Instead, the aisexec(8) process on both nodes segfaulted.
It would appear that, on fc7/amd64, aisexec segfaults when a node dies. rhel5/ppc doesn't have this problem.
Does fencing work?
With four nodes in a cluster, I had three mounted. One of hte mounted nodes died, I think due to the aisexec(8) crash. The other nodes never tried to fence it, and never let ocfs2 continue, even after I brought the node back up.