[Ocfs2-users] node gets fenced after mount of shared volume

msl at calivia.com
Tue Jul 12 03:35:14 CDT 2005


My cluster fences the second node about 10 seconds after a shared volume is
mounted on it, as if the two nodes had lost IP connectivity with each other.

Here's what I do:
1) /etc/init.d/o2cb start on both nodes: the modules load fine, and the
status shows "Checking heartbeat: Not Active" on both nodes.

2) mount /u00 on node1: "Checking heartbeat: Active" on node1.

3) mount /u00 on node2: "Checking heartbeat: Active" on node2.

About 5 seconds later, on node1:
kernel: (20248,1):o2net_set_nn_state:437 accepted connection from node node2 num 1 at 10.1.7.53:7777
Jul 12 09:59:16 node1 kernel: (20248,1):__dlm_print_nodes:380 Nodes in my domain ("C69655D0DAE44FE2845FBA0E615269DD"):
Jul 12 09:59:16 node1 kernel: (20248,1):__dlm_print_nodes:384  node 0
Jul 12 09:59:16 node1 kernel: (20248,1):__dlm_print_nodes:384  node 1
Jul 12 09:59:33 node1 kernel: (0,1):o2net_idle_timer:1319 connection to node node2 num 1 at 10.1.7.53:7777 has been idle for 10 seconds, shutting it down.
Jul 12 09:59:33 node1 kernel: (20248,1):o2net_set_nn_state:420 no longer connected to node node2 at 10.1.7.53:7777
Jul 12 09:59:54 node1 kernel: (20486,1):ocfs2_replay_journal:1123 Recovering node 1 from slot 1 on device (253,5)

About 10 seconds later, on node2:
node2 kernel: Kernel panic: ocfs2 is very sorry to be fencing this system by panicing

Running tcpdump -i eth1 port 7777 shows traffic as soon as I mount the
shared LV on the second node.

We're running 0.99.16-BETA20 on SLES9 (the final SLES9.SP2 download is
still in progress).

With 0.99.15-SLES from SLES9.SP2-RC4, communication between the nodes
seemed to work, but a bug [1] prevented further testing.

Is this a known bug in 0.99.16-BETA20?

thanks,
Mike

For reference, this is my /etc/ocfs2/cluster.conf:
node:
        ip_port = 7777
        ip_address = 10.1.7.54
        number = 0
        name = node1
        cluster = OCFS2CLUSTER

node:
        ip_port = 7777
        ip_address = 10.1.7.53
        number = 1
        name = node2
        cluster = OCFS2CLUSTER

cluster:
        node_count = 2
        name = OCFS2CLUSTER
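
One cheap thing to rule out before blaming the code is an inconsistent
cluster.conf. The Python sketch below is only a rough check of my own, not
part of the OCFS2 tools: it assumes the stanza layout shown above
(unindented node:/cluster: headers, indented "key = value" lines), and the
names parse_cluster_conf, check_cluster_conf and SAMPLE are hypothetical.
It flags a node_count that doesn't match the node stanzas, duplicate node
numbers, names or addresses, and nodes that name a different cluster. It
says nothing about heartbeat or network idle timeouts.

```python
"""Rough sanity check for an OCFS2 cluster.conf in the stanza format above.

Assumption: this only mimics the file layout; it is NOT the o2cb parser.
"""


def parse_cluster_conf(text):
    """Return a list of (section, settings) pairs.

    Assumed layout: an unindented line ending in ':' starts a stanza, and
    each indented 'key = value' line below it is a setting of that stanza.
    """
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between stanzas
        if not raw[0].isspace() and line.endswith(":"):
            stanzas.append((line[:-1], {}))
        elif "=" in line and stanzas:
            key, _, value = line.partition("=")
            stanzas[-1][1][key.strip()] = value.strip()
    return stanzas


def check_cluster_conf(text):
    """Return human-readable problems found in the config (empty if none)."""
    stanzas = parse_cluster_conf(text)
    nodes = [s for name, s in stanzas if name == "node"]
    clusters = [s for name, s in stanzas if name == "cluster"]
    if len(clusters) != 1:
        return [f"expected 1 cluster stanza, found {len(clusters)}"]
    cluster = clusters[0]
    problems = []

    # node_count should match the number of node stanzas actually listed.
    if cluster.get("node_count") != str(len(nodes)):
        problems.append(
            f"node_count = {cluster.get('node_count')} but {len(nodes)} node stanzas"
        )

    # Node numbers, names and addresses should each be unique.
    for field in ("number", "name", "ip_address"):
        values = [str(n.get(field)) for n in nodes]
        dupes = sorted({v for v in values if values.count(v) > 1})
        if dupes:
            problems.append(f"duplicate {field}: {dupes}")

    # Every node should point at the cluster defined in this file.
    for n in nodes:
        if n.get("cluster") != cluster.get("name"):
            problems.append(
                f"node {n.get('name')} is in cluster {n.get('cluster')!r}, "
                f"not {cluster.get('name')!r}"
            )
    return problems


# The cluster.conf from this post, used as sample input.
SAMPLE = """\
node:
        ip_port = 7777
        ip_address = 10.1.7.54
        number = 0
        name = node1
        cluster = OCFS2CLUSTER

node:
        ip_port = 7777
        ip_address = 10.1.7.53
        number = 1
        name = node2
        cluster = OCFS2CLUSTER

cluster:
        node_count = 2
        name = OCFS2CLUSTER
"""

if __name__ == "__main__":
    print(check_cluster_conf(SAMPLE))  # prints [] for the config above
```

Running it against the copy of the file on each node (which should be the
same on both) is a quick way to catch a copy that has drifted; for the
config above it reports no problems, so the fencing is probably not a
simple config mismatch.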

[1] http://oss.oracle.com/bugzilla/show_bug.cgi?id=511
