[Ocfs2-users] ocfs2 keeps fencing all my nodes

Alexei_Roudnev Alexei_Roudnev at exigengroup.com
Thu Jan 18 15:38:44 PST 2007


As I remember from LinuxWorld, such a configuration requires using
heartbeat2 in addition to o2cb, and configuring OCFSv2 to do its
heartbeat through heartbeat2 rather than directly on disk (rough
sketch after the list below).

This is what SuSE tested:

- 4 nodes
- heartbeat2
- evms
- OCFSv2 interacting with heartbeat2
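
Roughly, heartbeat2 decides membership and drives the mounts, instead
of o2cb's disk heartbeat doing it on its own. A minimal sketch of the
heartbeat2 side, assuming the stock ocf:heartbeat:Filesystem agent (the
IDs, device path and mount point below are made up, and the
o2cb/membership glue SUSE ships on SLES 10 is not shown here):

  <clone id="ocfs2-mount-clone">
    <instance_attributes id="ocfs2-mount-clone-ia">
      <attributes>
        <!-- run one mount instance on each of the 4 nodes -->
        <nvpair id="ocfs2-clone-max" name="clone_max" value="4"/>
        <nvpair id="ocfs2-clone-node-max" name="clone_node_max" value="1"/>
      </attributes>
    </instance_attributes>
    <primitive id="ocfs2-mount" class="ocf" provider="heartbeat" type="Filesystem">
      <instance_attributes id="ocfs2-mount-ia">
        <attributes>
          <!-- device and directory are placeholders, substitute your own -->
          <nvpair id="ocfs2-dev" name="device"    value="/dev/evms/yourvolume"/>
          <nvpair id="ocfs2-dir" name="directory" value="/srv/ocfs2"/>
          <nvpair id="ocfs2-fs"  name="fstype"    value="ocfs2"/>
        </attributes>
      </instance_attributes>
    </primitive>
  </clone>

Loaded with cibadmin (e.g. "cibadmin -C -o resources -x ocfs2-clone.xml"),
heartbeat2 will mount the volume on every node it considers a member.
How you then tell o2cb to rely on heartbeat2 instead of the disk
heartbeat is version-specific, so check the SLES 10 OCFS2/heartbeat2
docs for the exact switch.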




----- Original Message ----- 
From: "John Lange" <j.lange at epic.ca>
To: "ocfs2-users" <ocfs2-users at oss.oracle.com>
Sent: Thursday, January 18, 2007 1:03 PM
Subject: [Ocfs2-users] ocfs2 keeps fencing all my nodes


> I have a 4-node SLES 10 cluster with all nodes attached to a SAN via
> fiber.
>
> The SAN has an EVMS volume formatted with ocfs2. Below is my
> cluster.conf.
>
> I can mount the volume on any single node but as soon as I mount it on
> the second node, it fences one of the nodes. There is never more than
> one node active at a time.
>
> When I check the status of the nodes (quickly, before they get fenced),
> the status shows they are heartbeating.
>
> # /etc/init.d/o2cb status
> Module "configfs": Loaded
> Filesystem "configfs": Mounted
> Module "ocfs2_nodemanager": Loaded
> Module "ocfs2_dlm": Loaded
> Module "ocfs2_dlmfs": Loaded
> Filesystem "ocfs2_dlmfs": Mounted
> Checking O2CB cluster ocfs2: Online
> Checking O2CB heartbeat: Active
>
> ========
>
> Here are the logs from two machines (note that these are the logs from
> both machines over the same time period, captured via remote syslog on
> a third machine) showing what happens when node vs2 is already running
> and node vs3 joins the cluster (mounts the ocfs2 filesystem). In this
> instance vs3 gets fenced.
>
> Jan 18 14:52:41 vs2 kernel: o2net: accepted connection from node vs3 (num 2) at 10.1.1.13:7777
> Jan 18 14:52:41 vs3 kernel: o2net: connected to node vs2 (num 1) at 10.1.1.12:7777
> Jan 18 14:52:45 vs3 kernel: OCFS2 1.2.3-SLES Thu Aug 17 11:38:33 PDT 2006 (build sles)
> Jan 18 14:52:45 vs2 kernel: ocfs2_dlm: Node 2 joins domain 89FC5CB6C98B43B998AB8492874EA6CA
> Jan 18 14:52:45 vs2 kernel: ocfs2_dlm: Nodes in domain ("89FC5CB6C98B43B998AB8492874EA6CA"): 1 2
> Jan 18 14:52:45 vs3 kernel: ocfs2_dlm: Nodes in domain ("89FC5CB6C98B43B998AB8492874EA6CA"): 1 2
> Jan 18 14:52:45 vs3 kernel: kjournald starting.  Commit interval 5 seconds
> Jan 18 14:52:45 vs3 kernel: ocfs2: Mounting device (253,13) on (node 2, slot 0)
> Jan 18 14:52:45 vs3 udevd-event[5542]: run_program: ressize 256 too short
> Jan 18 14:52:51 vs2 kernel: o2net: connection to node vs3 (num 2) at 10.1.1.13:7777 has been idle for 10 seconds, shutting it down.
> Jan 18 14:52:51 vs2 kernel: (0,0):o2net_idle_timer:1314 here are some times that might help debug the situation: (tmr 1169153561.99906 now 1169153571.93951 dr 1169153566.98030 adv 1169153566.98039:1169153566.98040 func (09ab0f3c:504) 1169153565.211482:1169153565.211485)
> Jan 18 14:52:51 vs3 kernel: o2net: no longer connected to node vs2 (num 1) at 10.1.1.12:7777
> Jan 18 14:52:51 vs2 kernel: o2net: no longer connected to node vs3 (num 2) at 10.1.1.13:7777
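
The o2net idle timeout above is what triggers the fence: vs2 stops
hearing from vs3 on TCP port 7777 for 10 seconds, drops the connection,
and the quorum code then makes one side self-fence. A quick interconnect
sanity check (plain shell, IPs taken from the cluster.conf further down,
nothing here is ocfs2-specific):

  # on vs2: confirm o2net is listening on the cluster port
  netstat -tlnp | grep 7777

  # from vs3: confirm the TCP path to vs2 actually works
  telnet 10.1.1.12 7777

  # on SLES, check whether SuSEfirewall2 is filtering the 10.1.1.0/24 net
  rcSuSEfirewall2 status

If the telnet connects in both directions, the problem is more likely at
the NIC driver or switch level than in ocfs2 itself.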
>
> ==========
>
> I previously configured ocfs2 for userspace heartbeating but couldn't
> get it running, so I reconfigured for disk-based heartbeat. Could that
> now be the cause of this problem?
>
> Where do the nodes write the heartbeats? I see nothing on the ocfs2
> filesystem.
>
> Also, I have no /config directory, which is mentioned in the docs. Is
> that normal?
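
Two pointers here, with the caveat that the device path below is a
placeholder: the disk heartbeat lives in a hidden "heartbeat" system
file inside the ocfs2 volume, so a normal ls will never show it, and on
SLES 10 configfs is mounted under /sys/kernel/config rather than
/config, so the missing /config directory is expected. For example:

  # list the ocfs2 system files, including the heartbeat file
  # (substitute your own EVMS device path)
  debugfs.ocfs2 -R "ls -l //" /dev/evms/yourvolume

  # active disk heartbeat regions, one directory per volume UUID
  ls /sys/kernel/config/cluster/ocfs2/heartbeat/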
>
> Here is /etc/ocfs2/cluster.conf
>
> node:
>         ip_port = 7777
>         ip_address = 10.1.1.11
>         number = 0
>         name = vs1
>         cluster = ocfs2
>
> node:
>         ip_port = 7777
>         ip_address = 10.1.1.12
>         number = 1
>         name = vs2
>         cluster = ocfs2
>
> node:
>         ip_port = 7777
>         ip_address = 10.1.1.13
>         number = 2
>         name = vs3
>         cluster = ocfs2
>
> node:
>         ip_port = 7777
>         ip_address = 10.1.1.14
>         number = 3
>         name = vs4
>         cluster = ocfs2
>
> cluster:
>         node_count = 4
>         name = ocfs2
>
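
One more thing worth double-checking with a hand-written config like
this: /etc/ocfs2/cluster.conf has to be consistent on all four nodes,
and o2cb needs a restart (with the volume unmounted) after any change.
A rough way to push it out, assuming root ssh between the nodes:

  for n in vs1 vs2 vs3 vs4; do
      scp /etc/ocfs2/cluster.conf root@$n:/etc/ocfs2/
      ssh root@$n /etc/init.d/o2cb restart
  done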
>
> Regards,
>
> Any tips on how I can go about diagnosing this problem?
>
> Thanks,
> John Lange
>
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>



