[Ocfs2-users] ocfs2 keeps fencing all my nodes

Srinivas Eeda srinivas.eeda at oracle.com
Thu Jan 18 13:32:47 PST 2007


John,

It's hard to tell without seeing the messages on the surviving node. Do 
you remember how many node slots you created when formatting the 
volume? Maybe you configured just 1? If so, use tunefs.ocfs2 to 
increase the number of slots.
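
As a rough illustration of that check, here is a small Python sketch that counts the node stanzas in a cluster.conf and compares the count with the number of slots on the volume. The function names, the shortened two-node sample and the hard-coded slot count are all made up for the example; read the real slot count from the volume yourself. If the volume turns out to have too few slots, the option to look at is tunefs.ocfs2 -N.

```python
# Sketch only: count "node:" stanzas in an o2cb cluster.conf and compare
# the count with the number of node slots on the volume.

def parse_cluster_conf(text):
    """Return the stanzas as a list of (kind, {key: value}) tuples."""
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A stanza header such as "node:" or "cluster:".
            stanzas.append((line[:-1], {}))
        elif "=" in line and stanzas:
            key, _, value = line.partition("=")
            stanzas[-1][1][key.strip()] = value.strip()
    return stanzas


def slots_needed(text):
    """Number of node stanzas, i.e. the most nodes that could mount at once."""
    return sum(1 for kind, _ in parse_cluster_conf(text) if kind == "node")


# Shortened two-node excerpt in the style of the quoted cluster.conf
# (node_count adjusted to match the excerpt).
SAMPLE = """\
node:
        ip_port = 7777
        ip_address = 10.1.1.12
        number = 1
        name = vs2
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 10.1.1.13
        number = 2
        name = vs3
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2
"""

configured_slots = 1  # hypothetical value, not read from any real volume
needed = slots_needed(SAMPLE)
if configured_slots < needed:
    print(f"volume has {configured_slots} slot(s), {needed} nodes configured")
```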

If that's not the problem, please copy and paste the corresponding 
messages from the surviving node.
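
When reading those messages, the o2net_idle_timer line is worth a closer look. The Python sketch below pulls out the tmr, now and dr stamps from the line in the quoted log and turns them into intervals. Two assumptions that are mine and not confirmed anywhere in this thread: each stamp is seconds followed by a microsecond field that is not zero-padded, and tmr and dr mean "idle timer armed" and "last data-ready callback". Treat the numbers as approximate if either assumption is wrong.

```python
import re

# Assumption: each stamp is "<seconds>.<microseconds>" and the microsecond
# field is NOT zero-padded, so ".99906" means 99906 us rather than 0.99906 s.
STAMP = re.compile(r"\b(tmr|now|dr) (\d+)\.(\d+)")


def idle_stamps(line):
    """Map each label (tmr, now, dr) to an integer microsecond timestamp."""
    return {label: int(sec) * 1_000_000 + int(usec)
            for label, sec, usec in STAMP.findall(line)}


# The idle-timer message from the quoted log, rejoined onto one line.
LINE = ("(0,0):o2net_idle_timer:1314 here are some times that might help "
        "debug the situation: (tmr 1169153561.99906 now 1169153571.93951 "
        "dr 1169153566.98030 adv 1169153566.98039:1169153566.98040 "
        "func (09ab0f3c:504) 1169153565.211482:1169153565.211485)")

stamps = idle_stamps(LINE)
idle_us = stamps["now"] - stamps["tmr"]           # time since the tmr stamp
since_data_us = stamps["now"] - stamps["dr"]      # time since the dr stamp
print(idle_us, since_data_us)
```

Under those assumptions the first interval comes out just under the 10 seconds named in the o2net message, which is a quick sanity check that the stamps are being read the right way.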

thanks,
--Srini.

John Lange wrote:
> I have a 4 node SLES 10 cluster with all nodes attached to a SAN via
> fiber.
>
> The SAN has an EVMS volume formatted with ocfs2. Below is my ocfs2.conf.
>
> I can mount the volume on any single node but as soon as I mount it on
> the second node, it fences one of the nodes. There is never more than
> one node active at a time.
>
> When I check the status of the nodes (quickly before they get fenced)
> the status shows they are heartbeating.
>
> # /etc/init.d/o2cb status
> Module "configfs": Loaded
> Filesystem "configfs": Mounted
> Module "ocfs2_nodemanager": Loaded
> Module "ocfs2_dlm": Loaded
> Module "ocfs2_dlmfs": Loaded
> Filesystem "ocfs2_dlmfs": Mounted
> Checking O2CB cluster ocfs2: Online
> Checking O2CB heartbeat: Active
>
> ======== 
>
> Here are the logs from 2 machines (NOTE that these are the logs from
> both machines interleaved in time, as they were captured via remote
> syslog on a 3rd machine) of what happens when the node vs2 is already
> running, and node vs3 joins the cluster (mounts the ocfs2 file system).
> In this instance vs3 gets fenced.
>
> Jan 18 14:52:41 vs2 kernel: o2net: accepted connection from node vs3 (num 2) at 10.1.1.13:7777
> Jan 18 14:52:41 vs3 kernel: o2net: connected to node vs2 (num 1) at 10.1.1.12:7777
> Jan 18 14:52:45 vs3 kernel: OCFS2 1.2.3-SLES Thu Aug 17 11:38:33 PDT 2006 (build sles)
> Jan 18 14:52:45 vs2 kernel: ocfs2_dlm: Node 2 joins domain 89FC5CB6C98B43B998AB8492874EA6CA
> Jan 18 14:52:45 vs2 kernel: ocfs2_dlm: Nodes in domain ("89FC5CB6C98B43B998AB8492874EA6CA"): 1 2 
> Jan 18 14:52:45 vs3 kernel: ocfs2_dlm: Nodes in domain ("89FC5CB6C98B43B998AB8492874EA6CA"): 1 2 
> Jan 18 14:52:45 vs3 kernel: kjournald starting.  Commit interval 5 seconds
> Jan 18 14:52:45 vs3 kernel: ocfs2: Mounting device (253,13) on (node 2, slot 0)
> Jan 18 14:52:45 vs3 udevd-event[5542]: run_program: ressize 256 too short
> Jan 18 14:52:51 vs2 kernel: o2net: connection to node vs3 (num 2) at 10.1.1.13:7777 has been idle for 10 seconds, shutting it down.
> Jan 18 14:52:51 vs2 kernel: (0,0):o2net_idle_timer:1314 here are some times that might help debug the situation: (tmr 1169153561.99906 now 1169153571.93951 dr 1169153566.98030 adv 1169153566.98039:1169153566.98040 func (09ab0f3c:504) 1169153565.211482:1169153565.211485)
> Jan 18 14:52:51 vs3 kernel: o2net: no longer connected to node vs2 (num 1) at 10.1.1.12:7777
> Jan 18 14:52:51 vs2 kernel: o2net: no longer connected to node vs3 (num 2) at 10.1.1.13:7777
>
> ==========
>
> I previously had configured ocfs2 for userspace heartbeating but
> couldn't get that running so I reconfigured for disk based. Could that
> now be the cause of this problem?
>
> Where do the nodes write the heartbeats? I see nothing on the ocfs2
> system.
>
> Also, I have no /config directory that is mentioned in the docs. Is that
> normal?
>
> Here is /etc/ocfs2/cluster.conf
>
> node:
>         ip_port = 7777
>         ip_address = 10.1.1.11
>         number = 0
>         name = vs1
>         cluster = ocfs2
>
> node:
>         ip_port = 7777
>         ip_address = 10.1.1.12
>         number = 1
>         name = vs2
>         cluster = ocfs2
>
> node:
>         ip_port = 7777
>         ip_address = 10.1.1.13
>         number = 2
>         name = vs3
>         cluster = ocfs2
>
> node:
>         ip_port = 7777
>         ip_address = 10.1.1.14
>         number = 3
>         name = vs4
>         cluster = ocfs2
>
> cluster:
>         node_count = 4
>         name = ocfs2
>
>
> Regards,
>
> Any tips on how I can go about diagnosing this problem?
>
> Thanks,
> John Lange
>
>
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>   


