[Ocfs-users] cluster with 2 nodes - heartbeat problem fencing

Thu Mar 6 10:13:43 PST 2008

What that note says is that in a 2-node setup, if the communication link
between the two nodes breaks, the node with the higher node number will be
fenced.
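The following is a minimal sketch of why the higher-numbered node loses. It paraphrases the quorum rule from the OCFS2 FAQ linked further down (an assumption about its exact wording): with an odd number of heartbeating nodes a node needs network connectivity to a strict majority, and with an even number it needs at least half *and* connectivity to the lowest-numbered heartbeating node. The function name `has_quorum` is hypothetical; this is a model, not the o2cb kernel code.

```python
from __future__ import annotations


def has_quorum(heartbeating: set[int], connected: set[int]) -> bool:
    """Rough model (an assumption) of the O2CB quorum rule described in the FAQ.

    heartbeating: node numbers seen heartbeating on the shared disk, including this node.
    connected:    node numbers this node has a working network link to, including itself.
    """
    n = len(heartbeating)
    reachable = len(connected & heartbeating)
    if n % 2 == 1:
        # Odd number of nodes: a strict majority is required.
        return reachable > n // 2
    # Even number of nodes: half is enough, but only the half that
    # contains the lowest heartbeating node number.
    return reachable >= n // 2 and min(heartbeating) in connected


# Two-node split as in this thread: both nodes still heartbeat on the disk,
# but the network link is down, so each node is only "connected" to itself.
print(has_quorum({0, 1}, {0}))  # node 0: True, keeps running
print(has_quorum({0, 1}, {1}))  # node 1: False, gets fenced
```

The disk heartbeat and the network link are tracked separately, which is why both nodes still count as heartbeating after the network goes down.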

In your case, you are shutting down the network on node 0. The cluster stack
sees this as the communication link between the two nodes going down. At this
stage, even if you unmount the volume on node 1, node 1 will still have node 0
in its domain and will want to contact it to migrate lock resources (lockres),
leave the domain, etc. In other words, umount is a clusterwide event, not an
isolated one.

Forcibly shutting down the heartbeat (hb) won't work because the volume is
still mounted and all those inodes are still cached and possibly still in use.
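Since the heartbeat cannot be stopped while a volume is still mounted, a useful first check is which OCFS2 volumes a node actually has mounted. A minimal sketch, assuming a Linux host where `/proc/mounts` lists one mount per line as "device mountpoint fstype options dump pass"; the helper name `mounted_ocfs2` is hypothetical.

```python
import os


def mounted_ocfs2(proc_mounts: str) -> list:
    """Return the mount points of all ocfs2 filesystems in /proc/mounts-style text.

    Mount points containing spaces appear octal-escaped (e.g. \\040) in
    /proc/mounts; they are returned exactly as written there.
    """
    points = []
    for line in proc_mounts.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass.
        if len(fields) >= 3 and fields[2] == "ocfs2":
            points.append(fields[1])
    return points


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as f:
        print(mounted_ocfs2(f.read()))
```

An empty list means no OCFS2 volume is mounted on that node, which is the state the volume has to be in before the heartbeat for it can go away cleanly.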

I am unclear as to what your real problem is.

Sunil

g.digiambelardini at fabaris.it wrote:
> Hi, thanks for your help.
> We read your link and tried many solutions, but nothing works well for
> us.
> The situation is this: when we take down the eth link on the server with
> node number = 0 (virtual1) while the shared partition is mounted, we cannot
> manually unmount the partition (or shut down the server) in the few seconds
> before the second node (virtual2) goes into kernel panic (the partition
> seems locked).
>
> this is our /etc/default/o2cb:
>
> # O2CB_ENABLED: 'true' means to load the driver on boot.
> O2CB_ENABLED=true
>
> # O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
> O2CB_BOOTCLUSTER=ocfs2
>
> # O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
> O2CB_HEARTBEAT_THRESHOLD=30
>
> # O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
> O2CB_IDLE_TIMEOUT_MS=50000
>
> # O2CB_KEEPALIVE_DELAY_MS: Max. time in ms before a keepalive packet is sent.
> O2CB_KEEPALIVE_DELAY_MS=5000
>
> # O2CB_RECONNECT_DELAY_MS: Min. time in ms between connection attempts.
> O2CB_RECONNECT_DELAY_MS=5000
> -----------------------------------------------------------------------
> We tried changing the values many times, but nothing helped.
>
> I think the easiest way is to stop the heartbeat, but we cannot manage to
> do it.
>
> HELP ME
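To see roughly how long each timer runs with the values quoted above, the sketch below parses a `/etc/default/o2cb`-style file and converts the settings to seconds. It assumes the formula given in the OCFS2 FAQ for this generation of o2cb, disk heartbeat timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds; the function names are hypothetical, and the parser only handles simple `KEY=value` lines, not full shell syntax.

```python
SAMPLE_DEFAULTS = """\
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=ocfs2
O2CB_HEARTBEAT_THRESHOLD=30
O2CB_IDLE_TIMEOUT_MS=50000
O2CB_KEEPALIVE_DELAY_MS=5000
O2CB_RECONNECT_DELAY_MS=5000
"""


def parse_o2cb_defaults(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def effective_timeouts(values: dict) -> dict:
    """Convert the O2CB settings to seconds (disk formula assumed from the FAQ)."""
    threshold = int(values["O2CB_HEARTBEAT_THRESHOLD"])
    return {
        # Disk heartbeat: a node is declared dead after (threshold - 1) 2-second iterations.
        "disk_heartbeat_s": (threshold - 1) * 2,
        # Network: a silent connection is considered dead after this idle time.
        "network_idle_s": int(values["O2CB_IDLE_TIMEOUT_MS"]) / 1000,
        # Keepalive packets are sent at most this often on an idle connection.
        "keepalive_s": int(values["O2CB_KEEPALIVE_DELAY_MS"]) / 1000,
    }


print(effective_timeouts(parse_o2cb_defaults(SAMPLE_DEFAULTS)))
```

With the settings above this gives a disk heartbeat timeout of 58 seconds and a network idle timeout of 50 seconds.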
>
> -----Sunil Mushran <Sunil.Mushran at oracle.com> wrote: -----
>
> To: g.digiambelardini at fabaris.it
> From: Sunil Mushran <Sunil.Mushran at oracle.com>
> Date: 05/03/2008 18.55
> cc: ocfs-users at oss.oracle.com
> Subject: Re: [Ocfs-users] cluster with 2 nodes - heartbeat problem fencing
>
> http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#QUORUM
>
>
> g.digiambelardini at fabaris.it wrote:
>   
>> Hi,
>> now the problem is different.
>> This is my cluster.conf:
>>
>> ----------------------------------------------------------
>> node:
>>         ip_port = 7777
>>         ip_address = 1.1.1.1
>>         number = 0
>>         name = virtual1
>>         cluster = ocfs2
>>
>> node:
>>         ip_port = 7777
>>         ip_address = 1.1.1.2
>>         number = 1
>>         name = virtual2
>>         cluster = ocfs2
>>
>> cluster:
>>         node_count = 2
>>         name = ocfs2
>> -----------------------------------------------------
>> Now it seems that one node of the cluster is a master, or rather that
>> virtual1 is a master: when we shut down the heartbeat interface (eth0,
>> with the partition mounted) on virtual1, virtual2 goes into kernel panic.
>> If instead we shut down eth0 on virtual2, virtual1 keeps working.
>> Can somebody help us?
>> Obviously, if we reboot either server so that the partition is unmounted
>> before the network goes down, everything works well.
>> THANKS
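The "master" behaviour described above follows from the node numbers in this cluster.conf. A minimal sketch, assuming the stanza layout shown (an unindented `node:` or `cluster:` header followed by indented `key = value` lines), that parses the file and applies the two-node rule from the reply at the top of this thread: on a network split, the lower node number survives. Both function names are hypothetical.

```python
from __future__ import annotations

SAMPLE_CONF = """\
node:
        ip_port = 7777
        ip_address = 1.1.1.1
        number = 0
        name = virtual1
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 1.1.1.2
        number = 1
        name = virtual2
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2
"""


def parse_cluster_conf(text: str) -> list[tuple[str, dict[str, str]]]:
    """Split cluster.conf-style text into (stanza_type, attributes) pairs."""
    stanzas: list[tuple[str, dict[str, str]]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":") and not raw[0].isspace():
            # Unindented "node:" / "cluster:" starts a new stanza.
            current = {}
            stanzas.append((line[:-1], current))
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return stanzas


def split_survivor(stanzas: list[tuple[str, dict[str, str]]]) -> str:
    """Name of the node expected to survive a network split in a 2-node cluster."""
    nodes = [attrs for kind, attrs in stanzas if kind == "node"]
    if len(nodes) != 2:
        raise ValueError("this rule of thumb only covers two-node clusters")
    return min(nodes, key=lambda attrs: int(attrs["number"]))["name"]


print(split_survivor(parse_cluster_conf(SAMPLE_CONF)))  # virtual1
```

Renumbering the nodes would only change which one survives; it would not stop the other from being fenced.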
>>
>> -----ocfs-users-bounces at oss.oracle.com wrote: -----
>>
>> To: ocfs-users at oss.oracle.com
>> From: g.digiambelardini at fabaris.it
>> Sent by: ocfs-users-bounces at oss.oracle.com
>> Date: 05/03/2008 13.51
>> Subject: [Ocfs-users] cluster with 2 nodes - heartbeat problem fencing
>>
>>
>>
>> Hi all, this is my first time on this mailing list.
>> I have a problem with OCFS2 on Debian etch 4.0.
>> When a node goes down or freezes without unmounting the ocfs2 partition,
>> I would like the heartbeat not to fence the server that is still working
>> (kernel panic).
>> I would like to disable either the heartbeat or the fencing, so that we
>> can also work with only one node.
>> Thanks
>>
>>
>> _______________________________________________
>> Ocfs-users mailing list
>> Ocfs-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs-users
>>