[Ocfs2-users] o2quo_make_decision

Karim Alkhayer kkhayer at gmail.com
Tue Feb 3 11:17:27 PST 2009


>Hello Sunil,

>Any thoughts on how to avoid this behavior? It was impossible to resume the
OCFS2 service on the other node, as it seemed that access to the shared
storage could not be managed consistently.
>On the first node the system didn't hang, but the cluster databases did.
>Another observation from the first node, which hung first (where did the 10
seconds come from? See the config below):

>Feb  3 12:27:06 oracle2d kernel: o2net: connection to node oracle1d (num 1) at 172.20.1.1:7777 has been idle for 10 seconds, shutting it down.
>Feb  3 12:27:06 oracle2d kernel: o2net: no longer connected to node oracle1d (num 1) at 172.20.1.1:7777
>Feb  3 14:02:39 oracle2d ntpd[16839]: Listening on interface eth2, 172.20.1.2#123
>Feb  3 14:03:01 oracle2d kernel: o2net: connected to node oracle1d (num 1) at 172.20.1.1:7777

>oracle2d:~ # cat /etc/sysconfig/o2cb
>#
># This is a configuration file for automatic startup of the O2CB
># driver.  It is generated by running /etc/init.d/o2cb configure.
># Please use that method to modify this file
>#
>
># O2CB_ENABELED: 'true' means to load the driver on boot.
>O2CB_ENABLED=true
>
># O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
>O2CB_BOOTCLUSTER=racdb1
>
># O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
>O2CB_HEARTBEAT_THRESHOLD=601
>
># O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
>O2CB_IDLE_TIMEOUT_MS=30000
>
># O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
>O2CB_KEEPALIVE_DELAY_MS=2000
>
># O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
>O2CB_RECONNECT_DELAY_MS=2000
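
To make the mismatch concrete, here is a minimal Python sketch that compares the values in the listing above with the idle time reported in the o2net log line. The helper names are hypothetical and exist only for this illustration. The conversion of O2CB_HEARTBEAT_THRESHOLD to seconds assumes the relation given in the OCFS2 FAQ (threshold = timeout in seconds / 2 + 1), which is worth checking against the FAQ for the installed version.

```python
import re

# Values copied from the /etc/sysconfig/o2cb listing above.
O2CB_CONFIG = """\
O2CB_HEARTBEAT_THRESHOLD=601
O2CB_IDLE_TIMEOUT_MS=30000
O2CB_KEEPALIVE_DELAY_MS=2000
O2CB_RECONNECT_DELAY_MS=2000
"""

# Log line copied from the kernel messages above.
LOG_LINE = ("o2net: connection to node oracle1d (num 1) at 172.20.1.1:7777 "
            "has been idle for 10 seconds, shutting it down.")


def parse_config(text):
    """Parse KEY=VALUE lines, skipping blanks and comments (hypothetical helper)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def heartbeat_dead_seconds(threshold, interval_s=2):
    """Assumed FAQ relation: threshold = timeout / 2 + 1, so timeout = (threshold - 1) * 2."""
    return (threshold - 1) * interval_s


def idle_seconds_from_log(line):
    """Extract the 'idle for N seconds' figure from an o2net message, or None."""
    match = re.search(r"idle for (\d+(?:\.\d+)?) seconds", line)
    return float(match.group(1)) if match else None


settings = parse_config(O2CB_CONFIG)
configured_idle_s = int(settings["O2CB_IDLE_TIMEOUT_MS"]) / 1000
observed_idle_s = idle_seconds_from_log(LOG_LINE)
hb_dead_s = heartbeat_dead_seconds(int(settings["O2CB_HEARTBEAT_THRESHOLD"]))

print("configured idle timeout (s):", configured_idle_s)
print("idle time reported in log (s):", observed_idle_s)
print("disk heartbeat dead time (s):", hb_dead_s)
if observed_idle_s != configured_idle_s:
    print("the connection was dropped with a timeout other than the one in this file")
```

The connection was shut down after 10 seconds although the file says 30000 ms, which suggests the running o2net stack was not using the value from this file. That is an inference from the numbers only, not something the log states.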

>There was no physical link outage. What else would you recommend verifying?


> Thanks for your time

> Best regards,
> Karim 
-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com] 
Sent: Tuesday, February 03, 2009 8:36 PM
To: Karim Alkhayer
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] o2quo_make_decision

It means the network connection between the two nodes, in a two-node
cluster, broke.
In such a case, we fence off one of the nodes.

The FAQ and 1.4 user's guide talk about quorum.
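
As a rough illustration of the rule that the o2quo_make_decision message quoted below describes, here is a minimal Python sketch. It is not the kernel code: the function name should_fence is hypothetical, and the node number 2 for oracle2d is an assumption (the logs only show that oracle1d is node 1 and is the lowest active node). A node counts itself as connected, which matches the "half-quorum of 1 out of 2 nodes" wording.

```python
def should_fence(heartbeating, connected):
    """Decide whether the local node should fence itself.

    heartbeating: node numbers currently seen heartbeating (includes self).
    connected:    node numbers reachable over the network (includes self).
    """
    hb = set(heartbeating)
    conn = set(connected) & hb
    lowest_reachable = min(hb) in conn

    if len(hb) % 2 == 1:
        # Odd number of nodes: we need to reach a strict majority.
        quorum = (len(hb) + 1) // 2
        return len(conn) < quorum

    # Even number of nodes: the cluster can split into two equal halves.
    # The half that contains the lowest-numbered node survives.
    quorum = len(hb) // 2
    if len(conn) < quorum:
        return True
    return len(conn) == quorum and not lowest_reachable


# Two-node cluster, interconnect broken, seen from the assumed node 2:
# it reaches only itself, and its half does not include node 1.
print(should_fence(heartbeating={1, 2}, connected={2}))  # True

# Same split seen from node 1: its half includes the lowest node.
print(should_fence(heartbeating={1, 2}, connected={1}))  # False
```

Under this rule, in a two-node cluster with a broken interconnect, the higher-numbered node is always the one that fences itself.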

Karim Alkhayer wrote:
>
> Sunil,
>
> > Any clue what this means?
>
> > Feb  3 12:47:12 oracle2d kernel: (19,1):o2quo_make_decision:144 
> ERROR: fencing this node because it is connected to a half-quorum of 1 
> out of 2 nodes which doesn't include the lowest active node 1
>
> > Feb  3 12:47:12 oracle2d kernel: (19,1):o2hb_stop_all_regions:1889 
> ERROR: stopping heartbeat on all active regions.
>
> > Feb  3 12:47:12 oracle2d kernel: Kernel panic: ocfs2 is very sorry 
> to be fencing this system by panicing
>
>  
>
> Thanks,
>
> Karim
>
>  
>



