[Ocfs2-users] re: how should ocfs2 react to nic hardware issue

Mon Dec 4 14:39:37 PST 2006

// Some of what follows may be wrong. It is what I learned from experiments
with Oracle RAC, so take it with a grain of scepticism. //

First of all - TAF _IS NOT TRANSPARENT by itself_.

An Oracle client can provide TRANSPARENT CLUSTER ACCESS only if it is aware
of TAF (or, better, if it supports fast failure notification, as their new
JDBC stack does, for example). Otherwise, it will see long timeouts on
system switchover/failover.

Moreover, you selected the slow RAC connection mode (BASIC, which connects
to the backup only at failover time, rather than permanent, pre-established
connections), and sqlplus is not a TAF application (because a human is not
a TAF-aware client, btw). It is no surprise that you experience delays.
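
For illustration, here is a rough sketch of what a pre-established
(PRECONNECT) TAF entry could look like, as opposed to BASIC. It is only a
sketch and I have not tested it against your setup: the alias names
ORACTAH_N2 and ORACTAH_N3 are invented, the instance names ORACTAH2 and
ORACTAH3 are guessed from the resource names in your crsd.log, and I am
assuming that dbinsto2 and dbinsto3 are the VIPs of those two instances.
METHOD = PRECONNECT and BACKUP are regular FAILOVER_MODE parameters, but
check the Oracle*Net documentation for your release before relying on this.

```
# Hypothetical aliases and guessed instance names; adjust to your cluster.
ORACTAH_N2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbinsto2)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = ORACTAH)
      (INSTANCE_NAME = ORACTAH2)
      (FAILOVER_MODE =
        (BACKUP = ORACTAH_N3)
        (TYPE = SELECT)
        (METHOD = PRECONNECT)
      )
    )
  )

# The backup alias points back at the first one, so either can fail over.
ORACTAH_N3 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbinsto3)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = ORACTAH)
      (INSTANCE_NAME = ORACTAH3)
      (FAILOVER_MODE =
        (BACKUP = ORACTAH_N2)
        (TYPE = SELECT)
        (METHOD = PRECONNECT)
      )
    )
  )
```

With PRECONNECT the client pays for a second, idle session on the backup
instance, but it does not have to open a new connection at failover time.
It does not make sqlplus a TAF-aware application, and I would not expect it
to remove the network-level timeouts when an interface silently disappears.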

You can read the Oracle documentation about RAC tuning (it has information
about failover) and about Oracle*Net (it has information about TAF
configuration), but in general, don't expect much. RAC is not for
reliability (sometimes it decreases reliability); in real life it is for
better scalability, load sharing, and easy maintenance (planned and
sometimes emergency). But as a cluster, it is extremely primitive and can be
killed easily by an unusual combination of component failures.

When the public interface fails, RAC should not disable the instance, and so
TAF will not report failures for you. In some cases, if you disable the
interface, the listener can notice it (but I suspect that this was not well
tested etc).

So, if you want to provide high reliability with RAC and with TAF, it may be
better to use ethernet bonding (on the switch layer) for network
reliability. A RAC cluster is very sensitive to network problems and SAN
problems, and in many cases they end in a full cluster freeze or reload. So,
unfortunately, if you need to protect against such system failures, use
BONDING, or put the listener addresses on loopback interfaces and use OSPF
routing for network access to them.

I experimented with a RAC installation for some time. I cannot say that it
was a 100% clean installation and configuration, but anyway, the experiments
show that the cases where applications do not notice a component failover,
do not experience long delays, and where the cluster even survives such a
failover, are rare enough. If you want to achieve high availability with
RAC, it requires very careful system, RAC and client configuration, and all
clients must use RAC-aware Oracle client libraries (usually a connection
pool library).

Maybe the new SLES10 with its new integrated o2cb + heartbeat2 can improve
things - at least you can configure (in heartbeat, not in o2cb) several
heartbeat methods, and configure an external ping to verify the cluster's
network access (so the system can notice when the public interface goes
down). But it will not help you much - moreover, it can easily cause a
reboot of the whole cluster (if node1 decides to self-fence OCFS2, and node2
decides that cssd lost quorum and must be rebooted because it is not the
master). I would rather take external measures (load balancers, ethernet
bonding, etc) if I needed really high availability from a RAC cluster.

----- Original Message ----- 
From: "Peter Santos" <psantos at cheetahmail.com>
To: "Adam Kenger" <akenger at gmail.com>
Cc: <ocfs2-users at oss.oracle.com>
Sent: Monday, December 04, 2006 1:42 PM
Subject: Re: [Ocfs2-users] re: how should ocfs2 react to nic hardware issue

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Adam,
>   I basically have a client tnsnames.ora file like this (see below)
>   The entries dbinsto1,dbinsto2 and dbinsto3 are dns entries that point to
>   the virtual ip managed by Oracle via the racgvip script in CRS_HOME/bin
>   dir.
>
> ORACTAH =
>   (DESCRIPTION =
>     (LOAD_BALANCE = ON)
>     (FAILOVER = ON)
>     (ADDRESS = (PROTOCOL = TCP)(HOST = dbinsto1)(PORT = 1521))
>     (ADDRESS = (PROTOCOL = TCP)(HOST = dbinsto2)(PORT = 1521))
>     (ADDRESS = (PROTOCOL = TCP)(HOST = dbinsto3)(PORT = 1521))
>     (CONNECT_DATA =
>       (SERVER = SHARED)
>       (SERVICE_NAME = ORACTAH)
>           (FAILOVER_MODE =
>             (TYPE = SELECT)
>             (METHOD = BASIC)
>             (RETRIES = 20)
>             (DELAY = 5)
>           )
>     )
>   )
>
> 1. From the client I connect like this: sqlplus user/password@ORACTAH
> 2. I verify which instance I'm connected to like this:
>    select sys_context('USERENV','INSTANCE_NAME') from dual;
> 3. Then I did an "ifdown eth0" on the node that I connected above (dbo2)
>
> 4. After that I re-tried my query from step 2 and the execution hung for
>    about 15 minutes.
> 5. After 15 minutes or so, my sqlplus session received the error:
>    ORA-12152: TNS:unable to send break message.
> 6. After the error message, I got my sqlplus prompt back and re-execute
>    the query from step 2.
>    This query ran just fine and told me that I was now connected to
>    instance3 on dbo3.
>
> My ocfs2/o2cb service functioned just fine because it's setup to run on
> eth1 (private interconnect). Only when I take down eth1 does the o2cb
> service detect a problem and evicts the node .. but this is probably the
> proper behavior.
>
> My concern is that when eth0 goes down the following bad things happened:
>
> - connections to that instance "hang" for a long time. Can probably fixed
>   via some configuration.
> - crs_stat reported that the instance was in an "unknown" state.
> - the instance unix processes remain up on the machine.
> - the listener process remains up on the machine.
> - SRVCTL returns the following when I try to check the status of
>   everything on that server.
>
> PRKO-2015 : Error in checking condition of instance on node: dbo2
> ASM instance +ASM2 is running on node dbo2.
> VIP is not running on node: dbo2
> GSD is running on node: dbo2
> Listener is not running on node: dbo2
> ONS daemon is running on node: dbo2
>
> - The crsd.log on dbo2 shows this
>
> 2006-12-04 11:01:01.341: [CRSAPP][1436752224]0CheckResource error for ora.dbo2.vip error code = 1
> 2006-12-04 11:01:01.343: [CRSRES][1436752224]0In stateChanged, ora.dbo2.vip target is ONLINE
> 2006-12-04 11:01:01.343: [CRSRES][1436752224]0ora.dbo2.vip on dbo2 went OFFLINE unexpectedly
> 2006-12-04 11:01:01.344: [CRSRES][1436752224]0StopResource: setting CLI values
> 2006-12-04 11:01:01.381: [CRSRES][1436752224]0Attempting to stop `ora.dbo2.vip` on member `dbo2`
> 2006-12-04 11:01:01.714: [CRSRES][1436752224]0Stop of `ora.dbo2.vip` on member `dbo2` succeeded.
> 2006-12-04 11:01:01.715: [CRSRES][1436752224]0ora.dbo2.vip RESTART_COUNT=0 RESTART_ATTEMPTS=0
> 2006-12-04 11:01:01.774: [CRSRES][1436752224]0ora.dbo2.vip failed on dbo2 relocating.
> 2006-12-04 11:01:01.978: [CRSRES][1436752224]0StopResource: setting CLI values
> 2006-12-04 11:01:02.019: [CRSRES][1436752224]0Attempting to stop `ora.dbo2.LISTENER_DBO2.lsnr` on member `dbo2`
> 2006-12-04 11:01:02.352: [CRSRES][1436752224]0Stop of `ora.dbo2.LISTENER_DBO2.lsnr` on member `dbo2` succeeded.
> 2006-12-04 11:01:02.352: [CRSRES][1436752224]0StopResource: setting CLI values
> 2006-12-04 11:01:02.393: [CRSRES][1436752224]0Attempting to stop `ora.ORACTAH.ORACTAH2.inst` on member `dbo2`
> 2006-12-04 11:03:09.483: [CRSAPP][1436752224]0StopResource error for ora.ORACTAH.ORACTAH2.inst error code = 1
> 2006-12-04 11:03:09.544: [CRSRES][1436752224][ALERT]0`ora.ORACTAH.ORACTAH2.inst` on member `dbo2` has experienced an unrecoverable failure.
> 2006-12-04 11:03:09.544: [CRSRES][1436752224]0Human intervention required to resume its availability.
>
>
> I've tried this before and when the session query hangs, if I go to the
> server and do a shutdown abort on the instance, then failover takes place
> and everything is good .. otherwise the instance just appears to stay up
> and oracle CRS appears to have no idea on how to deal with it.
>
> - -peter
>
>
> Adam Kenger wrote:
> > Peter - where did your user connect to and what was the status of the
> > service and the status of the instances on each node?  Was it a straight
> > SQL/PLUS connection or was it from an application server?  Once
> > connected, verify where you are connected to by >select * from
> > v$instance; make sure you are disconnecting the node you think you are
> > connected to.  When you connect via sql/plus to a service, you may not
> > always end up on the node that you think you did.  My experience however
> > suggests that through straight sql/plus connected to a service where the
> > TAF policy is set to basic and more than one node is configured as
> > "preferred" you should see no interruption in connection when you kill
> > one of the nodes.
> >
> > srvctl status service -s "oractah"
> > srvctl status instance -i "instancename1,instancename2,instancename3"
> > (fill in your instance names...)
> > srvctl status database -d "databasename" (fill in your database name)
> >
> > Adam
> >
> >
> >
> >
> >
> >
> > On Dec 1, 2006, at 11:07 PM, Peter Santos wrote:
> >
> > Adam,
> > thanks for the feedback.
> > I wanted to quickly show you my setup because I believe that
> > when I took down eth0, my connection hung even though my TAF policy is
> > setup
> > properly.
> >
> >
> > eth0   - public ip of the machine
> > eth0:1 - vip managed by oracle. This is also what clients use to
> > connect to.
> >          dns has entries called dbinsto# that point to the 3 vips on
> > my cluster.
> >
> > eth1   - private ip, used for cluster interconnect and o2cb service
> > (cluster.conf)
> >
> > my client tnsnames.ora
> > =======================================================================
> > ORACTAH =
> >   (DESCRIPTION =
> >     (LOAD_BALANCE = ON)
> >     (FAILOVER = ON)
> >     (ADDRESS = (PROTOCOL = TCP)(HOST = dbinsto1)(PORT = 1521))  <-- vip
> >     (ADDRESS = (PROTOCOL = TCP)(HOST = dbinsto2)(PORT = 1521))  <-- vip
> >     (ADDRESS = (PROTOCOL = TCP)(HOST = dbinsto3)(PORT = 1521))  <-- vip
> >     (CONNECT_DATA =
> >       (SERVER = SHARED)
> >       (SERVICE_NAME = ORACTAH)
> >           (FAILOVER_MODE =
> >             (TYPE = SELECT)
> >             (METHOD = BASIC)
> >             (RETRIES = 20)
> >             (DELAY = 5)
> >           )
> >     )
> >   )
> >
> > Since the o2cb service operates via eth1 and there is no lost
> > connectivity
> > to the shared device, o2cb should continue to work just fine, but
> > I was sure that the vip on this node did not get moved to another node.
> >
> > When I repeated this same process on eth1, the o2cb service evicted
> > this node
> > from the cluster and it was eventually rebooted.. not sure if it was
> > the o2cb service that caused it to reboot or oracle's CRS daemons.
> >
> > I'll keep testing further.
> >
> > thanks
> > -peter
> >
> >
> >
> > Adam Kenger wrote:
> >>>> Peter - depending on how you have your RAC cluster setup, the hang on
> >>>> the front end is not that unexpected.  It depends on how the user was
> >>>> connected and how the TAF policy was set up.  Eth0 is your public
> >>>> interface I assume.  Was the user connecting to the IP on that
> >>>> interface or to the VIP set up by RAC?  When you down eth0 the VIP on
> >>>> that interface should get pushed over onto one of the other 2 nodes in
> >>>> the cluster.  If you're connecting to a "service" versus an actual
> >>>> "instance" there should be no hang on the front end.  If you're
> >>>> actually connected directly to the instance on the node, then you'll be
> >>>> out of luck if you disconnect that instance.  As an example, this is
> >>>> what the corresponding tnsnames.ora file looks like :
> >>>>
> >>>> MYDBSERVICE =
> >>>>   (DESCRIPTION =
> >>>>     (ADDRESS_LIST =
> >>>>       (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
> >>>>       (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
> >>>>       (ADDRESS = (PROTOCOL = TCP)(HOST = node3-vip)(PORT = 1521))
> >>>>     )
> >>>>     (CONNECT_DATA =
> >>>>       (SERVICE_NAME = mydbservice.db.mydomain.com)
> >>>>     )
> >>>>   )
> >>>>
> >>>> MYDB1 =
> >>>>   (DESCRIPTION =
> >>>>     (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
> >>>>     (CONNECT_DATA =
> >>>>       (SERVER = DEDICATED)
> >>>>       (SERVICE_NAME = mydb.db.mydomain.com)
> >>>>       (INSTANCE_NAME = mydb1)
> >>>>     )
> >>>>   )
> >>>>
> >>>> If you connected to the service "MYDBSERVICE" you could survive the
> >>>> failure of any given node.  You'd seamlessly fail-over onto one of the
> >>>> other nodes.  If you connect directly to the "MYDB1" instance, you'll
> >>>> be out of luck if you drop the connection to it.
> >>>>
> >>>> As far as o2cb goes, you are right I believe.  Eventually, it will be
> >>>> determined that the node is no longer heartbeating and will either
> >>>> panic or reboot.
> >>>>
> >>>> For your testing, just be careful you're not confusing the OCFS2
> >>>> layer with the Oracle CRS/RAC layers.
> >>>>
> >>>> Comments welcome....
> >>>>
> >>>> Hope that helps
> >>>>
> >>>> Adam
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Nov 30, 2006, at 3:30 PM, Peter Santos wrote:
> >>>>
> >>>> guys,
> >>>>     I'm trying to test how my 10gR2 oracle cluster (3 nodes) on SuSe
> >>>>     reacts to a network card hardware failure.
> >>>>     I have eth0 and eth1 as my network cards, I took down eth0
> >>>>     (ifdown eth0) to see what would happen and I didn't get any
> >>>>     reaction from the o2cb service. This is probably the correct
> >>>>     behavior since my /etc/ocfs2/cluster.conf uses eth1 as the
> >>>>     connection channel?
> >>>>
> >>>>     If I take down eth1 I suspect o2cb will eventually reboot the
> >>>>     machine right? I'm not using any bonding.
> >>>>
> >>>>     My concern is that when I took down eth0, I had a user logged into
> >>>>     the instance and everything just "hung" for that user, until I
> >>>>     manually took down the instance with "SRVCTL"... then the user
> >>>>     connection failed over to a working instance.
> >>>>
> >>>>     Anyway, just trying to get some general knowledge of the behavior
> >>>>     of o2cb in order to understand my testing.
> >>>>
> >>>> -peter
> >>>>
> >>>>
> >>>>>
> > _______________________________________________
> > Ocfs2-users mailing list
> > Ocfs2-users at oss.oracle.com
> > http://oss.oracle.com/mailman/listinfo/ocfs2-users
> >
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFFdJZBoyy5QBCjoT0RAj0UAKCXLPEwhvyrVYJE3DYc2tSSZ1Z26wCfeVAG
> rhYtyG3dLAV11j+zbMbzhHE=
> =IiVU
> -----END PGP SIGNATURE-----