[Ocfs2-users] re: how should ocfs2 react to nic hardware issue

Adam Kenger akenger at gmail.com
Mon Dec 4 07:40:24 PST 2006


Peter - where did your user connect to, and what was the status of the
service and of the instances on each node?  Was it a straight SQL*Plus
connection, or was it from an application server?  Once connected, verify
which instance you are on with "SELECT * FROM v$instance;" and make sure
you are disconnecting the node you think you are connected to.  When you
connect via SQL*Plus to a service, you may not always end up on the node
that you think you did.  My experience, however, is that with a straight
SQL*Plus connection to a service where the TAF policy is set to BASIC and
more than one node is configured as "preferred", you should see no
interruption in the connection when you kill one of the nodes.

srvctl status service -d "databasename" -s "oractah"
srvctl status instance -d "databasename" -i "instancename1,instancename2,instancename3" (fill in your instance names...)
srvctl status database -d "databasename" (fill in your database name)
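
To double-check where a session actually landed, something like this from
SQL*Plus works (ORACTAH is the service name from this thread; the v$session
query is just an optional extra for watching TAF state - both are standard
10g dynamic views):

```sql
-- Which instance/host is this session actually on?
SELECT instance_name, host_name, status
  FROM v$instance;

-- Has this session failed over, and with what TAF type/method?
SELECT failover_type, failover_method, failed_over
  FROM v$session
 WHERE sid = SYS_CONTEXT('USERENV', 'SID');
```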

Adam






On Dec 1, 2006, at 11:07 PM, Peter Santos wrote:

>
> Adam,
> thanks for the feedback.
> I wanted to quickly show you my setup, because I believe that
> when I took down eth0 my connection hung even though my TAF policy
> is set up properly.
>
>
> eth0   - public ip of the machine
> eth0:1 - vip managed by oracle. This is also what clients connect to.
>          dns has entries called dbinsto# that point to the 3 vips  
> on my cluster.
>
> eth1   - private ip, used for cluster interconnect and o2cb service  
> (cluster.conf)
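>
> For reference, the shape of cluster.conf that o2cb expects (node names
> and private IPs below are made-up placeholders, not my real ones) is
> roughly the following - one node: stanza per node, and the ip_address
> is the eth1 address, which is why the o2cb heartbeat only cares about
> eth1:
>
> ```
> cluster:
>         node_count = 3
>         name = ocfs2
>
> node:
>         ip_port = 7777
>         ip_address = 192.168.1.101
>         number = 0
>         name = dbnode1
>         cluster = ocfs2
> ```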
>
> my client tnsnames.ora
> ======================================================================
> ORACTAH =
>   (DESCRIPTION =
>     (LOAD_BALANCE = ON)
>     (FAILOVER = ON)
>     (ADDRESS = (PROTOCOL = TCP)(HOST = dbinsto1)(PORT = 1521))  <-- vip
>     (ADDRESS = (PROTOCOL = TCP)(HOST = dbinsto2)(PORT = 1521))  <-- vip
>     (ADDRESS = (PROTOCOL = TCP)(HOST = dbinsto3)(PORT = 1521))  <-- vip
>     (CONNECT_DATA =
>       (SERVER = SHARED)
>       (SERVICE_NAME = ORACTAH)
>           (FAILOVER_MODE =
>             (TYPE = SELECT)
>             (METHOD = BASIC)
>             (RETRIES = 20)
>             (DELAY = 5)
>           )
>     )
>   )
>
> Since the o2cb service operates via eth1 and there is no lost
> connectivity to the shared device, o2cb should continue to work just
> fine, but I confirmed that the vip on this node did not get moved to
> another node.
>
> When I repeated this same process on eth1, the o2cb service evicted
> this node from the cluster and it was eventually rebooted... not sure
> if it was the o2cb service that caused the reboot or oracle's CRS
> daemons.
>
> I'll keep testing further.
>
> thanks
> - -peter
>
>
>
> Adam Kenger wrote:
>> Peter - depending on how you have your RAC cluster set up, the hang on
>> the front end is not that unexpected.  It depends on how the user was
>> connected and how the TAF policy was set up.  Eth0 is your public
>> interface, I assume.  Was the user connecting to the IP on that
>> interface or to the VIP set up by RAC?  When you down eth0, the VIP on
>> that interface should get pushed over onto one of the other 2 nodes in
>> the cluster.  If you're connecting to a "service" rather than an actual
>> "instance", there should be no hang on the front end.  If you're
>> actually connected directly to the instance on the node, then you'll be
>> out of luck if you disconnect that instance.  As an example, this is
>> what the corresponding tnsnames.ora file looks like:
>>
>> MYDBSERVICE =
>>   (DESCRIPTION =
>>     (ADDRESS_LIST =
>>       (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
>>       (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
>>       (ADDRESS = (PROTOCOL = TCP)(HOST = node3-vip)(PORT = 1521))
>>     )
>>     (CONNECT_DATA =
>>       (SERVICE_NAME = mydbservice.db.mydomain.com)
>>     )
>>   )
>>
>> MYDB1 =
>>   (DESCRIPTION =
>>     (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
>>     (CONNECT_DATA =
>>       (SERVER = DEDICATED)
>>       (SERVICE_NAME = mydb.db.mydomain.com)
>>       (INSTANCE_NAME = mydb1)
>>     )
>>   )
>>
>> If you connected to the service "MYDBSERVICE" you could survive the
>> failure of any given node.  You'd seamlessly fail over onto one of the
>> other nodes.  If you connect directly to the "MYDB1" instance, you'll
>> be out of luck if you drop the connection to it.
>>
>> As far as o2cb goes, you are right, I believe.  Eventually, the
>> cluster will determine that the node is no longer heartbeating, and
>> the node will either panic or reboot.
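>>
>> (As a side note, the fencing window is tunable.  On most installs the
>> knob is O2CB_HEARTBEAT_THRESHOLD, set via "service o2cb configure" and
>> stored in /etc/sysconfig/o2cb - the exact path may vary by distro:
>>
>> ```
>> # /etc/sysconfig/o2cb (fragment)
>> O2CB_ENABLED=true
>> # a node is fenced after roughly (threshold - 1) * 2 seconds of
>> # missed disk heartbeats; the default of 31 is about 60 seconds
>> O2CB_HEARTBEAT_THRESHOLD=31
>> ```
>> )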
>>
>> For your testing, just be careful you're not confusing the OCFS2  
>> layer
>> with the Oracle CRS/RAC layers.
>>
>> Comments welcome....
>>
>> Hope that helps
>>
>> Adam
>>
>>
>>
>>
>> On Nov 30, 2006, at 3:30 PM, Peter Santos wrote:
>>
>> guys,
>>     I'm trying to test how my 10gR2 oracle cluster (3 nodes) on SuSE
>> reacts to a network card hardware failure.
>>     I have eth0 and eth1 as my network cards.  I took down eth0
>> (ifdown eth0) to see what would happen, and I didn't get any reaction
>> from the o2cb service.  This is probably the correct behavior, since
>> my /etc/ocfs2/cluster.conf uses eth1 as the connection channel?
>>
>>     If I take down eth1, I suspect o2cb will eventually reboot the
>> machine, right?  I'm not using any bonding.
>>
>>     My concern is that when I took down eth0, I had a user logged
>> into the instance and everything just "hung" for that user, until I
>> manually took down the instance with "SRVCTL"... then the user
>> connection failed over to a working instance.
>>
>>     Anyway, just trying to get some general knowledge of the behavior
>> of o2cb in order to understand my testing.
>>
>> -peter
>>
>>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>



