[Ocfs2-users] re: how should ocfs2 react to nic hardware issue
Adam Kenger
akenger at gmail.com
Thu Nov 30 13:04:20 PST 2006
Peter - depending on how you have your RAC cluster set up, the hang on
the front end is not that unexpected. It depends on how the user was
connected and how the TAF policy was set up. Eth0 is your public
interface, I assume. Was the user connecting to the IP on that
interface or to the VIP set up by RAC? When you down eth0, the VIP on
that interface should get pushed over onto one of the other 2 nodes
in the cluster. If you're connecting to a "service" rather than an actual
"instance", there should be no hang on the front end. If you're
connected directly to the instance on the node, then you'll
be out of luck if that instance becomes unreachable. As an example, this
is what the corresponding tnsnames.ora file looks like:
MYDBSERVICE =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node3-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = mydbservice.db.mydomain.com)
    )
  )

MYDB1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = mydb.db.mydomain.com)
      (INSTANCE_NAME = mydb1)
    )
  )
If you connect to the service "MYDBSERVICE" you can survive the
failure of any given node. You'd seamlessly fail over onto one of
the other nodes. If you connect directly to the "MYDB1" instance,
you'll be out of luck if you drop the connection to it.
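Since the TAF policy matters here, this is a rough sketch of what a
TAF-enabled version of the service entry could look like. The alias
MYDBSERVICE_TAF and the RETRIES and DELAY values are made up for
illustration, and I'm assuming a SELECT/BASIC policy is what you'd
want; treat it as a starting point, not as your actual configuration.

```
# Hypothetical alias; RETRIES and DELAY are illustrative values only.
MYDBSERVICE_TAF =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = yes)
      (FAILOVER = on)
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node3-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = mydbservice.db.mydomain.com)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 20)
        (DELAY = 5)
      )
    )
  )
```

With TYPE = SELECT an in-flight query can be resumed on a surviving
node, whereas TYPE = SESSION only re-establishes the session. Either
way, uncommitted transactions are rolled back and have to be retried
by the application.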
As far as o2cb goes, you are right, I believe. Eventually o2cb will
determine that the node is no longer heartbeating, and the node will
either panic or reboot.
For your testing, just be careful you're not confusing the OCFS2
layer with the Oracle CRS/RAC layers.
Comments welcome....
Hope that helps
Adam
On Nov 30, 2006, at 3:30 PM, Peter Santos wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> guys,
> I'm trying to test how my 10gR2 oracle cluster (3 nodes) on SuSe
> reacts to a network card hardware failure.
> I have eth0 and eth1 as my network cards, I took down eth0 (ifdown
> eth0) to see what would happen and
> I didn't get any reaction from the o2cb service. This is probably
> the correct behavior since my
> /etc/ocfs2/cluster.conf uses eth1 as the connection channel?
>
> If I take down eth1 I suspect o2cb will eventually reboot the
> machine right? I'm not using any bonding.
>
> My concern is that when I took down eth0, I had a user logged into
> the instance and everything just "hung" for
> that user, until I manually took down the instance with
> "SRVCTL"... then the user connection failed over to
> a working instance.
>
> Anyway, just trying to get some general knowledge of the behavior
> of o2cb in order to understand
> my testing.
>
> - -peter
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
>
> iD8DBQFFbz9Soyy5QBCjoT0RAiqvAJ40UCXsV/4Zdv19a246ByzNL4CiwgCfX704
> +BZwa23LphG878FP/5fQKek=
> =Nhcz
> -----END PGP SIGNATURE-----
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users