[Ocfs2-users] Shutdown to single user mode causes SysRq Reset

Sunil Mushran sunil.mushran at oracle.com
Thu Aug 13 10:52:40 PDT 2009


This is a feature. ;)

If you have mounted a volume on two or more nodes, the expectation
is that the private interconnect will always remain up. If you shut
down the network on a node, the cluster stack has to kill a node. It
does so in order to prevent hangs in cluster operations.

In a two-node setup, the node with the higher node number will fence
itself. I would imagine Node A is the higher number. But I am not sure
why Node B fenced on restart. The "eeeeeee" message does not ring a bell.
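
If you want to check which node carries the higher number, the node
numbers live in /etc/ocfs2/cluster.conf ("o2cb_ctl -It node" prints the
same information). A two-node layout looks roughly like the sketch
below; the names and addresses are just placeholders, not your config:

    node:
            ip_port = 7777
            ip_address = 192.168.1.10
            number = 0
            name = nodea
            cluster = ocfs2

    node:
            ip_port = 7777
            ip_address = 192.168.1.11
            number = 1
            name = nodeb
            cluster = ocfs2

    cluster:
            node_count = 2
            name = ocfs2

The node with the higher "number" value is the one expected to fence
in this situation.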

If you want to get to the bottom of this, set up a netconsole server
to capture the logs.
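
On RHEL 5 the stock netconsole support is enough for this. A rough
sketch, assuming a log host at 192.168.1.50 on the same subnet and
eth0 as the sending interface (both are placeholders):

    # on the node you want to watch (one-off)
    modprobe netconsole netconsole=@/eth0,514@192.168.1.50/

    # or persistently: set SYSLOGADDR=192.168.1.50 in
    # /etc/sysconfig/netconsole, then
    service netconsole start
    chkconfig netconsole on

    # on the log host, anything listening on UDP 514 will do,
    # e.g. a syslog daemon with remote reception enabled (syslogd -r)

That way the kernel messages from the fencing node end up on the log
host even if the node resets before anything reaches disk.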

Or, remember to shut down the cluster before switching to single
user mode.
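
Roughly, that means doing the following on the node you are taking
down, before switching runlevels (mount points taken from your mail;
this is a sketch, not a full runbook):

    # unmount the shared OCFS2 filesystems (/usr/local, /home, /apps)
    service ocfs2 stop

    # then take the o2cb cluster stack offline and unload it
    service o2cb stop

Once the node has left the cluster cleanly, the other node no longer
expects to hear from it, so dropping the interconnect does not lead
to a fence.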

Sunil

John McNulty wrote:
> Hello,
>
> I've got a 2 node HP DL580 cluster supported by a Fibrechannel SAN
> with dual FC cards, dual switches and an HP EVA on the back end.  All
> SAN disks are multipathed.  Installed software is:
>
> Red Hat 5.3
> ocfs2-2.6.18-128.1.14.el5-1.4.2-1.el5
> ocfs2-tools-1.4.2-1.el5
> ocfs2console-1.4.2-1.el5
> Oracle RAC 11g ASM
> Oracle RAC 11g Clusterware
> Oracle RAC 10g databases
>
> OCFS2 isn't being used by RAC, we're using ASM for that, but OCFS2 is
> used to provide a shared /usr/local, /home and /apps.
>
> Yesterday I discovered something very unexpected.  I shut down node B
> to single user mode, and immediately node A crashed.  The only message
> on the console was SysRq Resetting.  Node A then rebooted normally.
> I then exited single user mode on node B to jump back up to run level 3.
> The system started up OK, but no sooner had I got to the login prompt
> on the console than it too crashed with SysRq Resetting.
>
> I repeated the steps for a second time and it did exactly the same
> thing all over again.  It appears to be repeatable.
>
> The only thing that jumped out at me watching the consoles while this
> was going on was that node B fails to stop the OCFS2 service on
> shutdown, even going so far as to tell me after the fact with an
> "eeeeeee" message.  I assume that's bad!
>
> There were no other console messages to give me a clue, so this is my
> starting point.   Anyone got any ideas?
>
> Oh, there's one other thing that may or may not be relevant.   On this
> cluster, and another identical cluster, mounted.ocfs2 -f always shows
> the node B cluster member as "Unknown" instead of the system name.  As
> far as I'm aware I've followed the OCFS2 setup to the letter (it's not
> complicated) and "o2cb_ctl -It node" on either node shows both systems
> with all the correct details.  Both nodes mount the cluster
> filesystems ok and work just fine.
>
> I've not had a chance to try my single user test on the other identical
> cluster yet as I've not been able to get a downtime window for it.  If
> I do, then I will.
>
> Rgds,
>
> John
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users



