[Ocfs2-users] Shutdown to single user mode causes SysRq Reset

John McNulty johnmcn1 at googlemail.com
Thu Aug 13 06:44:32 PDT 2009


Hello,

I've got a two-node HP DL580 cluster supported by a Fibre Channel SAN
with dual FC cards, dual switches and an HP EVA on the back end.  All
SAN disks are multipathed.  Installed software is:

Red Hat 5.3
ocfs2-2.6.18-128.1.14.el5-1.4.2-1.el5
ocfs2-tools-1.4.2-1.el5
ocfs2console-1.4.2-1.el5
Oracle RAC 11g ASM
Oracle RAC 11g Clusterware
Oracle RAC 10g databases

OCFS2 isn't being used by RAC (we're using ASM for that), but it is
used to provide a shared /usr/local, /home and /apps.

Yesterday I discovered something very unexpected.  I shut down node B
to single user mode, and node A immediately crashed.  The only message
on the console was SysRq Resetting.  Node A then rebooted normally.
I then exited single user mode on node B to go back up to run level 3.
The system started up ok, but no sooner had I got the login prompt on
the console than it too crashed with SysRq Resetting.

I repeated the steps a second time and it did exactly the same thing,
so it appears to be repeatable.

The only thing that jumped out at me while watching the consoles was
that node B fails to stop the OCFS2 service on shutdown, even going so
far as to tell me after the fact with an "eeeeeee" message.  I assume
that's bad!

There were no other console messages to give me a clue, so this is my
starting point.   Anyone got any ideas?

Oh, there's one other thing that may or may not be relevant.   On this
cluster, and another identical cluster, mounted.ocfs2 -f always shows
the node B cluster member as "Unknown" instead of the system name.  As
far as I'm aware I've followed the OCFS2 setup to the letter (it's not
complicated) and "o2cb_ctl -It node" on either node shows both systems
with all the correct details.  Both nodes mount the cluster
filesystems ok and work just fine.

I've not had a chance to try my single user test on the other identical
cluster yet as I've not been able to get a downtime window for it.  If
I do, then I will.

Rgds,

John


