[Ocfs2-users] Shutdown to single user mode causes SysRq Reset

John McNulty johnmcn1 at googlemail.com
Thu Aug 13 16:15:58 PDT 2009


Well, that's a relief, but are you sure it's the same thing?  I've seen
what happens when o2net thinks a node has been idle for too long due to
network issues and fences the offending node, and when SAN problems
cause DLM to evict a node that stops heartbeating to disk.  There is
usually lots of log evidence to tell me what's going on, but in this
case there was nothing.  It was instant, with no warning except those
two words on the console.  I was connected to the iLO and watching the
console to try to catch what popped up.  And Node B fencing itself the
same way is a puzzle and makes no sense at all, as everything was up and
running at that point.

However, I've been meaning to set up netconsole, so I'll do that
tomorrow (Friday) and make sure that dmesg is at logging level 8.  It's
going to be a quiet day, so I can repeat these tests a few times and
will post back anything new that I find.
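
For reference, a minimal sketch of how the log level could be checked
before repeating the tests.  On Linux the console log level is the first
of the four integers in /proc/sys/kernel/printk.  The function name
below is hypothetical and is not part of any OCFS2 tooling.

```python
def console_loglevel(printk_text):
    """Return the console log level from the text of /proc/sys/kernel/printk.

    The file holds four whitespace-separated integers; the first one is
    the current console log level.
    """
    fields = printk_text.split()
    if not fields:
        raise ValueError("empty printk contents")
    return int(fields[0])


if __name__ == "__main__":
    # Read the live value on a Linux host.
    with open("/proc/sys/kernel/printk") as f:
        level = console_loglevel(f.read())
    print("console log level:", level)
    if level < 8:
        print("lower than 8; some kernel messages will not reach the console")
```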

I'm aware this might not be an OCFS2 issue at all. So I'll also try
unmounting all OCFS2 volumes first and stopping o2cb before repeating
a test.
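
A rough sketch of one way to confirm that no OCFS2 volumes remain
mounted before stopping o2cb, assuming the standard /proc/mounts layout
(device, mount point, filesystem type, options, ...).  The function name
and the sample device paths are made up for illustration.

```python
def ocfs2_mountpoints(mounts_text):
    """Return the mount points of all ocfs2 filesystems listed in mounts_text.

    mounts_text is expected to be in /proc/mounts format.  Mount points
    containing spaces appear there with octal escapes such as \\040; this
    sketch returns them unmodified.
    """
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Field 3 (index 2) is the filesystem type.
        if len(fields) >= 3 and fields[2] == "ocfs2":
            points.append(fields[1])
    return points


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        remaining = ocfs2_mountpoints(f.read())
    if remaining:
        print("still mounted:", ", ".join(remaining))
    else:
        print("no OCFS2 volumes mounted")
```

An empty result would suggest it is safe to go on to stopping o2cb.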

About the "eeeeee" messages: I noticed during a shutdown halt that OCFS2
prints "cccc" for filesystems it manages to unmount and "eeeeee" for
filesystems that it can't, most likely because files are still held open
by running processes.  I believe I know which one is causing it and need
to adjust the shutdown order.

John


On 13 Aug 2009, at 18:52, Sunil Mushran wrote:

> This is a feature. ;)
>
> If you have mounted a volume on two or more nodes, the expectation
> is that the private interconnect will always remain up. If you shut
> down the network on a node, the cluster stack will have to kill a
> node. It does so in order to prevent hangs in cluster operations.
>
> In a 2 node setup, the higher node number will fence. I would imagine
> Node A is the higher number. But I am not sure why Node B fenced on
> restart. The "eeeeeee" message does not ring a bell.
>
> If you want to get to the bottom of this, set up a netconsole server
> to capture the logs.
>
> Or, remember to shut down the cluster before switching to single
> user mode.
>
> Sunil




