[Ocfs2-users] Catatonic nodes under SLES10
David Miller
syslog at d.sparks.net
Mon Apr 2 09:01:28 PDT 2007
Good afternoon all;
I'm planning on implementing a shared-storage solution for a primary and
backup Oracle server in the near future.
We can't afford RAC, and we don't have performance or growth issues; we
just want another system to be able to start up and run if the primary
fails.
Both servers will be connected to a dual-host external RAID system.
I've set up OCFS2 on a couple of test systems, and everything appears to
work fine.
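For reference, the test cluster is a minimal two-node setup; the /etc/ocfs2/cluster.conf looks roughly like the sketch below (the IP addresses and the second node's name are placeholders, not our real values):

```
node:
	ip_port = 7777
	ip_address = 192.168.1.1
	number = 0
	name = saltlake
	cluster = ocfs2

node:
	ip_port = 7777
	ip_address = 192.168.1.2
	number = 1
	name = node2
	cluster = ocfs2

cluster:
	node_count = 2
	name = ocfs2
```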
Until, that is, one of the systems loses network connectivity.
When the systems can no longer talk to each other but the disk
heartbeat is still alive, the higher-numbered node goes catatonic. Under
SLES 9 it fenced itself off with a kernel panic; under SLES 10 it simply
stops responding to the network or the console. A power cycle is
required to bring it back up.
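For what it's worth, the only fencing knob I'm aware of in this stack is the disk-heartbeat dead threshold (O2CB_HEARTBEAT_THRESHOLD in /etc/sysconfig/o2cb). Assuming the 2-second heartbeat iteration interval described for OCFS2 1.2, the threshold maps to wall-clock fence time roughly like this (the threshold value here is illustrative, not necessarily our setting):

```shell
# Sketch: wall-clock seconds of missed disk heartbeats before o2cb
# declares a node dead, assuming 2-second heartbeat iterations
# (OCFS2 1.2 behavior).
threshold=31
echo $(( (threshold - 1) * 2 ))   # -> 60
```

Raising the threshold only delays disk-based fencing, though; it doesn't change what happens when the network heartbeat alone is lost, which is the case I'm hitting.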
The desired behavior would be for the higher-numbered node to lose
access to the OCFS2 file system(s). I don't really care whether access
would simply time out, à la stale NFS mounts, or fail immediately, as
with access to nonexistent files.
I'm running the latest SuSE-packaged version of the ocfs2-tools package:
saltlake:/proc/fs # cat /proc/fs/ocfs2/version
OCFS2 1.2.1-SLES Tue Apr 25 14:46:36 PDT 2006 (build sles)
saltlake:/proc/fs #
I'm using the stock 10.0 release kernel:
saltlake:/proc/fs # uname -a
Linux saltlake 2.6.16.21-0.8-smp #1 SMP Mon Jul 3 18:25:39 UTC 2006 i686
i686 i386 GNU/Linux
saltlake:/proc/fs #
Is there a solution to this? Is this expected behavior?
Thanks,
--- David