[Ocfs2-tools-users] heartbeat issue with ocfs2 on debian
Joel Becker
Joel.Becker at oracle.com
Mon Nov 16 11:35:10 PST 2009
On Mon, Nov 16, 2009 at 12:37:52PM +0200, Dameon Wagner wrote:
> I'm not sure how many people are still on this list, as the archive
> doesn't show there being much activity. I was going to lurk for a
> little while, but there've been no messages since I joined, so here
> goes.
Most folks use ocfs2-users for this sort of question, which is
why you haven't seen much activity. But we're happy to help anywhere
:-)
> My setup is pretty simple, using only one physical box running Debian
> lenny. That box has a Xen virtual machine that I'd like to share a
> block device with, also running Debian lenny.
>
> The physical box is publishing an LVM2 logical volume using AOE, and
> both systems are mounting the ocfs2 formatted partition on /mnt/ocfs.
I'm not sure I understand your setup. You have a Xen dom0 on a
physical box. Then you have a single domU guest. You have an LVM2
volume on the dom0. That volume is exported via AOE. You are mounting
the volume on the dom0 and the domU so that they can share the
filesystem. Is that correct?
Big question: is the dom0 mounting the LVM2 volume via AOE or
via direct block device access?
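	A quick way to tell (paths here are just examples -- dm-12 is
from your log, and /dev/etherd/ is where the aoe driver normally puts
its devices) is to compare what each node actually mounted:

    # on each node: which block device is backing /mnt/ocfs?
    mount | grep /mnt/ocfs
    # on the dom0: what AOE devices and device-mapper volumes exist?
    ls -l /dev/etherd/
    dmsetup ls

If the dom0 shows a dm-* device while the domU shows an etherd device,
the dom0 is going direct while the domU is going over AOE.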
> All seems to work OK, and simple commands like `ls` and `cat`
> work nicely, but I first noticed something was off when edits to a
> file didn't propagate from one node to the other. Looking in the
> syslog of the physical host I see kernel entries mentioning:
>
> (27275,0):o2hb_do_disk_heartbeat:762 ERROR: Device "dm-12": another node is heartbeating in our slot
>
> usually near or followed by:
>
> (26956,0):o2net_connect_expired:1629 ERROR: no connection established with node 1 after 30.0 seconds, giving up and returning errors.
>
> o2cb status on both nodes shows that heartbeat is active -- am I
> missing a configuration option somewhere that will give each node its
> own slot? The list archive doesn't seem to mention this, and the
> terms I've tried searching on Google don't dig anything up
> either.
Slots are autoconfigured, so you're not missing anything there.
You're missing something else that we need to track down. You are
having trouble with network connectivity between the dom0 and the domU.
Is your /etc/ocfs2/cluster.conf correct between them?
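	For reference, a minimal two-node cluster.conf looks something
like this (the names and addresses below are made up -- each node's
name must match that machine's hostname, and the file must be
identical on both nodes):

    cluster:
            node_count = 2
            name = ocfs2

    node:
            ip_port = 7777
            ip_address = 192.168.0.1
            number = 0
            name = dom0-host
            cluster = ocfs2

    node:
            ip_port = 7777
            ip_address = 192.168.0.2
            number = 1
            name = domu-guest
            cluster = ocfs2

Also make sure the domU can actually reach the dom0 on that port --
the o2net error above is exactly what you see when the TCP connection
between the nodes fails.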
	But the bigger problem is the 'another node is heartbeating in
our slot' error. This signifies an inconsistency in how the disk is
seen. This is
why I ask about AOE vs direct access. We need to make sure that changes
to the disk show up immediately to both parties. Then they should see
each other on the disk and choose slots correctly.
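	One quick sanity check, using mounted.ocfs2 from ocfs2-tools
(run it on both nodes):

    # both nodes should report the same UUID for the shared volume,
    # whatever the local device name happens to be
    mounted.ocfs2 -d

If the two nodes report different UUIDs, or one node never sees the
other's heartbeat writes, they are not looking at the same consistent
view of the disk, and you get exactly the slot collision above.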
> Any pointers? Or am I just trying to do the wrong thing (I did see
> somewhere that ocfs was more for Oracle DB usage, rather than as a
> general-purpose filesystem)?
	ocfs2 is a general-purpose filesystem. ocfs (without the '2')
was not, but that sucker is back in the world of Linux 2.4.
Joel
--
"Hey mister if you're gonna walk on water,
Could you drop a line my way?"
Joel Becker
Principal Software Developer
Oracle
E-mail: joel.becker at oracle.com
Phone: (650) 506-8127