[Ocfs2-tools-users] heartbeat issue with ocfs2 on debian
Dameon Wagner
d.wagner at ru.ac.za
Tue Nov 17 02:29:49 PST 2009
On Mon, Nov 16, 2009 at 11:35:10AM -0800, Joel Becker scribbled in
"Re: [Ocfs2-tools-users] heartbeat issue with ocfs2 on debian":
> On Mon, Nov 16, 2009 at 12:37:52PM +0200, Dameon Wagner wrote:
> > I'm not sure how many people are still on this list, as the
> > archive doesn't show there being much activity. I was going to
> > lurk for a little while, but there've been no messages since I
> > joined, so here goes.
>
> Most folks use ocfs2-users for this sort of question, which is why
> you haven't seen much activity. But we're happy to help anywhere
> :-)
Ahh, that could explain it ;-) I'll probably move subscriptions in a
while, and follow what's going on over there.
> > My setup is pretty simple, using only one physical box running
> > debian lenny. That box has a xen virtual machine that I'd like to
> > share a block device with, also running debian lenny.
> >
> > The physical box is publishing a LVM2 logical volume using AOE,
> > and both systems are mounting the ocfs2 formatted partition on
> > /mnt/ocfs.
>
> I'm not sure I understand your setup. You have a Xen dom0 on a
> physical box. Then you have a single domU guest. You have a LVM2
> volume on the dom0. That volume is exported via AOE. You are
> mounting the volume on the dom0 and the domU so that they can share
> the filesystem. Is that correct?
Yup, exactly right.
> Big question: is the dom0 mounting the LVM2 volume via AOE or via
> direct block device access?
I am/was mounting the LVM2 volume directly on the dom0, and via AOE on
the domU. I had originally wanted to mount both via AOE, as that is
probably how it will go into production (a simple storage box that
won't consume any of the volumes it publishes via AOE, and "remote"
boxes that actually mount them). However, the aoe-tools/vblade setup I
have didn't seem to make the AOE volume available to the dom0 itself,
so I figured I'd just mount the LV directly there. I honestly thought
that, being a block device either way, it wouldn't make a difference.
I've been playing around a little this morning, trying to get both
dom0 and domU to mount via AOE, and it seems that I can only get dom0
to see the aoe device if I use `vblade .. .. lo <vol>` rather than
`vblade .. .. eth0 <vol>`, which is annoying, but a matter for another
mailing list I think...
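For reference, here's a rough sketch of the export/discover/mount
sequence I'm testing with -- the shelf/slot numbers, LV path and mount
point below are just placeholders, not my real values:

    # On the dom0: export the LV over eth0 as AOE shelf 0, slot 1
    vblade 0 1 eth0 /dev/vg0/ocfs2-lv &

    # On the domU: load the aoe driver, rescan, and mount the ocfs2 volume
    modprobe aoe
    aoe-discover
    mount -t ocfs2 /dev/etherd/e0.1 /mnt/ocfs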
Anyway, long story short, running vblade twice on the same LV, once
for lo and once for eth0, seems to have solved the issue and given me
_exactly_ what I was after -- quick test edits on one host show up
(effectively) immediately on the other, and no errors appear in either
host's logfiles. In other words: thanks Joel!
All I have to do now is work out a neater way of publishing the AOE
device to both hosts without having two instances of vblade running.
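For the archives, the two-vblade workaround currently looks roughly
like this (again, the shelf/slot numbers and LV path are placeholders):

    # On the dom0: one vblade per interface, both backed by the same LV
    vblade 0 1 lo   /dev/vg0/ocfs2-lv &    # so the dom0 itself can see the AOE target
    vblade 0 1 eth0 /dev/vg0/ocfs2-lv &    # so the domU can see it over the Xen bridge

    # Then on each host:
    aoe-discover
    mount -t ocfs2 /dev/etherd/e0.1 /mnt/ocfs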
> > All seems to work OK, and running simple commands like `ls` and
> > `cat` work nicely, but I first noticed something was off when any
> > edits to a file didn't propagate from one node to the other.
> > Looking in syslog of the physical host I see kernel entries
> > mentioning:
> >
> > (27275,0):o2hb_do_disk_heartbeat:762 ERROR: Device "dm-12": another node is heartbeating in our slot
> >
> > usually near or followed by:
> >
> > (26956,0):o2net_connect_expired:1629 ERROR: no connection established with node 1 after 30.0 seconds, giving up and returning errors.
> >
> > o2cb status on both nodes shows that heartbeat is active -- am I
> > missing a configuration option somewhere that will give each node
> > its own slot? The list archive doesn't seem to mention this, and
> > the terms I've tried searching for on Google don't seem to dig up
> > anything either.
>
> Slots are autoconfigured, so you're not missing anything there.
> You're missing something else that we need to track down. You are
> having trouble with network connectivity between the dom0 and the
> domU. Is your /etc/ocfs2/cluster.conf correct between them? But
> the bigger problem is the 'another node is in our slot' error. This
> signifies inconsistency in how the disk is seen. This is why I ask
> about AOE vs direct access. We need to make sure that changes to
> the disk show up immediately to both parties. Then they should see
> each other on the disk and choose slots correctly.
My cluster.conf files are copied verbatim between the two hosts so, if
I understand correctly, they should be compatible. Besides, with both
boxes now connecting to the block device via AOE, all seems to be
sorted. I honestly didn't think that it would make a difference, but
it seems it does.
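For completeness, the /etc/ocfs2/cluster.conf on both hosts looks
roughly like the following -- the node names, addresses and cluster
name here are placeholders rather than my real values:

    cluster:
            node_count = 2
            name = ocfs2

    node:
            ip_port = 7777
            ip_address = 192.168.1.10
            number = 0
            name = dom0
            cluster = ocfs2

    node:
            ip_port = 7777
            ip_address = 192.168.1.11
            number = 1
            name = domu
            cluster = ocfs2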
> > Any pointers? Or am I just trying to do the wrong thing (I did
> > see somewhere that ocfs was more for oracle DB usage, rather than
> > as a general purpose filesystem)?
>
> ocfs2 is a general purpose filesystem. ocfs (without the '2') was
> not, but that sucker is back in the world of linux 2.4.
Cool, good to know. Any idea when ACLs will be working? (Probably
answered in the ocfs2-users archive, which I'm trawling through at the
moment).
Thanks again.
Dameon
--
><> ><> ><> ><> ><> ><> ooOoo <>< <>< <>< <>< <>< <><
Dr. Dameon Wagner,
Senior ICT Specialist,
Depts. of Computer Science & Information Systems,
Rhodes University, Grahamstown, South Africa.
><> ><> ><> ><> ><> ><> ooOoo <>< <>< <>< <>< <>< <><