[Ocfs2-users] VM node won't talk to host

Bret Baptist bbaptist at iexposure.com
Fri Aug 29 16:38:08 PDT 2008


On Thursday 28 August 2008 18:59:07 Sunil Mushran wrote:
> If the VM is not seeing the host heartbeat, the issue is not
> with the heartbeat itself, but with the fact that the VM's I/Os are
> not hitting the actual device. Are they buffered? See if there is some
> way to disable buffering in the kvm emulated ide disk.

I was thinking we were on to something here.  However, I tried mounting an
OCFS2 file system between two KVM VMs on the host server, accessing the AOE
partition through IDE emulation, and everything worked exactly as I would
expect.

Color me really confused as to why mounting the OCFS2 disk on the host server 
and a VM running off that server would not work.


Bret.
>
> Bret Baptist wrote:
> > I mounted the volume on the host server first.  I watched the heartbeat
> > debugging.  After the mount on the host I saw it doing a heartbeat on the
> > device.  Kernel logs from mounting the device:
> > [112893.823300] ocfs2_dlm: Nodes in domain
> > ("2CE50B6318E44D21B18F0A7B93CA27FC"): 1
> > [112893.895672] kjournald starting.  Commit interval 5 seconds
> > [112893.896247] ocfs2: Mounting device (152,12) on (node 1, slot 0) with
> > ordered data mode.
> >
> > I then mounted the same device mapped into the VM using the KVM emulated
> > IDE disk type and showing up in the VM as a SATA drive.  I was able to
> > mount the drive and the VM thought it was the only cluster member
> > mounting the device:
> > [ 2706.845601] ocfs2_dlm: Nodes in domain
> > ("2CE50B6318E44D21B18F0A7B93CA27FC"): 2
> > [ 2706.848441] kjournald starting.  Commit interval 5 seconds
> > [ 2706.849692] ocfs2: Mounting device (8,28) on (node 2, slot 0) with
> > ordered data mode.
> >
> > The debugfs.ocfs2 on the host server (node 1), the first to mount the
> > device, started showing the VM heartbeating on the device.  Then this
> > kernel message was displayed:
> > [111004.800566] (5732,1):o2net_connect_expired:1560 ERROR: no connection
> > established with node 2 after 10.0 seconds, giving up and returning
> > errors.
> >
> > debugfs.ocfs2 on the VM never showed the host server heartbeating on the
> > device at all.  Also on the VM (node 2), I received no message about it
> > not being able to establish a connection.
> >
> > From what I can tell the VM is not even recognizing that the host server
> > is heartbeating on the device.
> >
> > You say to check the device, but I know for a fact that the device is
> > working fine.  I can connect to the device using the AOE protocol over
> > ethernet on the VM and everything works as I expect.  It is only when I
> > map the device into the VM using the KVM emulated IDE disk type that I
> > have issues.  Is there any other debugging we can do to figure out why
> > this would be?
> >
> > Just a note, I also tried accessing the device in the VM using the KVM
> > paravirtualized block io driver (virtio_blk).  I received the exact same
> > results.
> >
> >
> > Thank you very much for your help.
> >
> >
> > Bret.
> >
> > On Monday 25 August 2008 19:34:55 Sunil Mushran wrote:
> >> No, the device names have nothing to do with it.
> >>
> >> When you mount, mount.ocfs2 kicks off the heartbeat. When
> >> other nodes see a new node heartbeating, o2net attempts to
> >> connect to the node. That connect is necessary for the mount
> >> to succeed.
> >>
> >> My investigation would start with disk heartbeat.
> >>
> >> # watch -d -n2 "debugfs.ocfs2 -R \"hb\" /dev/sdX "
> >>
> >> Do this on the node that has it mounted. You should see your node
> >> heartbeating. When you mount on the other node, you should see
> >> that other node heartbeating. If not, check the device.
> >>
> >> Sunil
> >>
> >> Bret Baptist wrote:
> >>> Turns out that you DO have to have the same device name on all nodes,
> >>> even though the UUID is the same.  I pushed the network card for AOE on
> >>> the host into the VM and used the same device names.  Like magic, OCFS2
> >>> on the VM started talking to the host.  That seems like a pretty serious
> >>> limitation to me.
> >>>
> >>> Does anyone with some knowledge of the code have any input on this
> >>> shortcoming?
> >>>
> >>>
> >>> Thank you.
> >>>
> >>>
> >>> Bret.
> >>>
> >>> On Thursday 21 August 2008 16:37:50 Bret Baptist wrote:
> >>>> The host servers are also able to connect to the VM server.
> >>>>
> >>>> Here is the cluster.conf:
> >>>> node:
> >>>>         ip_port = 7777
> >>>>         ip_address = 10.1.1.20
> >>>>         number = 0
> >>>>         name = wedge
> >>>>         cluster = iecluster
> >>>>
> >>>> node:
> >>>>         ip_port = 7777
> >>>>         ip_address = 10.1.1.21
> >>>>         number = 1
> >>>>         name = porkins
> >>>>         cluster = iecluster
> >>>>
> >>>> node:
> >>>>         ip_port = 7777
> >>>>         ip_address = 10.1.1.4
> >>>>         number = 2
> >>>>         name = opennebula
> >>>>         cluster = iecluster
> >>>>
> >>>> cluster:
> >>>>         node_count = 3
> >>>>         name = iecluster
> >>>>
> >>>>
> >>>> The o2cb configuration:
> >>>> O2CB_HEARTBEAT_THRESHOLD=61
> >>>> O2CB_IDLE_TIMEOUT_MS=10000
> >>>> O2CB_KEEPALIVE_DELAY_MS=5000
> >>>> O2CB_RECONNECT_DELAY_MS=2000
> >>>>
> >>>>
> >>>> I have the VM connected to a bridge on the host server.  In this
> >>>> case 10.1.1.21 is assigned to the bridge br1, and the VM opennebula
> >>>> has an IP address of 10.1.1.4 on this bridge as well.
> >>>>
> >>>> Let me know if there are any other details of the setup you need to
> >>>> know.
> >>>>
> >>>>
> >>>> Thank you very much for the help.
> >>>>
> >>>>
> >>>> Bret.
> >>>>
> >>>> On Thursday 21 August 2008 14:55:43 Herbert van den Bergh wrote:
> >>>>> What about from the host server(s) to the VM?  And what does
> >>>>> cluster.conf look like?
> >>>>>
> >>>>> Basically, all nodes need to be able to connect to all others' OCFS2
> >>>>> port.
> >>>>>
> >>>>> Thanks,
> >>>>> Herbert.
> >>>>>
> >>>>> Bret Baptist wrote:
> >>>>>> On Thursday 21 August 2008 14:37:09 Wessel wrote:
> >>>>>>> Hello Bret,
> >>>>>>>
> >>>>>>> An obvious question, but have you tried disabling the firewall on
> >>>>>>> the KVM VM? Also, are you able to ping the other two Ubuntu nodes
> >>>>>>> from the KVM VM?
> >>>>>>
> >>>>>> There is no firewall enabled on the VM, in fact iptables is not even
> >>>>>> installed.
> >>>>>>
> >>>>>> I am able to ping and do other communication from the VM to the host
> >>>>>> server.
> >>>>>>
> >>>>>>
> >>>>>> Bret.
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: ocfs2-users-bounces at oss.oracle.com
> >>>>>>> [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Bret Baptist
> >>>>>>> Sent: Thursday, 21 August 2008 21:32
> >>>>>>> To: ocfs2-users at oss.oracle.com
> >>>>>>> Subject: [Ocfs2-users] VM node won't talk to host
> >>>>>>>
> >>>>>>> I am trying to mount the same partition from a KVM ubuntu 8.04.1
> >>>>>>> virtual machine and on an ubuntu 8.04.1 host server.
> >>>>>>>
> >>>>>>> I am able to mount the partition just fine on two ubuntu host
> >>>>>>> servers, and they both talk to each other.  The logs on both servers
> >>>>>>> show the other machine mounting and unmounting the drive.
> >>>>>>>
> >>>>>>> However, when I mount the drive in the KVM VM I get no
> >>>>>>> communication to the host servers.  I have checked with tcpdump and
> >>>>>>> the VM doesn't even attempt to talk to the other cluster members.
> >>>>>>> The VM just mounts the drive like no one else is on the cluster,
> >>>>>>> even though both the other nodes already have the drive mounted.
> >>>>>>>
> >>>>>>> I have checked and rechecked all the settings: the cluster.conf is
> >>>>>>> the same on all nodes, and the drive has the same uuid and the same
> >>>>>>> label.  The only thing that is different is the actual device name.
> >>>>>>> On the host servers it is the AOE device '/dev/etherd/e0.1p11'; on
> >>>>>>> the VM the '/dev/etherd/e0.1' device is mapped to '/dev/sdb', so the
> >>>>>>> OCFS2 partition shows up as '/dev/sdb11'.
> >>>>>>>
> >>>>>>> The only thing I can think of is that the device names have to be
> >>>>>>> the same between all hosts, but that really doesn't make any sense
> >>>>>>> to me. Any help would be greatly appreciated.
> >>>>>>>
> >>>>>>>
> >>>>>>> Thanks.
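
Herbert's point quoted above, that every node must be able to connect to
every other node's OCFS2 port, can be checked mechanically against the
cluster.conf shown in the thread.  What follows is only a rough sketch, not
part of any OCFS2 tooling: the function names (parse_cluster_conf,
node_endpoints, can_connect) are made up for illustration, the default path
/etc/ocfs2/cluster.conf is an assumption about where the file lives, and the
parser assumes the simple "stanza header, then indented key = value lines"
layout seen above.  Run it on each node in turn.  A refused connection may
simply mean the cluster stack is not online on the target node, so treat the
output as a hint rather than a verdict.

```python
import socket
import sys


def parse_cluster_conf(text):
    """Split stanza-style text ("node:" / "cluster:" headers followed by
    "key = value" lines) into a list of (kind, settings) pairs."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A header such as "node:" starts a new stanza.
            current = {}
            stanzas.append((line[:-1], current))
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return stanzas


def node_endpoints(stanzas):
    """Return (name, ip_address, ip_port) for every node stanza."""
    return [
        (settings["name"], settings["ip_address"], int(settings["ip_port"]))
        for kind, settings in stanzas
        if kind == "node"
    ]


def can_connect(host, port, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed default location; pass another path as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/ocfs2/cluster.conf"
    with open(path) as handle:
        nodes = node_endpoints(parse_cluster_conf(handle.read()))
    for name, address, port in nodes:
        state = "reachable" if can_connect(address, port) else "NOT reachable"
        print(f"{name} ({address}:{port}): {state}")
```

This only tests plain TCP reachability of the configured ip_port from the
machine it runs on.  It says nothing about whether the disk heartbeat is
visible, which is the other half of what the thread is chasing.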

-- 
Bret Baptist
Senior Network Administrator
bbaptist at iexposure.com
Internet Exposure, Inc.
http://www.iexposure.com
(612)676-1946 x17

Providing Internet Services since 1995
Web Development ~ Search Engine Marketing ~ Web Analytics
Network Security ~ On Demand Tech Support ~ E-Mail Marketing


