[Ocfs2-users] VM node won't talk to host

Bret Baptist bbaptist at iexposure.com
Tue Aug 26 15:17:59 PDT 2008


I mounted the volume on the host server first.  I watched the heartbeat 
debugging.  After the mount on the host I saw it doing a heartbeat on the 
device.  Kernel logs from mounting the device:
[112893.823300] ocfs2_dlm: Nodes in domain 
("2CE50B6318E44D21B18F0A7B93CA27FC"): 1
[112893.895672] kjournald starting.  Commit interval 5 seconds
[112893.896247] ocfs2: Mounting device (152,12) on (node 1, slot 0) with 
ordered data mode.

I then mounted the same device from inside the VM, where it is mapped in using 
the KVM emulated IDE disk type and shows up as a SATA drive.  I was able to 
mount the drive, and the VM thought it was the only cluster member with the 
device mounted:
[ 2706.845601] ocfs2_dlm: Nodes in domain 
("2CE50B6318E44D21B18F0A7B93CA27FC"): 2
[ 2706.848441] kjournald starting.  Commit interval 5 seconds
[ 2706.849692] ocfs2: Mounting device (8,28) on (node 2, slot 0) with ordered 
data mode.

debugfs.ocfs2 on the host server (node 1), the first node to mount the device, 
started showing the VM heartbeating on the device.  Then this kernel message 
was displayed:
[111004.800566] (5732,1):o2net_connect_expired:1560 ERROR: no connection 
established with node 2 after 10.0 seconds, giving up and returning errors.
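
That 10.0 seconds appears to match the O2CB_IDLE_TIMEOUT_MS=10000 from the o2cb configuration quoted later in this thread. As a sanity check on the configured timings, here is a sketch; the (threshold - 1) * 2s disk-heartbeat formula is my understanding of o2cb's default 2-second heartbeat interval, so treat the derived numbers as illustrative:

```shell
# Sketch: the o2cb timeouts from this cluster, and the wall-clock figures
# they imply. Assumes o2cb's default 2-second disk heartbeat interval.
O2CB_HEARTBEAT_THRESHOLD=61
O2CB_IDLE_TIMEOUT_MS=10000
NET_TIMEOUT_S=$(( O2CB_IDLE_TIMEOUT_MS / 1000 ))
HB_DEAD_S=$(( (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 ))
echo "network idle/connect timeout: ${NET_TIMEOUT_S}s"   # the 10.0s in the log above
echo "disk heartbeat dead threshold: ${HB_DEAD_S}s"
```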

debugfs.ocfs2 on the VM never showed the host server heartbeating on the 
device at all.  Also, on the VM (node 2), I received no message about failing 
to establish a connection.

From what I can tell, the VM is not even recognizing that the host server is 
heartbeating on the device.

You say to check the device, but I know for a fact that the device is working 
fine.  I can connect to it from the VM using the AoE protocol over Ethernet, 
and everything works as I expect.  It is only when I map the device into the 
VM using the KVM emulated IDE disk type that I have issues.  Is there any 
other debugging we can do to figure out why this happens?
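
One low-tech check I can run in the meantime: flatten each node's cluster.conf to one line per node and diff the copies across machines, to rule out a node-number or IP mismatch. A sketch, using the cluster.conf quoted later in this thread; the /tmp sample path is made up for the demo, and on a real node it should read /etc/ocfs2/cluster.conf instead:

```shell
#!/bin/sh
# Sketch: flatten an o2cb cluster.conf into one line per node so the
# copies on every machine can be diffed quickly. The sample below is the
# cluster.conf from this thread.
cat > /tmp/cluster.conf.sample <<'EOF'
node:
        ip_port = 7777
        ip_address = 10.1.1.20
        number = 0
        name = wedge
        cluster = iecluster

node:
        ip_port = 7777
        ip_address = 10.1.1.21
        number = 1
        name = porkins
        cluster = iecluster

node:
        ip_port = 7777
        ip_address = 10.1.1.4
        number = 2
        name = opennebula
        cluster = iecluster

cluster:
        node_count = 3
        name = iecluster
EOF
awk '
    /^node:/    { innode = 1 }     # entering a node: stanza
    /^cluster:/ { innode = 0 }     # cluster: stanza has its own name line
    innode && $1 == "ip_address" { ip  = $3 }
    innode && $1 == "number"     { num = $3 }
    innode && $1 == "name"       { print "node " num " " $3 " " ip }
' /tmp/cluster.conf.sample
```

Running the same one-liner on every node and diffing the output should make any mismatch obvious.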

Just a note: I also tried accessing the device in the VM using the KVM 
paravirtualized block I/O driver (virtio_blk) and got exactly the same 
results.
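
Since both IDE and virtio behave the same, one more thing I can try is checking whether the host's writes are visible to reads inside the VM at all, i.e. whether the emulated disk path is serving stale cached data. A sketch of the idea, demonstrated on a scratch file standing in for the real device ('/dev/sdb11' in the VM); on the real device the reads should use iflag=direct to bypass the guest page cache:

```shell
# Sketch: detect stale reads on a shared device. Read the same region
# twice; if another node is heartbeating in between, the bytes must
# differ. /tmp/fake-hb-region stands in for the real device here; on the
# VM, use /dev/sdb11 with iflag=direct and a few seconds between reads.
DEV=/tmp/fake-hb-region
printf 'beat-1' > "$DEV"
dd if="$DEV" of=/tmp/hb-read1 bs=4096 count=1 2>/dev/null
printf 'beat-2' > "$DEV"      # stands in for the host writing its next beat
dd if="$DEV" of=/tmp/hb-read2 bs=4096 count=1 2>/dev/null
cmp -s /tmp/hb-read1 /tmp/hb-read2 \
    && echo "no change between reads: possibly stale/cached data" \
    || echo "region is updating between reads"
```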


Thank you very much for your help.


Bret.

On Monday 25 August 2008 19:34:55 Sunil Mushran wrote:
> No, the device names have nothing to do with it.
>
> When you mount, mount.ocfs2 kicks off the heartbeat. When
> other nodes see a new node heartbeating, o2net attempts to
> connect to the node. That connect is necessary for the mount
> to succeed.
>
> My investigation would start with disk heartbeat.
>
> # watch -d -n2 "debugfs.ocfs2 -R \"hb\" /dev/sdX "
>
> Do this on the node that has it mounted. You should see your node
> heartbeating. When you mount on the other node, you should see
> that other node heartbeating. If not, check the device.
>
> Sunil
>
> Bret Baptist wrote:
> > Turns out that you DO have to have the same device name on all nodes, 
> > even though the UUID is the same.  I pushed the network card used for AoE
> > on the host into the VM and used the same device names.  Like magic, OCFS2
> > on the VM starts talking to the host.  That seems like a pretty serious
> > limitation to me.
> >
> > Does anyone with some knowledge of the code have any input on this
> > shortcoming?
> >
> >
> > Thank you.
> >
> >
> > Bret.
> >
> > On Thursday 21 August 2008 16:37:50 Bret Baptist wrote:
> >> The host servers are also able to connect to the VM server.
> >>
> >> Here is the cluster.conf:
> >> node:
> >>         ip_port = 7777
> >>         ip_address = 10.1.1.20
> >>         number = 0
> >>         name = wedge
> >>         cluster = iecluster
> >>
> >> node:
> >>         ip_port = 7777
> >>         ip_address = 10.1.1.21
> >>         number = 1
> >>         name = porkins
> >>         cluster = iecluster
> >>
> >> node:
> >>         ip_port = 7777
> >>         ip_address = 10.1.1.4
> >>         number = 2
> >>         name = opennebula
> >>         cluster = iecluster
> >>
> >> cluster:
> >>         node_count = 3
> >>         name = iecluster
> >>
> >>
> >> The o2cb configuration:
> >> O2CB_HEARTBEAT_THRESHOLD=61
> >> O2CB_IDLE_TIMEOUT_MS=10000
> >> O2CB_KEEPALIVE_DELAY_MS=5000
> >> O2CB_RECONNECT_DELAY_MS=2000
> >>
> >>
> >> I have the VM connecting to a bridge that is on the host server; in this
> >> case, 10.1.1.21 is assigned to the bridge br1, and the VM opennebula has
> >> an IP address of 10.1.1.4 on the same bridge.
> >>
> >> Let me know if there are any other details of the setup you would need
> >> to know.
> >>
> >>
> >> Thank you very much for the help.
> >>
> >>
> >> Bret.
> >>
> >> On Thursday 21 August 2008 14:55:43 Herbert van den Bergh wrote:
> >>> What about from the host server(s) to the VM?  And what does
> >>> cluster.conf look like?
> >>>
> >>> Basically, all nodes need to be able to connect to all others' OCFS2
> >>> port.
> >>>
> >>> Thanks,
> >>> Herbert.
> >>>
> >>> Bret Baptist wrote:
> >>>> On Thursday 21 August 2008 14:37:09 Wessel wrote:
> >>>>> Hello Bret,
> >>>>>
> >>>>> An obvious question, but have you tried disabling the firewall on the
> >>>>> KVM VM? Also, are you able to ping the other two Ubuntu nodes from
> >>>>> the KVM VM?
> >>>>
> >>>> There is no firewall enabled on the VM; in fact, iptables is not even
> >>>> installed.
> >>>>
> >>>> I am able to ping and do other communication from the VM to the host
> >>>> server.
> >>>>
> >>>>
> >>>> Bret.
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: ocfs2-users-bounces at oss.oracle.com
> >>>>> [mailto:ocfs2-users-bounces at oss.oracle.com] On behalf of Bret Baptist
> >>>>> Sent: Thursday, 21 August 2008 21:32
> >>>>> To: ocfs2-users at oss.oracle.com
> >>>>> Subject: [Ocfs2-users] VM node won't talk to host
> >>>>>
> >>>>> I am trying to mount the same partition from a KVM Ubuntu 8.04.1
> >>>>> virtual machine and from an Ubuntu 8.04.1 host server.
> >>>>>
> >>>>> I am able to mount the partition just fine on two Ubuntu host
> >>>>> servers; they both talk to each other.  The logs on both servers
> >>>>> show the other machine mounting and unmounting the drive.
> >>>>>
> >>>>> However, when I mount the drive in the KVM VM, there is no
> >>>>> communication with the host servers.  I have checked with tcpdump,
> >>>>> and the VM doesn't even attempt to talk to the other cluster members.
> >>>>> The VM just mounts the drive as if no one else is in the cluster,
> >>>>> even though both the other nodes already have the drive mounted.
> >>>>>
> >>>>> I have checked and rechecked all the settings: the cluster.conf is
> >>>>> the same on all nodes, and the drive has the same UUID and the same
> >>>>> label.  The only thing that is different is the actual device name.
> >>>>> On the host servers it is the AoE device '/dev/etherd/e0.1p11'; on
> >>>>> the VM the '/dev/etherd/e0.1' device is mapped to '/dev/sdb', so the
> >>>>> OCFS2 partition shows up as '/dev/sdb11'.
> >>>>>
> >>>>> The only thing I can think of is that the device names have to be the
> >>>>> same between all hosts, but that really doesn't make any sense to me.
> >>>>> Any help would be greatly appreciated.
> >>>>>
> >>>>>
> >>>>> Thanks.

-- 
Bret Baptist
Senior Network Administrator
bbaptist at iexposure.com
Internet Exposure, Inc.
http://www.iexposure.com
(612)676-1946 x17

Providing Internet Services since 1995
Web Development ~ Search Engine Marketing ~ Web Analytics
Network Security ~ On Demand Tech Support ~ E-Mail Marketing


