[Ocfs2-users] VM node won't talk to host
Herbert van den Bergh
herbert.van.den.bergh at oracle.com
Fri Aug 29 17:39:43 PDT 2008
Bret Baptist wrote:
> On Thursday 28 August 2008 18:59:07 Sunil Mushran wrote:
>
>> If the VM is not seeing the host heartbeat, the issue is not
>> with heartbeat, but with the fact that the VM I/Os are not hitting
>> the actual device. Buffered? See if there is some way
>> to disable buffering in the KVM-emulated IDE disk.
>>
>
> I was thinking we were on to something here. However, I tried mounting an
> OCFS2 file system between two KVM VMs on the host server, accessing the AOE
> partition through IDE emulation, and everything worked exactly as I would
> expect.
>
> Color me really confused as to why mounting the OCFS2 disk on the host server
> and a VM running off that server would not work.
>
Buffering in KVM? So both KVM guests see what's in the KVM buffer, but
it doesn't hit the disk, and disk updates don't make it into the KVM buffer?
Herbert.
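
If buffering in the emulated disk is indeed the culprit, one thing to try
(just a sketch; the exact option names depend on the qemu-kvm version, and
the cache= suboption of -drive is assumed to be available) is attaching the
shared device with host-side caching disabled:

```shell
# Sketch: attach the shared AOE device as an emulated IDE disk with the
# host page cache bypassed (cache=none), so guest writes reach the real
# device and host-side updates are visible to the guest.
kvm -m 512 \
    -drive file=root.img,if=ide \
    -drive file=/dev/etherd/e0.1,if=ide,cache=none
```

The same cache= suboption applies to if=virtio drives, which would cover
the virtio_blk case mentioned elsewhere in the thread as well.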
>
> Bret.
>
>> Bret Baptist wrote:
>>
>>> I mounted the volume on the host server first. I watched the heartbeat
>>> debugging. After the mount on the host I saw it doing a heartbeat on the
>>> device. Kernel logs from mounting the device:
>>> [112893.823300] ocfs2_dlm: Nodes in domain
>>> ("2CE50B6318E44D21B18F0A7B93CA27FC"): 1
>>> [112893.895672] kjournald starting. Commit interval 5 seconds
>>> [112893.896247] ocfs2: Mounting device (152,12) on (node 1, slot 0) with
>>> ordered data mode.
>>>
>>> I then mounted the same device mapped into the VM using the KVM emulated
>>> IDE disk type and showing up in the VM as a SATA drive. I was able to
>>> mount the drive and the VM thought it was the only cluster member
>>> mounting the device:
>>> [ 2706.845601] ocfs2_dlm: Nodes in domain
>>> ("2CE50B6318E44D21B18F0A7B93CA27FC"): 2
>>> [ 2706.848441] kjournald starting. Commit interval 5 seconds
>>> [ 2706.849692] ocfs2: Mounting device (8,28) on (node 2, slot 0) with
>>> ordered data mode.
>>>
>>> The debugfs.ocfs2 on the host server (node 1), the first to mount the
>>> device, started showing the VM heartbeating on the device. Then this
>>> kernel message was displayed:
>>> [111004.800566] (5732,1):o2net_connect_expired:1560 ERROR: no connection
>>> established with node 2 after 10.0 seconds, giving up and returning
>>> errors.
>>>
>>> debugfs.ocfs2 on the VM never showed the host server heartbeating on the
>>> device at all. Also on the VM (node 2), I received no message about it
>>> not being able to establish a connection.
>>>
>>> From what I can tell the VM is not even recognizing that the host server
>>> is heartbeating on the device.
>>>
>>> You say to check the device, but I know for a fact that the device is
>>> working fine. I can connect to the device using the AOE protocol over
>>> ethernet on the VM and everything works as I expect. It is just when I
>>> map the device into the VM using the KVM emulated IDE disk type that I
>>> have issues. Is there any other debugging we can do to figure out why
>>> this would be?
>>>
>>> Just a note, I also tried accessing the device in the VM using the KVM
>>> paravirtualized block io driver (virtio_blk). I received the exact same
>>> results.
>>>
>>>
>>> Thank you very much for your help.
>>>
>>>
>>> Bret.
>>>
>>> On Monday 25 August 2008 19:34:55 Sunil Mushran wrote:
>>>
>>>> No, the device names have nothing to do with it.
>>>>
>>>> When you mount, mount.ocfs2 kicks off the heartbeat. When
>>>> other nodes see a new node heartbeating, o2net attempts to
>>>> connect to the node. That connect is necessary for the mount
>>>> to succeed.
>>>>
>>>> My investigation would start with disk heartbeat.
>>>>
>>>> # watch -d -n2 "debugfs.ocfs2 -R \"hb\" /dev/sdX "
>>>>
>>>> Do this on the node that has it mounted. You should see your node
>>>> heartbeating. When you mount on the other node, you should see
>>>> that other node heartbeating. If not, check the device.
>>>>
>>>> Sunil
>>>>
>>>> Bret Baptist wrote:
>>>>
>>>>> Turns out that you DO have to have the same device name on all nodes,
>>>>> even though the UUID is the same. I pushed the network card for AOE on
>>>>> the host into the VM and used the same device names. Like magic, OCFS2
>>>>> on the VM started talking to the host. That seems like a pretty serious
>>>>> limitation to me.
>>>>>
>>>>> Does anyone with some knowledge of the code have any input on this
>>>>> shortcoming?
>>>>>
>>>>>
>>>>> Thank you.
>>>>>
>>>>>
>>>>> Bret.
>>>>>
>>>>> On Thursday 21 August 2008 16:37:50 Bret Baptist wrote:
>>>>>
>>>>>> The host servers are also able to connect to the VM server.
>>>>>>
>>>>>> Here is the cluster.conf:
>>>>>> node:
>>>>>>         ip_port = 7777
>>>>>>         ip_address = 10.1.1.20
>>>>>>         number = 0
>>>>>>         name = wedge
>>>>>>         cluster = iecluster
>>>>>>
>>>>>> node:
>>>>>>         ip_port = 7777
>>>>>>         ip_address = 10.1.1.21
>>>>>>         number = 1
>>>>>>         name = porkins
>>>>>>         cluster = iecluster
>>>>>>
>>>>>> node:
>>>>>>         ip_port = 7777
>>>>>>         ip_address = 10.1.1.4
>>>>>>         number = 2
>>>>>>         name = opennebula
>>>>>>         cluster = iecluster
>>>>>>
>>>>>> cluster:
>>>>>>         node_count = 3
>>>>>>         name = iecluster
>>>>>>
>>>>>>
>>>>>> The o2cb configuration:
>>>>>> O2CB_HEARTBEAT_THRESHOLD=61
>>>>>> O2CB_IDLE_TIMEOUT_MS=10000
>>>>>> O2CB_KEEPALIVE_DELAY_MS=5000
>>>>>> O2CB_RECONNECT_DELAY_MS=2000
>>>>>>
>>>>>>
>>>>>> The VM connects to a bridge on the host server: 10.1.1.21 is assigned
>>>>>> to the bridge br1, and the VM opennebula has the IP address 10.1.1.4
>>>>>> on the same bridge.
>>>>>>
>>>>>> Let me know if there are any other details of the setup you need
>>>>>> to know.
>>>>>>
>>>>>>
>>>>>> Thank you very much for the help.
>>>>>>
>>>>>>
>>>>>> Bret.
>>>>>>
>>>>>> On Thursday 21 August 2008 14:55:43 Herbert van den Bergh wrote:
>>>>>>
>>>>>>> What about from the host server(s) to the VM? And what does
>>>>>>> cluster.conf look like?
>>>>>>>
>>>>>>> Basically, all nodes need to be able to connect to all others' OCFS2
>>>>>>> port.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Herbert.
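
One quick way to verify that every node can reach every other node's o2net
port (a sketch, assuming bash and the coreutils timeout command; substitute
each node's address and the ip_port from cluster.conf):

```shell
# Check that another node accepts TCP connections on the o2net port
# (ip_port = 7777 in this cluster.conf). Run this from each node against
# each of the other nodes.
check_o2net() {
    # bash's /dev/tcp pseudo-device attempts a plain TCP connect
    if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo "$1:$2 reachable"
    else
        echo "$1:$2 unreachable"
    fi
}
check_o2net 10.1.1.21 7777
```

If any pair reports unreachable, the o2net connect (and therefore the mount)
cannot succeed on that node.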
>>>>>>>
>>>>>>> Bret Baptist wrote:
>>>>>>>
>>>>>>>> On Thursday 21 August 2008 14:37:09 Wessel wrote:
>>>>>>>>
>>>>>>>>> Hello Bret,
>>>>>>>>>
>>>>>>>>> An obvious question, but have you tried disabling the firewall on
>>>>>>>>> the KVM VM? Also, are you able to ping the other two Ubuntu nodes
>>>>>>>>> from the KVM VM?
>>>>>>>>>
>>>>>>>> There is no firewall enabled on the VM, in fact iptables is not even
>>>>>>>> installed.
>>>>>>>>
>>>>>>>> I am able to ping and do other communication from the VM to the host
>>>>>>>> server.
>>>>>>>>
>>>>>>>>
>>>>>>>> Bret.
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: ocfs2-users-bounces at oss.oracle.com
>>>>>>>>> [mailto:ocfs2-users-bounces at oss.oracle.com] On behalf of Bret Baptist
>>>>>>>>> Sent: Thursday, August 21, 2008 21:32
>>>>>>>>> To: ocfs2-users at oss.oracle.com
>>>>>>>>> Subject: [Ocfs2-users] VM node won't talk to host
>>>>>>>>>
>>>>>>>>> I am trying to mount the same partition from a KVM Ubuntu 8.04.1
>>>>>>>>> virtual machine and from an Ubuntu 8.04.1 host server.
>>>>>>>>>
>>>>>>>>> I am able to mount the partition just fine on two Ubuntu host
>>>>>>>>> servers; they both talk to each other. The logs on both servers
>>>>>>>>> show the other machine mounting and unmounting the drive.
>>>>>>>>>
>>>>>>>>> However, when I mount the drive in the KVM VM I get no
>>>>>>>>> communication with the host servers. I have checked with tcpdump,
>>>>>>>>> and the VM doesn't even attempt to talk to the other cluster
>>>>>>>>> members. The VM just mounts the drive as if no one else is in the
>>>>>>>>> cluster, even though both the other nodes already have the drive
>>>>>>>>> mounted.
>>>>>>>>>
>>>>>>>>> I have checked and rechecked all the settings: the cluster.conf is
>>>>>>>>> the same on all nodes, and the drive has the same UUID and the same
>>>>>>>>> label. The only thing that is different is the actual device name.
>>>>>>>>> On the host servers it is the AOE device '/dev/etherd/e0.1p11'; on
>>>>>>>>> the VM the '/dev/etherd/e0.1' device is mapped to '/dev/sdb', so
>>>>>>>>> the OCFS2 partition shows up as '/dev/sdb11'.
>>>>>>>>>
>>>>>>>>> The only thing I can think of is that the device names have to be
>>>>>>>>> the same between all hosts, but that really doesn't make any sense
>>>>>>>>> to me. Any help would be greatly appreciated.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks.
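>>>>>>>>>
>>>>>>>>> For what it's worth, the device name itself should not matter to
>>>>>>>>> OCFS2, since the volume is identified by the UUID on disk. One way
>>>>>>>>> to take device names out of the picture is to mount by filesystem
>>>>>>>>> label (a sketch; "ocfs2vol" and the mount point are placeholders
>>>>>>>>> for the real label and directory):
>>>>>>>>>
>>>>>>>>> ```shell
>>>>>>>>> # Mount the OCFS2 volume by label rather than device path, so the
>>>>>>>>> # same command works even though the kernel names the device
>>>>>>>>> # differently on each node.
>>>>>>>>> mount -t ocfs2 -L ocfs2vol /mnt/shared
>>>>>>>>> ```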
>>>>>>>>>
>
>