[Ocfs2-users] VM node won't talk to host

Herbert van den Bergh herbert.van.den.bergh at oracle.com
Fri Aug 29 17:39:43 PDT 2008



Bret Baptist wrote:
> On Thursday 28 August 2008 18:59:07 Sunil Mushran wrote:
>   
>> If the VM is not seeing the host heartbeat, the issue is not
>> with heartbeat itself, but with the fact that the VM I/Os are not
>> hitting the actual device. Buffered? See if there is some way
>> to disable buffering in the KVM emulated IDE disk.
>>     
>
> I was thinking we were on to something here.  However, I tried mounting an 
> OCFS2 file system between two KVM VMs on the host server, accessing the AOE 
> partition through IDE emulation, and everything worked exactly as I would 
> expect.
>
> Color me really confused as to why mounting the OCFS2 disk on the host server 
> and a VM running off that server would not work.
>   
Buffering in KVM?  So both KVM guests see what's in the KVM buffer, but 
it doesn't hit the disk, and disk updates don't make it into the KVM buffer?
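If buffering in the emulated disk path is the problem, one thing worth trying (a sketch under assumptions: the cache=none drive option is taken from qemu/kvm documentation, so verify it against the version in use) is opening the guest disk with host caching disabled, then comparing what each side reads back from the raw device:

```shell
# Assumed KVM invocation: cache=none makes qemu open the backing device
# with O_DIRECT, so guest writes bypass the host page cache.
# (Illustrative only; check your kvm/qemu man page for the exact syntax.)
#   kvm -drive file=/dev/etherd/e0.1,if=ide,cache=none ...

# To compare views, read the same region of the shared device from both
# the host and the guest and diff the bytes.  Shown here on a scratch
# file so the commands are runnable as-is; on the real system set DEV to
# the shared device and add iflag=direct to bypass the reader's cache.
DEV=$(mktemp)
printf 'ocfs2-heartbeat-slot' | dd of="$DEV" bs=512 count=1 conv=notrunc 2>/dev/null
dd if="$DEV" bs=512 count=1 2>/dev/null | head -c 20; echo
rm -f "$DEV"
```

If the host sees fresh heartbeat sectors when reading with iflag=direct but the guest's view of the same offset never changes, the emulated disk is serving stale cached blocks.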

Herbert.
>
> Bret.
>   
>> Bret Baptist wrote:
>>     
>>> I mounted the volume on the host server first.  I watched the heartbeat
>>> debugging.  After the mount on the host I saw it doing a heartbeat on the
>>> device.  Kernel logs from mounting the device:
>>> [112893.823300] ocfs2_dlm: Nodes in domain
>>> ("2CE50B6318E44D21B18F0A7B93CA27FC"): 1
>>> [112893.895672] kjournald starting.  Commit interval 5 seconds
>>> [112893.896247] ocfs2: Mounting device (152,12) on (node 1, slot 0) with
>>> ordered data mode.
>>>
>>> I then mounted the same device mapped into the VM using the KVM emulated
>>> IDE disk type and showing up in the VM as a SATA drive.  I was able to
>>> mount the drive and the VM thought it was the only cluster member
>>> mounting the device: [ 2706.845601] ocfs2_dlm: Nodes in domain
>>> ("2CE50B6318E44D21B18F0A7B93CA27FC"): 2
>>> [ 2706.848441] kjournald starting.  Commit interval 5 seconds
>>> [ 2706.849692] ocfs2: Mounting device (8,28) on (node 2, slot 0) with
>>> ordered data mode.
>>>
>>> The debugfs.ocfs2 output on the host server (node 1), the first to mount
>>> the device, started showing the VM heartbeating on the device.  Then this
>>> kernel message was displayed:
>>> [111004.800566] (5732,1):o2net_connect_expired:1560 ERROR: no connection
>>> established with node 2 after 10.0 seconds, giving up and returning
>>> errors.
>>>
>>> debugfs.ocfs2 on the VM never showed the host server heartbeating on the
>>> device at all.  Also on the VM (node 2), I received no message about it
>>> not being able to establish a connection.
>>>
>>> From what I can tell, the VM is not even recognizing that the host server
>>> is heartbeating on the device.
>>>
>>> You say to check the device; I know for a fact that the device is working
>>> fine.  I can connect to the device using the AOE protocol over ethernet on
>>> the VM and everything works as I expect.  It is just when I map the
>>> device into the VM using the KVM emulated IDE disk type that I have
>>> issues.  Is there any other debugging we can do to figure out why this
>>> would be?
>>>
>>> Just a note, I also tried accessing the device in the VM using the KVM
>>> paravirtualized block io driver (virtio_blk).  I received the exact same
>>> results.
>>>
>>>
>>> Thank you very much for your help.
>>>
>>>
>>> Bret.
>>>
>>> On Monday 25 August 2008 19:34:55 Sunil Mushran wrote:
>>>       
>>>> No, the device names have nothing to do with it.
>>>>
>>>> When you mount, mount.ocfs2 kicks off the heartbeat. When
>>>> other nodes see a new node heartbeating, o2net attempts to
>>>> connect to the node. That connect is necessary for the mount
>>>> to succeed.
>>>>
>>>> My investigation would start with disk heartbeat.
>>>>
>>>> # watch -d -n2 "debugfs.ocfs2 -R \"hb\" /dev/sdX "
>>>>
>>>> Do this on the node that has it mounted. You should see your node
>>>> heartbeating. When you mount on the other node, you should see
>>>> that other node heartbeating. If not, check the device.
>>>>
>>>> Sunil
>>>>
>>>> Bret Baptist wrote:
>>>>         
>>>>> Turns out that you DO have to have the same device name on all nodes.
>>>>> Even though the UUID is the same.  I pushed the network card for AOE on
>>>>> the host into the VM and used the same device names.  Like magic OCFS2
>>>>> on the VM starts talking to the host.  That seems like a pretty serious
>>>>> limitation to me.
>>>>>
>>>>> Does anyone with some knowledge of the code have any input on this
>>>>> shortcoming?
>>>>>
>>>>>
>>>>> Thank you.
>>>>>
>>>>>
>>>>> Bret.
>>>>>
>>>>> On Thursday 21 August 2008 16:37:50 Bret Baptist wrote:
>>>>>           
>>>>>> The host servers are also able to connect to the VM server.
>>>>>>
>>>>>> Here is the cluster.conf:
>>>>>> node:
>>>>>>         ip_port = 7777
>>>>>>         ip_address = 10.1.1.20
>>>>>>         number = 0
>>>>>>         name = wedge
>>>>>>         cluster = iecluster
>>>>>>
>>>>>> node:
>>>>>>         ip_port = 7777
>>>>>>         ip_address = 10.1.1.21
>>>>>>         number = 1
>>>>>>         name = porkins
>>>>>>         cluster = iecluster
>>>>>>
>>>>>> node:
>>>>>>         ip_port = 7777
>>>>>>         ip_address = 10.1.1.4
>>>>>>         number = 2
>>>>>>         name = opennebula
>>>>>>         cluster = iecluster
>>>>>>
>>>>>> cluster:
>>>>>>         node_count = 3
>>>>>>         name = iecluster
>>>>>>
>>>>>>
>>>>>> The o2cb configuration:
>>>>>> O2CB_HEARTBEAT_THRESHOLD=61
>>>>>> O2CB_IDLE_TIMEOUT_MS=10000
>>>>>> O2CB_KEEPALIVE_DELAY_MS=5000
>>>>>> O2CB_RECONNECT_DELAY_MS=2000
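For reference, a rough reading of those values (the disk-heartbeat formula below is the one given in the OCFS2 FAQ; treat it as an assumption and check it against your ocfs2-tools version):

```shell
# Timings implied by the o2cb settings above.
# Disk heartbeat dead time formula per the OCFS2 FAQ (verify for your
# ocfs2-tools version): (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds.
O2CB_HEARTBEAT_THRESHOLD=61
O2CB_IDLE_TIMEOUT_MS=10000
echo "disk heartbeat dead time: $(( (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 )) s"
echo "network idle timeout:     $(( O2CB_IDLE_TIMEOUT_MS / 1000 )) s"
```

The 10-second figure lines up with the "no connection established with node 2 after 10.0 seconds" error quoted earlier in the thread.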
>>>>>>
>>>>>>
>>>>>> The VM connects to a bridge on the host server: in this case,
>>>>>> 10.1.1.21 is assigned to the bridge br1, and the VM opennebula
>>>>>> has the IP address 10.1.1.4 on that bridge as well.
>>>>>>
>>>>>> Let me know if there are any other details of the setup you need.
>>>>>>
>>>>>>
>>>>>> Thank you very much for the help.
>>>>>>
>>>>>>
>>>>>> Bret.
>>>>>>
>>>>>> On Thursday 21 August 2008 14:55:43 Herbert van den Bergh wrote:
>>>>>>             
>>>>>>> What about from the host server(s) to the VM?  And what does
>>>>>>> cluster.conf look like?
>>>>>>>
>>>>>>> Basically, all nodes need to be able to connect to all others' OCFS2
>>>>>>> port.
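That requirement can be checked mechanically. A sketch, to be run from each node in turn (the IPs and port 7777 are taken from the cluster.conf quoted in this thread; it uses bash's /dev/tcp, so bash is assumed to be present):

```shell
# Check that every node's o2net port (7777 per cluster.conf) is reachable.
# Run this loop on each cluster node; all three lines should say reachable.
for ip in 10.1.1.20 10.1.1.21 10.1.1.4; do
  if timeout 2 bash -c "exec 3<>/dev/tcp/$ip/7777" 2>/dev/null; then
    echo "$ip: port 7777 reachable"
  else
    echo "$ip: port 7777 UNREACHABLE"
  fi
done
```

Any UNREACHABLE line points at the node pair whose mount handshake will fail.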
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Herbert.
>>>>>>>
>>>>>>> Bret Baptist wrote:
>>>>>>>               
>>>>>>>> On Thursday 21 August 2008 14:37:09 Wessel wrote:
>>>>>>>>                 
>>>>>>>>> Hello Bret,
>>>>>>>>>
>>>>>>>>> An obvious question, but have you tried disabling the firewall on
>>>>>>>>> the KVM VM? Also, are you able to ping the other two Ubuntu nodes
>>>>>>>>> from the KVM VM?
>>>>>>>>>                   
>>>>>>>> There is no firewall enabled on the VM, in fact iptables is not even
>>>>>>>> installed.
>>>>>>>>
>>>>>>>> I am able to ping and do other communication from the VM to the host
>>>>>>>> server.
>>>>>>>>
>>>>>>>>
>>>>>>>> Bret.
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: ocfs2-users-bounces at oss.oracle.com
>>>>>>>>> [mailto:ocfs2-users-bounces at oss.oracle.com] On Behalf Of Bret Baptist
>>>>>>>>> Sent: Thursday, August 21, 2008 21:32
>>>>>>>>> To: ocfs2-users at oss.oracle.com
>>>>>>>>> Subject: [Ocfs2-users] VM node won't talk to host
>>>>>>>>>
>>>>>>>>> I am trying to mount the same partition from a KVM Ubuntu 8.04.1
>>>>>>>>> virtual machine and on an Ubuntu 8.04.1 host server.
>>>>>>>>>
>>>>>>>>> I am able to mount the partition just fine on two Ubuntu host
>>>>>>>>> servers; they both talk to each other.  The logs on both servers
>>>>>>>>> show the other machine mounting and unmounting the drive.
>>>>>>>>>
>>>>>>>>> However, when I mount the drive in the KVM VM I get no
>>>>>>>>> communication with the host servers.  I have checked with tcpdump,
>>>>>>>>> and the VM doesn't even attempt to talk to the other cluster
>>>>>>>>> members.  The VM just mounts the drive as if no one else is on
>>>>>>>>> the cluster, even though both of the other nodes already have the
>>>>>>>>> drive mounted.
>>>>>>>>>
>>>>>>>>> I have checked and rechecked all the settings: the cluster.conf is
>>>>>>>>> the same on all nodes, and the drive has the same UUID and the same
>>>>>>>>> label.  The only thing that is different is the actual device name.
>>>>>>>>> On the host servers it is the AOE device '/dev/etherd/e0.1p11'; on
>>>>>>>>> the VM, the '/dev/etherd/e0.1' device is mapped to '/dev/sdb', so
>>>>>>>>> the OCFS2 partition shows up as '/dev/sdb11'.
>>>>>>>>>
>>>>>>>>> The only thing I can think of is that the device names have to be
>>>>>>>>> the same on all hosts, but that really doesn't make any sense
>>>>>>>>> to me.  Any help would be greatly appreciated.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>                   
>
>   


