[Ocfs2-users] sanity check - Xen+iSCSI+LVM+OCFS2 at dom0/domU

Alok Dhir adhir at symplicity.com
Thu Feb 7 14:54:35 PST 2008


Thanks again for your prompt assistance earlier today - we seem to  
have gotten past the fs/inode.c bug at domU by using the OSS packaged  
ocfs2 kernel modules.  The cluster comes up and mounts on all boxes,  
and appears to work.

However, we have now run into a more serious issue - at dom0, any of  
the cluster member servers will spontaneously reboot when I start an  
'iozone -A' in an ocfs2 filesystem.  I am unable to capture the kernel  
panic message as the box reboots immediately, despite setting  
'kernel.panic=0' via sysctl (which is supposed to mean 'do not reboot  
on panic').  There are also no entries in /var/log/messages when this happens.
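
In case it helps anyone else chasing this, here is the sysctl fragment  
we would try next (a sketch only - the panic_on_oops line is an  
assumption on my part, since an oops that escalates to a panic can  
still trigger the reboot path even with kernel.panic=0):

```
# /etc/sysctl.conf fragment (sketch, not verified on these boxes)
kernel.panic = 0          # stay halted on panic instead of rebooting
kernel.panic_on_oops = 0  # do not escalate an oops into a panic
```

Capturing the actual trace probably still needs a serial console or  
netconsole, since nothing reaches /var/log/messages before the reboot.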

I realize there's not much debugging you can do without the panic  
message, but I'm wondering if perhaps this new version has some bug  
which was not in 1.2.7 (with our self-built 1.2.7, only the domU  
servers rebooted; dom0 was stable).

Are others running this new version with success?  Under RHEL/Centos  
5.1 Xen dom0/domU?
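
(Side note for the archives, on the o2cb bind failure quoted further  
down: the "ret=-99" in the o2net error is errno 99, which on Linux is  
EADDRNOTAVAIL - the address in cluster.conf is not configured on any  
local interface, which fits Sunil's suggestion to double-check the IP.  
A quick way to decode such codes, assuming Linux errno numbering:)

```python
import errno
import os

# o2net_open_listening_sock logged "ret=-99"; decode the errno
code = 99
print(errno.errorcode[code])  # EADDRNOTAVAIL on Linux
print(os.strerror(code))      # human-readable description
```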

On Feb 7, 2008, at 1:40 PM, Sunil Mushran wrote:

> Is the ip address correct? If not, correct.
>
> # netstat -tan
> See if that port is already in use. If so, use another.
>
> Alok Dhir wrote:
>> Ah - thanks for the clarification.
>>
>> I'm left with one perplexing problem - on one of the hosts,  
>> 'devxen0', o2cb refuses to start.  The box is identically  
>> configured to at least 2 other cluster hosts and all were imaged  
>> the exact same way, except that devxen0 has 32GB RAM where the  
>> others have 16 or less.
>>
>> Any clues where to look?
>>
>> -- 
>> [root at devxen0:~] service o2cb enable
>> Writing O2CB configuration: OK
>> Starting O2CB cluster ocfs2: Failed
>> Cluster ocfs2 created
>> Node beast added
>> o2cb_ctl: Internal logic failure while adding node devxen0
>>
>> Stopping O2CB cluster ocfs2: OK
>> -- 
>>
>> This is in syslog when this happens:
>>
>> Feb  7 13:26:50 devxen0 kernel: (17194,6):o2net_open_listening_sock: 
>> 1867 ERROR: unable to bind socket at 196.168.1.72:7777, ret=-99
>>
>> -- 
>>
>> Box config:
>>
>> [root at devxen0:~] uname -a
>> Linux devxen0.symplicity.com 2.6.18-53.1.6.el5xen #1 SMP Wed Jan 23  
>> 11:59:21 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
>>
>> -- 
>>
>> Here is cluster.conf:
>>
>> ---
>> node:
>>    ip_port = 7777
>>    ip_address = 192.168.1.62
>>    number = 0
>>    name = beast
>>    cluster = ocfs2
>>
>> node:
>>    ip_port = 7777
>>    ip_address = 196.168.1.72
>>    number = 1
>>    name = devxen0
>>    cluster = ocfs2
>>
>> node:
>>    ip_port = 7777
>>    ip_address = 192.168.1.73
>>    number = 2
>>    name = devxen1
>>    cluster = ocfs2
>>
>> node:
>>    ip_port = 7777
>>    ip_address = 192.168.1.74
>>    number = 3
>>    name = devxen2
>>    cluster = ocfs2
>>
>> node:
>>    ip_port = 7777
>>    ip_address = 192.168.1.70
>>    number = 4
>>    name = fs1
>>    cluster = ocfs2
>>
>> node:
>>    ip_port = 7777
>>    ip_address = 192.168.1.71
>>    number = 5
>>    name = fs2
>>    cluster = ocfs2
>>
>> node:
>>    ip_port = 7777
>>    ip_address = 192.168.1.80
>>    number = 6
>>    name = vdb1
>>    cluster = ocfs2
>>
>> cluster:
>>    node_count = 7
>>    name = ocfs2
>> ---
>>
>>
>>
>> On Feb 7, 2008, at 1:23 PM, Sunil Mushran wrote:
>>
>>> Yes, but backported into ocfs2 1.4, which is yet to be  
>>> released.
>>> You are on ocfs2 1.2.
>>>
>>> Alok Dhir wrote:
>>>> I've seen that -- I was under the impression that some of those  
>>>> were being backported into the release kernels.
>>>>
>>>> Thanks,
>>>>
>>>> Alok
>>>>
>>>> On Feb 7, 2008, at 1:15 PM, Sunil Mushran wrote:
>>>>
>>>>> http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2-new-features.html
>>>>>
>>>>> Alok Dhir wrote:
>>>>>> We were indeed using a self-built module due to the lack of an  
>>>>>> OSS one for the latest kernel.  Thanks for your response, I  
>>>>>> will test with the new version.
>>>>>>
>>>>>> What are we leaving on the table by not using the latest  
>>>>>> mainline kernel?
>>>>>>
>>>>>> On Feb 7, 2008, at 12:56 PM, Sunil Mushran wrote:
>>>>>>
>>>>>>> Are you building ocfs2 with this kernel or are you using the ones we
>>>>>>> provide for RHEL5?
>>>>>>>
>>>>>>> I am assuming you have built it yourself as we did not release
>>>>>>> packages for the latest 2.6.18-53.1.6 kernel till last night.
>>>>>>>
>>>>>>> If you are using your own, then use the one from oss.
>>>>>>>
>>>>>>> If you are using the one from oss, then file a bugzilla with the
>>>>>>> full oops trace.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Sunil
>>>>>>>
>>>>>>> Alok K. Dhir wrote:
>>>>>>>> Hello all - we're evaluating OCFS2 in our development  
>>>>>>>> environment to see if it meets our needs.
>>>>>>>>
>>>>>>>> We're testing it with an iSCSI storage array (Dell MD3000i)  
>>>>>>>> and 5 servers running Centos 5.1 (2.6.18-53.1.6.el5xen).
>>>>>>>>
>>>>>>>> 1) Each of the 5 servers is running the Centos 5.1 open-iscsi  
>>>>>>>> initiator, and sees the volumes exposed by the array just  
>>>>>>>> fine.  So far so good.
>>>>>>>>
>>>>>>>> 2) Created a volume group using the exposed iscsi volumes and  
>>>>>>>> created a few LVM2 logical volumes.
>>>>>>>>
>>>>>>>> 3) vgscan; vgchange -a y; on all the cluster members.  all  
>>>>>>>> see the "md3000vg" volume group.  looking good. (we have no  
>>>>>>>> intention of changing the LVM2 configurations much if at all,  
>>>>>>>> and can make sure all such changes are done when the volumes  
>>>>>>>> are off-line on all cluster members, so theoretically this  
>>>>>>>> should not be a problem).
>>>>>>>>
>>>>>>>> 4) mkfs.ocfs2 /dev/md3000vg/testvol0 -- works great
>>>>>>>>
>>>>>>>> 5) mount on all Xen dom0 boxes in the cluster, works great.
>>>>>>>>
>>>>>>>> 6) create a VM on one of the cluster members, set up iscsi,  
>>>>>>>> vgscan, md3000vg shows up -- looking good.
>>>>>>>>
>>>>>>>> 7) install ocfs2, 'service o2cb enable', starts up fine.   
>>>>>>>> mount /dev/md3000vg/testvol0, works fine.
>>>>>>>>
>>>>>>>> ** Thanks for making it this far -- this is where it gets  
>>>>>>>> interesting
>>>>>>>>
>>>>>>>> 8) run 'iozone' in domU against ocfs2 share - BANG -  
>>>>>>>> immediate kernel panic, repeatable all day long.
>>>>>>>>
>>>>>>>> "kernel BUG at fs/inode.c"
>>>>>>>>
>>>>>>>> So my questions:
>>>>>>>>
>>>>>>>> 1) should this work?
>>>>>>>>
>>>>>>>> 2) if not, what should we do differently?
>>>>>>>>
>>>>>>>> 3) currently we're tracking the latest RHEL/Centos 5.1  
>>>>>>>> kernels -- would we have better luck using the latest  
>>>>>>>> mainline kernel?
>>>>>>>>
>>>>>>>> Thanks for any assistance.
>>>>>>>>
>>>>>>>> Alok Dhir
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Ocfs2-users mailing list
>>>>>>>> Ocfs2-users at oss.oracle.com
>>>>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>



