[Ocfs2-users] sanity check - Xen+iSCSI+LVM+OCFS2 at dom0/domU

Thu Feb 7 15:56:24 PST 2008

Set up netconsole on the cluster members (domU?) to get a stack trace.
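
In case the receiving side is the sticking point: netconsole sends kernel log lines as plain UDP datagrams to a target host, so any UDP listener on another box can catch them. Below is a rough sketch of such a listener, not a tested production tool. The function names are made up for illustration, and port 6666 is assumed because it is the usual netconsole target port; use whatever port the sender is actually configured with.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for incoming netconsole datagrams.

    Port 6666 is an assumption (the usual netconsole target port).
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_netconsole(sock, max_messages, timeout=5.0):
    """Collect up to max_messages datagrams, or stop when the timeout expires.

    Returns a list of (sender_ip, text) tuples in arrival order.
    """
    sock.settimeout(timeout)
    lines = []
    try:
        while len(lines) < max_messages:
            data, (sender_ip, _sender_port) = sock.recvfrom(65535)
            text = data.decode("utf-8", errors="replace").rstrip("\n")
            lines.append((sender_ip, text))
    except socket.timeout:
        # No more traffic within the timeout; return what we have.
        pass
    return lines


def main():
    """Print every received line, flushed immediately, until interrupted."""
    sock = open_listener()
    try:
        while True:
            for sender_ip, text in receive_netconsole(sock, 1, timeout=60.0):
                print(f"{sender_ip}: {text}", flush=True)
    finally:
        sock.close()
```

Run main() on a machine that stays up while the test runs, and redirect its output to a file so the trace survives the reboot of the box under test.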

Alok Dhir wrote:
> Thanks again for your prompt assistance earlier today - we seem to 
> have gotten past the fs/inode.c bug at domU by using the OSS packaged 
> ocfs2 kernel modules.  The cluster comes up and mounts on all boxes, 
> and appears to work.
>
> However, we have now run into a more prevalent issue - at dom0, any of 
> the cluster member servers will spontaneously reboot when I start an 
> 'iozone -A' in an ocfs2 filesystem.  I am unable to check the kernel 
> panic message as the box reboots immediately, despite the setting of 
> 'kernel.panic=0' in sysctl (which is supposed to mean 'do not reboot 
> on panic').  There are also no entries in messages when this happens.
>
> I realize there's not much debugging you can do without the panic 
> message, but I'm wondering if perhaps this new version has some bug 
> which was not in 1.2.7 (with our self-built 1.2.7 only domU servers 
> rebooted - dom0 were stable).
>
> Are others running this new version with success?  Under RHEL/Centos 
> 5.1 Xen dom0/domU?
>
> On Feb 7, 2008, at 1:40 PM, Sunil Mushran wrote:
>
>> Is the IP address correct? If not, correct it.
>>
>> # netstat -tan
>> See if that port is already in use. If so, use another.
>>
>> Alok Dhir wrote:
>>> Ah - thanks for the clarification.
>>>
>>> I'm left with one perplexing problem - on one of the hosts, 
>>> 'devxen0', o2cb refuses to start.  The box is identically configured 
>>> to at least 2 other cluster hosts and all were imaged the exact same 
>>> way, except that devxen0 has 32GB RAM where the others have 16 or less.
>>>
>>> Any clues where to look?
>>>
>>> --[root@devxen0:~] service o2cb enable
>>> Writing O2CB configuration: OK
>>> Starting O2CB cluster ocfs2: Failed
>>> Cluster ocfs2 created
>>> Node beast added
>>> o2cb_ctl: Internal logic failure while adding node devxen0
>>>
>>> Stopping O2CB cluster ocfs2: OK
>>> -- 
>>> This is in syslog when this happens:
>>>
>>> Feb  7 13:26:50 devxen0 kernel: 
>>> (17194,6):o2net_open_listening_sock:1867 ERROR: unable to bind 
>>> socket at 196.168.1.72:7777, ret=-99
>>>
>>> -- 
>>> Box config:
>>>
>>> [root@devxen0:~] uname -a
>>> Linux devxen0.symplicity.com 2.6.18-53.1.6.el5xen #1 SMP Wed Jan 23 
>>> 11:59:21 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
>>>
>>> -- 
>>> Here is cluster.conf:
>>>
>>> ---
>>> node:
>>>    ip_port = 7777
>>>    ip_address = 192.168.1.62
>>>    number = 0
>>>    name = beast
>>>    cluster = ocfs2
>>>
>>> node:
>>>    ip_port = 7777
>>>    ip_address = 196.168.1.72
>>>    number = 1
>>>    name = devxen0
>>>    cluster = ocfs2
>>>
>>> node:
>>>    ip_port = 7777
>>>    ip_address = 192.168.1.73
>>>    number = 2
>>>    name = devxen1
>>>    cluster = ocfs2
>>>
>>> node:
>>>    ip_port = 7777
>>>    ip_address = 192.168.1.74
>>>    number = 3
>>>    name = devxen2
>>>    cluster = ocfs2
>>>
>>> node:
>>>    ip_port = 7777
>>>    ip_address = 192.168.1.70
>>>    number = 4
>>>    name = fs1
>>>    cluster = ocfs2
>>>
>>> node:
>>>    ip_port = 7777
>>>    ip_address = 192.168.1.71
>>>    number = 5
>>>    name = fs2
>>>    cluster = ocfs2
>>>
>>> node:
>>>    ip_port = 7777
>>>    ip_address = 192.168.1.80
>>>    number = 6
>>>    name = vdb1
>>>    cluster = ocfs2
>>>
>>> cluster:
>>>    node_count = 7
>>>    name = ocfs2
>>> ---
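
A note on the cluster.conf quoted above: devxen0 is listed as 196.168.1.72 while every other node is 192.168.1.x, and the bind error in the syslog line shows the same 196.168.1.72 address. ret=-99 looks like -EADDRNOTAVAIL on Linux, which would fit an address that is not configured on the box, so this may simply be a typo. A minimal sketch for spotting that kind of outlier follows. The function names are hypothetical, and the /24 prefix is an assumption about this network, not something stated in the thread.

```python
import ipaddress
from collections import Counter


def parse_nodes(conf_text):
    """Return one dict per 'node:' stanza of an o2cb-style cluster.conf."""
    nodes = []
    current = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A stanza header; only 'node:' stanzas are collected.
            current = {} if line == "node:" else None
            if current is not None:
                nodes.append(current)
            continue
        if current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            current[key] = value
    return nodes


def subnet_outliers(nodes, prefix=24):
    """List (name, ip_address) for nodes outside the most common subnet.

    prefix=24 is an assumed netmask; adjust it to the real network.
    """
    nets = {
        node["name"]: ipaddress.ip_network(
            f"{node['ip_address']}/{prefix}", strict=False
        )
        for node in nodes
    }
    majority, _count = Counter(nets.values()).most_common(1)[0]
    return [
        (node["name"], node["ip_address"])
        for node in nodes
        if nets[node["name"]] != majority
    ]
```

Feeding it the contents of /etc/ocfs2/cluster.conf, as in subnet_outliers(parse_nodes(open("/etc/ocfs2/cluster.conf").read())), should list any node whose address falls outside the subnet shared by the rest.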
>>>
>>>
>>>
>>> On Feb 7, 2008, at 1:23 PM, Sunil Mushran wrote:
>>>
>>>> Yes, but those are backported and will be released as ocfs2 1.4, which 
>>>> is yet to be released.
>>>> You are on ocfs2 1.2.
>>>>
>>>> Alok Dhir wrote:
>>>>> I've seen that -- I was under the impression that some of those 
>>>>> were being backported into the release kernels.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Alok
>>>>>
>>>>> On Feb 7, 2008, at 1:15 PM, Sunil Mushran wrote:
>>>>>
>>>>>> http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2-new-features.html 
>>>>>>
>>>>>>
>>>>>> Alok Dhir wrote:
>>>>>>> We were indeed using a self-built module due to the lack of an 
>>>>>>> OSS one for the latest kernel.  Thanks for your response, I will 
>>>>>>> test with the new version.
>>>>>>>
>>>>>>> What are we leaving on the table by not using the latest 
>>>>>>> mainline kernel?
>>>>>>>
>>>>>>> On Feb 7, 2008, at 12:56 PM, Sunil Mushran wrote:
>>>>>>>
>>>>>>>> Are you building ocfs2 with this kernel or are you using the ones we
>>>>>>>> provide for RHEL5?
>>>>>>>>
>>>>>>>> I am assuming you have built it yourself as we did not release
>>>>>>>> packages for the latest 2.6.18-53.1.6 kernel till last night.
>>>>>>>>
>>>>>>>> If you are using your own, then use the one from oss.
>>>>>>>>
>>>>>>>> If you are using the one from oss, then file a bugzilla with the
>>>>>>>> full oops trace.
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Sunil
>>>>>>>>
>>>>>>>> Alok K. Dhir wrote:
>>>>>>>>> Hello all - we're evaluating OCFS2 in our development 
>>>>>>>>> environment to see if it meets our needs.
>>>>>>>>>
>>>>>>>>> We're testing it with an iSCSI storage array (Dell MD3000i) 
>>>>>>>>> and 5 servers running Centos 5.1 (2.6.18-53.1.6.el5xen).
>>>>>>>>>
>>>>>>>>> 1) Each of the 5 servers is running the Centos 5.1 open-iscsi 
>>>>>>>>> initiator, and sees the volumes exposed by the array just 
>>>>>>>>> fine.  So far so good.
>>>>>>>>>
>>>>>>>>> 2) Created a volume group using the exposed iscsi volumes and 
>>>>>>>>> created a few LVM2 logical volumes.
>>>>>>>>>
>>>>>>>>> 3) Ran 'vgscan; vgchange -a y' on all the cluster members.  All see 
>>>>>>>>> the "md3000vg" volume group.  Looking good.  (We have no 
>>>>>>>>> intention of changing the LVM2 configurations much if at all, 
>>>>>>>>> and can make sure all such changes are done when the volumes 
>>>>>>>>> are off-line on all cluster members, so theoretically this 
>>>>>>>>> should not be a problem.)
>>>>>>>>>
>>>>>>>>> 4) mkfs.ocfs2 /dev/md3000vg/testvol0 -- works great
>>>>>>>>>
>>>>>>>>> 5) mount on all Xen dom0 boxes in the cluster, works great.
>>>>>>>>>
>>>>>>>>> 6) create a VM on one of the cluster members, set up iscsi, 
>>>>>>>>> vgscan, md3000vg shows up -- looking good.
>>>>>>>>>
>>>>>>>>> 7) install ocfs2, 'service o2cb enable', starts up fine.  
>>>>>>>>> mount /dev/md3000vg/testvol0, works fine.
>>>>>>>>>
>>>>>>>>> ** Thanks for making it this far -- this is where it gets 
>>>>>>>>> interesting
>>>>>>>>>
>>>>>>>>> 8) run 'iozone' in domU against ocfs2 share - BANG - immediate 
>>>>>>>>> kernel panic, repeatable all day long.
>>>>>>>>>
>>>>>>>>> "kernel BUG at fs/inode.c"
>>>>>>>>>
>>>>>>>>> So my questions:
>>>>>>>>>
>>>>>>>>> 1) should this work?
>>>>>>>>>
>>>>>>>>> 2) if not, what should we do differently?
>>>>>>>>>
>>>>>>>>> 3) currently we're tracking the latest RHEL/Centos 5.1 kernels 
>>>>>>>>> -- would we have better luck using the latest mainline kernel?
>>>>>>>>>
>>>>>>>>> Thanks for any assistance.
>>>>>>>>>
>>>>>>>>> Alok Dhir
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Ocfs2-users mailing list
>>>>>>>>> Ocfs2-users at oss.oracle.com
>>>>>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>>>>>
>>>>>>>
>>>>>>>