[Ocfs2-users] sanity check - Xen+iSCSI+LVM+OCFS2 at dom0/domU

Sunil Mushran Sunil.Mushran at oracle.com
Thu Feb 7 10:40:47 PST 2008


ret=-99 is -EADDRNOTAVAIL, i.e. 196.168.1.72 is not configured on any
interface on that box. Is the IP address in cluster.conf correct? If not,
correct it.

# netstat -tan
See if that port is already in use. If so, use another.
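
For example, a quick check on devxen0 (a rough sketch, assuming the stock
iproute2 and net-tools utilities; substitute the address and port from your
cluster.conf):

# ip addr show | grep 196.168.1.72
(is the address listed for devxen0 actually configured on an interface?)

# netstat -tan | grep :7777
(is anything else already bound to the o2net port?)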

Alok Dhir wrote:
> Ah - thanks for the clarification.
>
> I'm left with one perplexing problem - on one of the hosts, 'devxen0', 
> o2cb refuses to start.  The box is identically configured to at least 
> 2 other cluster hosts and all were imaged the exact same way, except 
> that devxen0 has 32GB RAM where the others have 16 or less.
>
> Any clues where to look?
>
> -- 
> [root@devxen0:~] service o2cb enable
> Writing O2CB configuration: OK
> Starting O2CB cluster ocfs2: Failed
> Cluster ocfs2 created
> Node beast added
> o2cb_ctl: Internal logic failure while adding node devxen0
>
> Stopping O2CB cluster ocfs2: OK
> -- 
>
> This is in syslog when this happens:
>
> Feb  7 13:26:50 devxen0 kernel: 
> (17194,6):o2net_open_listening_sock:1867 ERROR: unable to bind socket 
> at 196.168.1.72:7777, ret=-99
>
> -- 
>
> Box config:
>
> [root@devxen0:~] uname -a
> Linux devxen0.symplicity.com 2.6.18-53.1.6.el5xen #1 SMP Wed Jan 23 
> 11:59:21 EST 2008 x86_64 x86_64 x86_64 GNU/Linux
>
> -- 
>
> Here is cluster.conf:
>
> ---
> node:
>     ip_port = 7777
>     ip_address = 192.168.1.62
>     number = 0
>     name = beast
>     cluster = ocfs2
>
> node:
>     ip_port = 7777
>     ip_address = 196.168.1.72
>     number = 1
>     name = devxen0
>     cluster = ocfs2
>
> node:
>     ip_port = 7777
>     ip_address = 192.168.1.73
>     number = 2
>     name = devxen1
>     cluster = ocfs2
>
> node:
>     ip_port = 7777
>     ip_address = 192.168.1.74
>     number = 3
>     name = devxen2
>     cluster = ocfs2
>
> node:
>     ip_port = 7777
>     ip_address = 192.168.1.70
>     number = 4
>     name = fs1
>     cluster = ocfs2
>
> node:
>     ip_port = 7777
>     ip_address = 192.168.1.71
>     number = 5
>     name = fs2
>     cluster = ocfs2
>
> node:
>     ip_port = 7777
>     ip_address = 192.168.1.80
>     number = 6
>     name = vdb1
>     cluster = ocfs2
>
> cluster:
>     node_count = 7
>     name = ocfs2
> ---
>
>
>
> On Feb 7, 2008, at 1:23 PM, Sunil Mushran wrote:
>
>> Yes, but they were backported into ocfs2 1.4, which is yet to be
>> released.
>> You are on ocfs2 1.2.
>>
>> Alok Dhir wrote:
>>> I've seen that -- I was under the impression that some of those were 
>>> being backported into the release kernels.
>>>
>>> Thanks,
>>>
>>> Alok
>>>
>>> On Feb 7, 2008, at 1:15 PM, Sunil Mushran wrote:
>>>
>>>> http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2-new-features.html 
>>>>
>>>>
>>>> Alok Dhir wrote:
>>>>> We were indeed using a self-built module due to the lack of an OSS 
>>>>> one for the latest kernel.  Thanks for your response, I will test 
>>>>> with the new version.
>>>>>
>>>>> What are we leaving on the table by not using the latest mainline 
>>>>> kernel?
>>>>>
>>>>> On Feb 7, 2008, at 12:56 PM, Sunil Mushran wrote:
>>>>>
>>>>>> Are you building ocfs2 with this kernel or are you using the ones we
>>>>>> provide for RHEL5?
>>>>>>
>>>>>> I am assuming you have built it yourself as we did not release
>>>>>> packages for the latest 2.6.18-53.1.6 kernel till last night.
>>>>>>
>>>>>> If you are using your own, then use the one from oss.
>>>>>>
>>>>>> If you are using the one from oss, then file a bugzilla with the
>>>>>> full oops trace.
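>>>>>>
>>>>>> For example, one quick way to see which ocfs2 bits are actually in
>>>>>> play (a sketch, assuming an RPM-based install):
>>>>>>
>>>>>> # rpm -qa | grep -i ocfs2
>>>>>> # modinfo ocfs2 | grep -i version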
>>>>>>
>>>>>> Thanks
>>>>>> Sunil
>>>>>>
>>>>>> Alok K. Dhir wrote:
>>>>>>> Hello all - we're evaluating OCFS2 in our development 
>>>>>>> environment to see if it meets our needs.
>>>>>>>
>>>>>>> We're testing it with an iSCSI storage array (Dell MD3000i) and 
>>>>>>> 5 servers running Centos 5.1 (2.6.18-53.1.6.el5xen).
>>>>>>>
>>>>>>> 1) Each of the 5 servers is running the Centos 5.1 open-iscsi 
>>>>>>> initiator, and sees the volumes exposed by the array just fine.  
>>>>>>> So far so good.
>>>>>>>
>>>>>>> 2) Created a volume group using the exposed iscsi volumes and 
>>>>>>> created a few LVM2 logical volumes.
>>>>>>>
>>>>>>> 3) vgscan; vgchange -a y; on all the cluster members.  all see 
>>>>>>> the "md3000vg" volume group.  looking good. (we have no 
>>>>>>> intention of changing the LVM2 configurations much if at all, 
>>>>>>> and can make sure all such changes are done when the volumes are 
>>>>>>> off-line on all cluster members, so theoretically this should 
>>>>>>> not be a problem).
>>>>>>>
>>>>>>> 4) mkfs.ocfs2 /dev/md3000vg/testvol0 -- works great
>>>>>>>
>>>>>>> 5) mount on all Xen dom0 boxes in the cluster, works great.
>>>>>>>
>>>>>>> 6) create a VM on one of the cluster members, set up iscsi, 
>>>>>>> vgscan, md3000vg shows up -- looking good.
>>>>>>>
>>>>>>> 7) install ocfs2, 'service o2cb enable', starts up fine.  mount 
>>>>>>> /dev/md3000vg/testvol0, works fine.
>>>>>>>
>>>>>>> ** Thanks for making it this far -- this is where it gets 
>>>>>>> interesting
>>>>>>>
>>>>>>> 8) run 'iozone' in domU against the ocfs2 share - BANG - immediate 
>>>>>>> kernel panic, repeatable all day long (see the command sketch after 
>>>>>>> this list).
>>>>>>>
>>>>>>>  "kernel BUG at fs/inode.c"
>>>>>>>
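>>>>>>> For reference, the sequence boils down to roughly these commands per
>>>>>>> node (a sketch, not our exact invocations; the portal address and
>>>>>>> mount point are placeholders):
>>>>>>>
>>>>>>> # iscsiadm -m discovery -t sendtargets -p <array-portal>
>>>>>>> # iscsiadm -m node --login
>>>>>>> # vgscan && vgchange -a y md3000vg
>>>>>>> # mkfs.ocfs2 /dev/md3000vg/testvol0      (once, from a single node)
>>>>>>> # service o2cb enable
>>>>>>> # mount -t ocfs2 /dev/md3000vg/testvol0 /mnt/testvol0
>>>>>>> # iozone -a -f /mnt/testvol0/iozone.tmp  (this is the step that panics the domU)
>>>>>>>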
>>>>>>> So my questions:
>>>>>>>
>>>>>>> 1) should this work?
>>>>>>>
>>>>>>> 2) if not, what should we do differently?
>>>>>>>
>>>>>>> 3) currently we're tracking the latest RHEL/Centos 5.1 kernels 
>>>>>>> -- would we have better luck using the latest mainline kernel?
>>>>>>>
>>>>>>> Thanks for any assistance.
>>>>>>>
>>>>>>> Alok Dhir
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Ocfs2-users mailing list
>>>>>>> Ocfs2-users at oss.oracle.com
>>>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>



