[Ocfs2-users] Pb with ocfs2 & dlm on Fedora 13

Alain.Moulle Alain.Moulle at bull.net
Tue Nov 9 02:12:46 PST 2010


Hi Tao,

Yes, on all three nodes the Max Node Slots is 8:

echo 'stats'|debugfs.ocfs2 /dev/sdc1|grep Slots
debugfs.ocfs2 1.4.3
	Max Node Slots: 8
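
Since 8 slots easily covers 3 nodes, the slot count itself shouldn't be
what blocks the third mount. For reference, a sketch of how the count
could be raised with tunefs.ocfs2 if it ever were the limit (as far as
I know the volume must be unmounted on all nodes first):

# with /dev/sdc1 unmounted everywhere, run on one node
tunefs.ocfs2 -N 16 /dev/sdc1
# then re-check the slot count
echo 'stats'|debugfs.ocfs2 /dev/sdc1|grep Slots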

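Since ping only proves ICMP reachability, not that the o2net TCP port
is open, a direct test of port 7777 (addresses taken from the logs
below; plain telnet will do) would show whether a firewall is in the way:

# from the node whose mount fails, towards each peer
telnet 10.197.189.204 7777    # "Connected" means the port is open
telnet 10.197.189.218 7777    # timeout/refused points at a firewall
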
Regards,
Alain

Tao Ma wrote:
> Hi Alain,
>
> On 11/09/2010 04:49 PM, Alain.Moulle wrote:
>   
>>   Hi,
>> The cluster.conf files are exactly the same on the 3 nodes.
>> The error messages are:
>>
>> - node1:
>> o2net: accepted connection from node selfxl-5 (num 1) at 10.197.189.218:7777
>> o2net: no longer connected to node selfxl-5 (num 1) at 10.197.189.218:7777
>>
>> - node2:
>> (1457,1):o2net_connect_expired:1656 ERROR: no connection established with node 1 after 30.0 seconds, giving up and returning errors.
>>
>> Note that once a mount is refused on node3, for example, if I then
>> umount the FS on node1, I can mount it on node3.
>>     
> Oh, so do you have enough slots for all these 3 nodes to mount?
>
> What's the output of the command below?
> echo 'stats'|debugfs.ocfs2 /dev/sdx|grep Slots
>
> Regards,
> Tao
>   
>> Note also that when the mount is refused on node3, for example,
>> I've checked that node3 "pings" both other nodes successfully on
>> the IP addresses given in cluster.conf.
>>
>> Alain
>>
>>
>>
>>
>> Tao Ma wrote:
>>     
>>> Hi Alain,
>>>
>>> On 11/08/2010 11:08 PM, Alain.Moulle wrote:
>>>
>>>       
>>>>    Hi,
>>>>
>>>> I have a problem on Fedora 13 with these releases:
>>>> ocfs2  1.4.3-5.fc13.x86_64
>>>> dlm_tool 3.0.17
>>>>
>>>> With a 3-node ocfs2 cluster, I can't mount the FS on all three nodes
>>>> at the same time, but only on two of them, whichever two they are.
>>>>
>>>> The errors are:
>>>> (1475,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
>>>> (2175,0):dlm_request_join:1035 ERROR: status = -107
>>>> (2175,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>>> (2175,0):dlm_join_domain:1487 ERROR: status = -107
>>>> (2175,0):dlm_register_domain:1753 ERROR: status = -107
>>>> (2175,0):o2cb_cluster_connect:313 ERROR: status = -107
>>>> (2175,0):ocfs2_dlm_init:2995 ERROR: status = -107
>>>> (2175,0):ocfs2_mount_volume:1789 ERROR: status = -107
>>>> ocfs2: Unmounting device (8,16) on (node 0)
>>>> o2net: no longer connected to node selfxl-4 (num 0) at 10.197.189.204:7777
>>>> o2net: connected to node selfxl-4 (num 0) at 10.197.189.204:7777
>>>>
>>>> It seems to be a lock management problem.
>>>> Is this a known issue?
>>>> Is a patch available?
>>>>
>>>>         
>>> It doesn't look like a dlm problem, but a network problem. ;)
>>> Your first error is o2net_connect_expired, so it seems that the
>>> 3rd node can't connect to node 2.
>>> Could you please check the error messages on node 2?
>>>
>>> btw, I assume that the cluster.conf is the same on all 3 nodes, and
>>> that you can connect to port 7777 (which is used by ocfs2) on node 2
>>> from node 3.
>>>
>>> Regards,
>>> Tao
>>>
