[Ocfs2-users] Odd error on FC12 with ocfs2

Sunil Mushran sunil.mushran at oracle.com
Mon Mar 29 15:25:27 PDT 2010


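Enable some debugging: turn on the TCP (o2net) log mask, retry the mount,
then turn the mask back off. The extra messages should show up in the
kernel log (dmesg).

# debugfs.ocfs2 -l TCP allow
...do mount...
# debugfs.ocfs2 -l TCP off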
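
Once the mount has been retried, the kernel log can get long. Below is a
minimal sketch, not an official tool, that condenses the two kinds of
error lines already quoted further down: the o2net_connect_expired
timeouts and the "status = -N" lines. The names summarize, LINUX_ERRNO
and SAMPLE are made up for illustration, the errno table hardcodes Linux
numbering (an assumption if the log comes from elsewhere), and the sample
lines are copied from the dmesg output in this thread.

```python
import re

# Linux errno numbers, hardcoded so the sketch behaves the same on any OS.
LINUX_ERRNO = {
    101: "ENETUNREACH",
    104: "ECONNRESET",
    107: "ENOTCONN",
    110: "ETIMEDOUT",
    111: "ECONNREFUSED",
    113: "EHOSTUNREACH",
}

CONNECT_EXPIRED = re.compile(
    r"o2net_connect_expired:\d+ ERROR: no connection established\s+with node (\d+)"
)
STATUS = re.compile(r"\):(\w+):\d+ ERROR: status = (-?\d+)")


def summarize(log_text):
    """Return (timeouts per node number, [(function, errno name), ...])."""
    # Collapse all whitespace so a message wrapped across two lines still matches.
    flat = " ".join(log_text.split())

    timeouts = {}
    for node in CONNECT_EXPIRED.findall(flat):
        timeouts[int(node)] = timeouts.get(int(node), 0) + 1

    statuses = [
        (func, LINUX_ERRNO.get(abs(int(code)), "unknown"))
        for func, code in STATUS.findall(flat)
    ]
    return timeouts, statuses


# Lines taken from the dmesg output quoted in this thread.
SAMPLE = """\
(1199,0):o2net_connect_expired:1656 ERROR: no connection established
with node 2 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established
with node 3 after 30.0 seconds, giving up and returning errors.
(1262,0):dlm_request_join:1035 ERROR: status = -107
(1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
"""

timeouts, statuses = summarize(SAMPLE)
```

On Linux, 107 is ENOTCONN, so the dlm and mount errors look like a
consequence of the o2net connection never being established rather than a
separate problem.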

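The nc -z check quoted below covers only web1 and web2. A rough sketch,
assuming the cluster.conf layout shown in this thread, that repeats the
same plain TCP connect against every node stanza is given here. The
helper names parse_nodes, can_connect and probe_cluster are made up for
illustration, and the 3 second timeout is an arbitrary choice. A
successful connect only says the port is reachable; it would not reveal
a disagreement at the o2net level, which is what the tracing above is
for.

```python
import socket


def parse_nodes(conf_text):
    """Parse the node: stanzas of an o2cb cluster.conf into a list of dicts."""
    nodes, current = [], None
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.endswith(":") and "=" not in line:
            # Stanza header such as "cluster:" or "node:"; keep only node stanzas.
            current = {} if line == "node:" else None
            if current is not None:
                nodes.append(current)
        elif "=" in line and current is not None:
            key, value = (part.strip() for part in line.split("=", 1))
            current[key] = value
    return nodes


def can_connect(host, port, timeout=3.0):
    """Plain TCP connect check, roughly what `nc -z host port` does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def probe_cluster(conf_text):
    """Map node name -> True/False for a TCP connect to ip_address:ip_port."""
    return {
        node["name"]: can_connect(node["ip_address"], int(node["ip_port"]))
        for node in parse_nodes(conf_text)
    }


# Two stanzas copied from the cluster.conf quoted in this thread.
SAMPLE_CONF = """\
cluster:
	node_count = 6
	name = appshare

node:
	ip_port = 7777
	ip_address = 192.168.102.140
	number = 1
	name = web1
	cluster = appshare

node:
	ip_port = 7777
	ip_address = 192.168.102.141
	number = 2
	name = web2
	cluster = appshare
"""

nodes = parse_nodes(SAMPLE_CONF)
```

Run from the FC12 node, probe_cluster(open("/etc/ocfs2/cluster.conf").read())
would give one True/False per node name; the path is the usual default
and may differ on a given install.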

David Murphy wrote:
> [root at web2 ~]# nc -z  192.168.102.140 7777
> Connection to 192.168.102.140 7777 port [tcp/cbt] succeeded!
>
> [root at web1 /etc/sysconfig/network-scripts]# nc -z  192.168.102.141 7777
> Connection to 192.168.102.141 7777 port [tcp/cbt] succeeded!
>
> -----Original Message-----
> From: Sunil Mushran [mailto:sunil.mushran at oracle.com] 
> Sent: Monday, March 29, 2010 5:08 PM
> To: David Murphy
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>
> What happens when you use netcat to ping the node?
> nc -z host.example.com 7777
>
> David Murphy wrote:
>   
>> Some additional data:
>> From web1 (new Fedora machine) to web2:
>> 	[root at web1 /etc/sysconfig/network-scripts]# nmap 192.168.102.141
>>
>> 	Starting Nmap 5.21 ( http://nmap.org ) at 2010-03-29 16:56 CDT
>> 	Nmap scan report for 192.168.102.141
>> 	Host is up (0.000076s latency).
>> 	Not shown: 993 closed ports
>> 	PORT     STATE SERVICE
>> 	22/tcp   open  ssh
>> 	80/tcp   open  http
>> 	81/tcp   open  hosts2-ns
>> 	111/tcp  open  rpcbind
>> 	5666/tcp open  nrpe
>> 	7777/tcp open  unknown
>> 	9102/tcp open  jetdirect
>> 	MAC Address: 00:50:56:A3:58:5D (VMware)
>> 	
>> 	Nmap done: 1 IP address (1 host up) scanned in 1.18 seconds
>>
>>
>> From web2 to web1 (new Fedora machine):
>> 	[root at web2 ~]# nmap 192.168.102.140
>> 	
>> 	Starting Nmap 5.00 ( http://nmap.org ) at 2010-03-29 16:40 CDT
>> 	Interesting ports on 192.168.102.140:
>> 	Not shown: 994 closed ports
>> 	PORT     STATE SERVICE
>> 	22/tcp   open  ssh
>> 	80/tcp   open  http
>> 	81/tcp   open  hosts2-ns
>> 	111/tcp  open  rpcbind
>> 	443/tcp  open  https
>> 	7777/tcp open  unknown
>> 	MAC Address: 00:50:56:A3:14:62 (VMWare)
>>
>> 	Nmap done: 1 IP address (1 host up) scanned in 1.31 seconds
>>
>>
>> Cluster.conf:
>> 	cluster:
>> 		node_count = 6
>> 		name = appshare
>> 	
>> 	node:
>> 		ip_port = 7777
>> 		ip_address = 192.168.102.140
>> 		number = 1
>> 		name = web1
>> 		cluster = appshare
>> 	
>> 	node:
>> 		ip_port = 7777
>> 		ip_address = 192.168.102.141
>> 		number = 2
>> 		name = web2
>> 		cluster = appshare
>> 	
>> 	node:
>> 		ip_port = 7777
>> 		ip_address = 192.168.102.142
>> 		number = 3
>> 		name = web3
>> 		cluster = appshare
>> 	
>> 	node:
>> 		ip_port = 7777
>> 		ip_address = 192.168.102.111
>> 		number = 4
>> 		name = rgapp1
>> 		cluster = appshare
>> 	
>> 	node:
>> 		ip_port = 7777
>> 		ip_address = 192.168.102.122
>> 		number = 5
>> 		name = deploy
>> 		cluster = appshare
>> 	
>> 	node:
>> 		ip_port = 7777
>> 		ip_address = 192.168.102.112
>> 		number = 6
>> 		name = app1
>> 		cluster = appshare
>>
>> DMESG on WEB1:
>> 	OCFS2 1.5.0
>> 	(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
>> 	(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
>> 	(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 4 after 30.0 seconds, giving up and returning errors.
>> 	(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
>> 	(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
>> 	(1262,0):dlm_request_join:1035 ERROR: status = -107
>> 	(1262,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>> 	(1262,0):dlm_join_domain:1487 ERROR: status = -107
>> 	(1262,0):dlm_register_domain:1753 ERROR: status = -107
>> 	(1262,0):o2cb_cluster_connect:313 ERROR: status = -107
>> 	(1262,0):ocfs2_dlm_init:2963 ERROR: status = -107
>> 	(1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
>> 	ocfs2: Unmounting device (253,1) on (node 0)
>> 	(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
>> 	(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
>> 	(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
>> 	(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
>> 	(1323,0):dlm_request_join:1035 ERROR: status = -107
>> 	(1323,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>> 	(1323,0):dlm_join_domain:1487 ERROR: status = -107
>> 	(1323,0):dlm_register_domain:1753 ERROR: status = -107
>> 	(1323,0):o2cb_cluster_connect:313 ERROR: status = -107
>> 	(1323,0):ocfs2_dlm_init:2963 ERROR: status = -107
>> 	(1323,0):ocfs2_mount_volume:1788 ERROR: status = -107
>> 	ocfs2: Unmounting device (253,1) on (node 0)
>> 	VMCI: Major device number is: 249
>> 	VMware memory control driver initialized
>> 	vmmemctl: started kernel thread pid=1522
>> 	ocfs2: Unregistered cluster interface o2cb
>> 	OCFS2 Node Manager 1.5.0
>> 	OCFS2 DLM 1.5.0
>> 	ocfs2: Registered cluster interface o2cb
>> 	OCFS2 DLMFS 1.5.0
>> 	OCFS2 User DLM kernel interface loaded
>> 	OCFS2 1.5.0
>> 	(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 4 after 30.0 seconds, giving up and returning errors.
>> 	(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
>> 	(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
>> 	(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
>> 	(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
>> 	(1839,0):dlm_request_join:1035 ERROR: status = -107
>> 	(1839,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>> 	(1839,0):dlm_join_domain:1487 ERROR: status = -107
>> 	(1839,0):dlm_register_domain:1753 ERROR: status = -107
>> 	(1839,0):o2cb_cluster_connect:313 ERROR: status = -107
>> 	(1839,0):ocfs2_dlm_init:2963 ERROR: status = -107
>> 	(1839,0):ocfs2_mount_volume:1788 ERROR: status = -107
>> 	ocfs2: Unmounting device (253,1) on (node 0)
>> 	
>>
>>
>> So clearly the ocfs2 service thinks it cannot connect to the other nodes,
>> but nmap sees port 7777 just fine, and web2 can see the port on web1 just
>> fine, so there is no firewall blocking the connections.
>>
>> I think the cause might be that Fedora 12 uses 1.5.0 for the OCFS2 kernel
>> module while CentOS 5.3/5.4 use 1.4.4-1. Am I correct in thinking this?
>>
>> David
>> -----Original Message-----
>> From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
>> Sent: Thursday, March 25, 2010 6:46 PM
>> To: David Murphy
>> Cc: ocfs2-users at oss.oracle.com
>> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>>
>> hmm.. o2cb_ctl makes no connections. It just reads the cluster.conf and
>> populates configfs. AFAIK.
>>
>> David Murphy wrote:
>>> We had 6 nodes running CentOS 5.4 using ocfs2-tools 1.4.3.
>>>
>>> I decided to rebuild one node with FC12.
>>>
>>> That is working fine; however, nmap 192.168.200.112 shows 7777 as open,
>>> and o2cb_ctl is timing out when trying to connect to that node, which
>>> then causes a 107 error. This happens with all nodes, and all nodes have
>>> 7777 open via nmap from the FC machine.
>>>
>>> Is there a way to further debug this to see what exactly o2cb_ctl is
>>> seeing when trying to connect?
>>>
>>> David