[Ocfs2-users] Odd error on FC12 with ocfs2

David Murphy david at icewatermedia.com
Mon Mar 29 15:01:46 PDT 2010


Some additional data:
From web1 (new Fedora machine) to web2:
	[root at web1 /etc/sysconfig/network-scripts]# nmap 192.168.102.141

	Starting Nmap 5.21 ( http://nmap.org ) at 2010-03-29 16:56 CDT
	Nmap scan report for 192.168.102.141
	Host is up (0.000076s latency).
	Not shown: 993 closed ports
	PORT     STATE SERVICE
	22/tcp   open  ssh
	80/tcp   open  http
	81/tcp   open  hosts2-ns
	111/tcp  open  rpcbind
	5666/tcp open  nrpe
	7777/tcp open  unknown
	9102/tcp open  jetdirect
	MAC Address: 00:50:56:A3:58:5D (VMware)
	
	Nmap done: 1 IP address (1 host up) scanned in 1.18 seconds


From web2 to web1 (new Fedora machine):
	[root at web2 ~]# nmap 192.168.102.140
	
	Starting Nmap 5.00 ( http://nmap.org ) at 2010-03-29 16:40 CDT
	Interesting ports on 192.168.102.140:
	Not shown: 994 closed ports
	PORT     STATE SERVICE
	22/tcp   open  ssh
	80/tcp   open  http
	81/tcp   open  hosts2-ns
	111/tcp  open  rpcbind
	443/tcp  open  https
	7777/tcp open  unknown
	MAC Address: 00:50:56:A3:14:62 (VMWare)

	Nmap done: 1 IP address (1 host up) scanned in 1.31 seconds


Cluster.conf:
	cluster:
		node_count = 6
		name = appshare
	
	node:
		ip_port = 7777
		ip_address = 192.168.102.140
		number = 1
		name = web1
		cluster = appshare
	
	node:
		ip_port = 7777
		ip_address = 192.168.102.141
		number = 2
		name = web2
		cluster = appshare
	
	node:
		ip_port = 7777
		ip_address = 192.168.102.142
		number = 3
		name = web3
		cluster = appshare
	
	node:
		ip_port = 7777
		ip_address = 192.168.102.111
		number = 4
		name = rgapp1
		cluster = appshare
	
	node:
		ip_port = 7777
		ip_address = 192.168.102.122
		number = 5
		name = deploy
		cluster = appshare
	
	node:
		ip_port = 7777
		ip_address = 192.168.102.112
		number = 6
		name = app1
		cluster = appshare
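
A quick sanity check of the file above can be scripted. The following is only a sketch: it assumes the stanza layout shown here (a "cluster:" or "node:" header followed by "key = value" lines), and the helper names parse_cluster_conf and check_cluster are made up for illustration, not part of ocfs2-tools. It only checks that node_count matches the number of node stanzas and that node numbers, names and addresses are unique.

```python
def parse_cluster_conf(text):
    """Parse cluster.conf-style text into a list of (stanza_kind, fields) pairs."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A header such as "cluster:" or "node:" starts a new stanza.
            current = (line[:-1], {})
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, _, value = line.partition("=")
            current[1][key.strip()] = value.strip()
    return stanzas


def check_cluster(stanzas):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    nodes = [fields for kind, fields in stanzas if kind == "node"]
    for kind, fields in stanzas:
        if kind != "cluster":
            continue
        members = [n for n in nodes if n.get("cluster") == fields.get("name")]
        declared = int(fields.get("node_count", "0"))
        if declared != len(members):
            problems.append(
                f"cluster {fields.get('name')}: node_count = {declared}, "
                f"but {len(members)} node stanzas found"
            )
    # Node numbers, names and addresses should each be unique.
    for key in ("number", "name", "ip_address"):
        seen = set()
        for n in nodes:
            value = n.get(key)
            if value in seen:
                problems.append(f"duplicate {key}: {value}")
            seen.add(value)
    return problems
```

Running check_cluster(parse_cluster_conf(open("/etc/ocfs2/cluster.conf").read())) on each node and comparing the results is one way to confirm that every node carries the same, self-consistent file; the path is the usual default and may differ on your system.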

DMESG on WEB1:
	OCFS2 1.5.0
	(1199,0):o2net_connect_expired:1656 ERROR: no connection established
with node 2 after 30.0 seconds, giving up and returning errors.
	(1199,0):o2net_connect_expired:1656 ERROR: no connection established
with node 3 after 30.0 seconds, giving up and returning errors.
	(1199,0):o2net_connect_expired:1656 ERROR: no connection established
with node 4 after 30.0 seconds, giving up and returning errors.
	(1199,0):o2net_connect_expired:1656 ERROR: no connection established
with node 5 after 30.0 seconds, giving up and returning errors.
	(1199,0):o2net_connect_expired:1656 ERROR: no connection established
with node 6 after 30.0 seconds, giving up and returning errors.
	(1262,0):dlm_request_join:1035 ERROR: status = -107
	(1262,0):dlm_try_to_join_domain:1209 ERROR: status = -107
	(1262,0):dlm_join_domain:1487 ERROR: status = -107
	(1262,0):dlm_register_domain:1753 ERROR: status = -107
	(1262,0):o2cb_cluster_connect:313 ERROR: status = -107
	(1262,0):ocfs2_dlm_init:2963 ERROR: status = -107
	(1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
	ocfs2: Unmounting device (253,1) on (node 0)
	(1199,0):o2net_connect_expired:1656 ERROR: no connection established
with node 2 after 30.0 seconds, giving up and returning errors.
	(1199,0):o2net_connect_expired:1656 ERROR: no connection established
with node 3 after 30.0 seconds, giving up and returning errors.
	(1199,0):o2net_connect_expired:1656 ERROR: no connection established
with node 5 after 30.0 seconds, giving up and returning errors.
	(1199,0):o2net_connect_expired:1656 ERROR: no connection established
with node 6 after 30.0 seconds, giving up and returning errors.
	(1323,0):dlm_request_join:1035 ERROR: status = -107
	(1323,0):dlm_try_to_join_domain:1209 ERROR: status = -107
	(1323,0):dlm_join_domain:1487 ERROR: status = -107
	(1323,0):dlm_register_domain:1753 ERROR: status = -107
	(1323,0):o2cb_cluster_connect:313 ERROR: status = -107
	(1323,0):ocfs2_dlm_init:2963 ERROR: status = -107
	(1323,0):ocfs2_mount_volume:1788 ERROR: status = -107
	ocfs2: Unmounting device (253,1) on (node 0)
	VMCI: Major device number is: 249
	VMware memory control driver initialized
	vmmemctl: started kernel thread pid=1522
	ocfs2: Unregistered cluster interface o2cb
	OCFS2 Node Manager 1.5.0
	OCFS2 DLM 1.5.0
	ocfs2: Registered cluster interface o2cb
	OCFS2 DLMFS 1.5.0
	OCFS2 User DLM kernel interface loaded
	OCFS2 1.5.0
	(1810,0):o2net_connect_expired:1656 ERROR: no connection established
with node 4 after 30.0 seconds, giving up and returning errors.
	(1810,0):o2net_connect_expired:1656 ERROR: no connection established
with node 5 after 30.0 seconds, giving up and returning errors.
	(1810,0):o2net_connect_expired:1656 ERROR: no connection established
with node 6 after 30.0 seconds, giving up and returning errors.
	(1810,0):o2net_connect_expired:1656 ERROR: no connection established
with node 2 after 30.0 seconds, giving up and returning errors.
	(1810,0):o2net_connect_expired:1656 ERROR: no connection established
with node 3 after 30.0 seconds, giving up and returning errors.
	(1839,0):dlm_request_join:1035 ERROR: status = -107
	(1839,0):dlm_try_to_join_domain:1209 ERROR: status = -107
	(1839,0):dlm_join_domain:1487 ERROR: status = -107
	(1839,0):dlm_register_domain:1753 ERROR: status = -107
	(1839,0):o2cb_cluster_connect:313 ERROR: status = -107
	(1839,0):ocfs2_dlm_init:2963 ERROR: status = -107
	(1839,0):ocfs2_mount_volume:1788 ERROR: status = -107
	ocfs2: Unmounting device (253,1) on (node 0)
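
For reference, the "status = -107" values in the log above are negated Linux errno codes, and 107 is ENOTCONN on Linux. A minimal sketch for decoding such a status with the Python standard library (describe_status is a made-up helper name, and the result depends on the platform's errno table, so run it on a Linux box):

```python
import errno
import os


def describe_status(status):
    """Map a kernel-style negative status to (errno name, message) on this platform."""
    code = abs(status)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = describe_status(-107)
print(f"status = -107 -> {name}: {message}")
```

This matches the o2net_connect_expired messages: the DLM join fails because no connection to the other nodes was ever established.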
	


So clearly the ocfs2 service thinks it cannot connect to the other nodes,
but nmap sees port 7777 just fine. And web2 can see the port on web1 just
fine, so there is no firewall blocking the connections.
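
An nmap scan only shows that the port answers. One further check is a full TCP connect to port 7777 on each node, optionally bound to web1's cluster address. The sketch below is just that, a sketch: the NODES list is copied from cluster.conf above, can_connect and report are made-up helper names, and binding to 192.168.102.140 assumes that this is the address the cluster traffic leaves web1 from.

```python
import socket

# (name, ip_address, ip_port) for the other nodes, taken from cluster.conf above.
NODES = [
    ("web2", "192.168.102.141", 7777),
    ("web3", "192.168.102.142", 7777),
    ("rgapp1", "192.168.102.111", 7777),
    ("deploy", "192.168.102.122", 7777),
    ("app1", "192.168.102.112", 7777),
]


def can_connect(ip, port, source_ip=None, timeout=5.0):
    """Return True if a full TCP connection to ip:port succeeds within timeout."""
    # Port 0 lets the OS pick an ephemeral source port on the chosen address.
    source = (source_ip, 0) if source_ip else None
    try:
        with socket.create_connection((ip, port), timeout=timeout,
                                      source_address=source):
            return True
    except OSError:
        return False


def report(nodes, source_ip=None):
    """Print one line per node showing whether the TCP connect worked."""
    for name, ip, port in nodes:
        state = "ok" if can_connect(ip, port, source_ip) else "FAILED"
        print(f"{name:8s} {ip}:{port} {state}")
```

Calling report(NODES, source_ip="192.168.102.140") on web1 would show whether a plain TCP connection from that address succeeds. If it does for every node, the problem is more likely in the cluster stack itself than in the network path.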

I think it might be that Fedora 12 uses 1.5.0 for the OCFS2 kernel module
while CentOS 5.3/5.4 use 1.4.4-1. Am I correct in thinking this?

David
-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
Sent: Thursday, March 25, 2010 6:46 PM
To: David Murphy
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2

hmm.. o2cb_ctl makes no connections. It just reads the cluster.conf and
populates configfs. AFAIK.

David Murphy wrote:
>
> We had 6 nodes running CentOS 5.4 using 1.4.3 ocfs2-tools.
>
> I decided to rebuild one node with FC12.
>
> Which is working fine, however
>
> nmap 192.168.200.112 shows 7777 as open
>
> and
>
> o2cb_ctl is timing out when trying to connect to that node, which then
> causes a 107 error. This happens with all nodes, and all nodes have 7777
> open via nmap from the FC machine.
>
> Is there a way to further debug this to see what exactly o2cb_ctl is
> seeing when trying to connect?
>
> David




