[Ocfs2-users] Odd error on FC12 with ocfs2
David Murphy
david at icewatermedia.com
Mon Mar 29 15:11:56 PDT 2010
[root@web2 ~]# nc -z 192.168.102.140 7777
Connection to 192.168.102.140 7777 port [tcp/cbt] succeeded!
[root@web1 /etc/sysconfig/network-scripts]# nc -z 192.168.102.141 7777
Connection to 192.168.102.141 7777 port [tcp/cbt] succeeded!
-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
Sent: Monday, March 29, 2010 5:08 PM
To: David Murphy
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
What happens when you use netcat to ping the node?
nc -z host.example.com 7777
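
To run the same check against every node rather than one at a time, the test could be sketched in Python roughly as below. This is only an illustration: port_open and check_nodes are hypothetical helper names, and the 3-second timeout is an arbitrary choice. Like nc -z, it only shows that something accepts a TCP connection on the port, not that the o2net handshake itself completes.

```python
import socket


def port_open(host, port=7777, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_nodes(nodes, port=7777, timeout=3.0):
    """Map each node name to True/False depending on whether its port is reachable.

    `nodes` is a dict of {name: ip_address}, filled in by hand from cluster.conf.
    """
    return {name: port_open(ip, port, timeout) for name, ip in nodes.items()}
```

Calling check_nodes({"web1": "192.168.102.140", "web2": "192.168.102.141"}) on each machine would give a quick per-node view of which peers are reachable on 7777.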
David Murphy wrote:
> Some additional data:
> From web1 (new Fedora machine) to web2:
> [root@web1 /etc/sysconfig/network-scripts]# nmap 192.168.102.141
>
> Starting Nmap 5.21 ( http://nmap.org ) at 2010-03-29 16:56 CDT
> Nmap scan report for 192.168.102.141
> Host is up (0.000076s latency).
> Not shown: 993 closed ports
> PORT STATE SERVICE
> 22/tcp open ssh
> 80/tcp open http
> 81/tcp open hosts2-ns
> 111/tcp open rpcbind
> 5666/tcp open nrpe
> 7777/tcp open unknown
> 9102/tcp open jetdirect
> MAC Address: 00:50:56:A3:58:5D (VMware)
>
> Nmap done: 1 IP address (1 host up) scanned in 1.18 seconds
>
>
> From web2 -> web1 (new fedora machine)
> [root@web2 ~]# nmap 192.168.102.140
>
> Starting Nmap 5.00 ( http://nmap.org ) at 2010-03-29 16:40 CDT
> Interesting ports on 192.168.102.140:
> Not shown: 994 closed ports
> PORT STATE SERVICE
> 22/tcp open ssh
> 80/tcp open http
> 81/tcp open hosts2-ns
> 111/tcp open rpcbind
> 443/tcp open https
> 7777/tcp open unknown
> MAC Address: 00:50:56:A3:14:62 (VMWare)
>
> Nmap done: 1 IP address (1 host up) scanned in 1.31 seconds
>
>
> Cluster.conf:
> cluster:
> node_count = 6
> name = appshare
>
> node:
> ip_port = 7777
> ip_address = 192.168.102.140
> number = 1
> name = web1
> cluster = appshare
>
> node:
> ip_port = 7777
> ip_address = 192.168.102.141
> number = 2
> name = web2
> cluster = appshare
>
> node:
> ip_port = 7777
> ip_address = 192.168.102.142
> number = 3
> name = web3
> cluster = appshare
>
> node:
> ip_port = 7777
> ip_address = 192.168.102.111
> number = 4
> name = rgapp1
> cluster = appshare
>
> node:
> ip_port = 7777
> ip_address = 192.168.102.122
> number = 5
> name = deploy
> cluster = appshare
>
> node:
> ip_port = 7777
> ip_address = 192.168.102.112
> number = 6
> name = app1
> cluster = appshare
>
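
A quick way to sanity-check a cluster.conf like the one above (for example, that node_count agrees with the number of node stanzas, or that two machines carry the same file) is to parse it into stanzas. The following is a rough sketch that assumes only the layout shown here: a "cluster:" or "node:" header followed by "key = value" lines. parse_cluster_conf and node_count_matches are hypothetical names, not part of ocfs2-tools.

```python
def parse_cluster_conf(text):
    """Parse cluster.conf-style text into a list of (stanza_type, attrs) pairs."""
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # A header such as "cluster:" or "node:" starts a new stanza.
            stanzas.append((line[:-1], {}))
        elif "=" in line and stanzas:
            # "key = value" lines belong to the most recent stanza.
            key, value = line.split("=", 1)
            stanzas[-1][1][key.strip()] = value.strip()
    return stanzas


def node_count_matches(stanzas):
    """Return (ok, declared, actual) comparing node_count to the node stanzas."""
    clusters = [attrs for kind, attrs in stanzas if kind == "cluster"]
    nodes = [attrs for kind, attrs in stanzas if kind == "node"]
    declared = int(clusters[0]["node_count"])
    return declared == len(nodes), declared, len(nodes)
```

Running parse_cluster_conf on the file from each node and comparing the results would show whether any machine has a stale or differing copy.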
> DMESG on WEB1:
> OCFS2 1.5.0
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 2 after 30.0 seconds, giving up and returning errors.
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 3 after 30.0 seconds, giving up and returning errors.
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 4 after 30.0 seconds, giving up and returning errors.
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 5 after 30.0 seconds, giving up and returning errors.
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 6 after 30.0 seconds, giving up and returning errors.
> (1262,0):dlm_request_join:1035 ERROR: status = -107
> (1262,0):dlm_try_to_join_domain:1209 ERROR: status = -107
> (1262,0):dlm_join_domain:1487 ERROR: status = -107
> (1262,0):dlm_register_domain:1753 ERROR: status = -107
> (1262,0):o2cb_cluster_connect:313 ERROR: status = -107
> (1262,0):ocfs2_dlm_init:2963 ERROR: status = -107
> (1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
> ocfs2: Unmounting device (253,1) on (node 0)
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 2 after 30.0 seconds, giving up and returning errors.
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 3 after 30.0 seconds, giving up and returning errors.
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 5 after 30.0 seconds, giving up and returning errors.
> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 6 after 30.0 seconds, giving up and returning errors.
> (1323,0):dlm_request_join:1035 ERROR: status = -107
> (1323,0):dlm_try_to_join_domain:1209 ERROR: status = -107
> (1323,0):dlm_join_domain:1487 ERROR: status = -107
> (1323,0):dlm_register_domain:1753 ERROR: status = -107
> (1323,0):o2cb_cluster_connect:313 ERROR: status = -107
> (1323,0):ocfs2_dlm_init:2963 ERROR: status = -107
> (1323,0):ocfs2_mount_volume:1788 ERROR: status = -107
> ocfs2: Unmounting device (253,1) on (node 0)
> VMCI: Major device number is: 249
> VMware memory control driver initialized
> vmmemctl: started kernel thread pid=1522
> ocfs2: Unregistered cluster interface o2cb
> OCFS2 Node Manager 1.5.0
> OCFS2 DLM 1.5.0
> ocfs2: Registered cluster interface o2cb
> OCFS2 DLMFS 1.5.0
> OCFS2 User DLM kernel interface loaded
> OCFS2 1.5.0
> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 4 after 30.0 seconds, giving up and returning errors.
> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 5 after 30.0 seconds, giving up and returning errors.
> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 6 after 30.0 seconds, giving up and returning errors.
> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 2 after 30.0 seconds, giving up and returning errors.
> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
> with node 3 after 30.0 seconds, giving up and returning errors.
> (1839,0):dlm_request_join:1035 ERROR: status = -107
> (1839,0):dlm_try_to_join_domain:1209 ERROR: status = -107
> (1839,0):dlm_join_domain:1487 ERROR: status = -107
> (1839,0):dlm_register_domain:1753 ERROR: status = -107
> (1839,0):o2cb_cluster_connect:313 ERROR: status = -107
> (1839,0):ocfs2_dlm_init:2963 ERROR: status = -107
> (1839,0):ocfs2_mount_volume:1788 ERROR: status = -107
> ocfs2: Unmounting device (253,1) on (node 0)
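
The "status = -107" values in the dmesg output above are negated errno codes. On Linux, errno 107 is ENOTCONN ("Transport endpoint is not connected"), which fits the o2net "no connection established" messages that precede them. A small sketch for decoding such a status follows; describe_status is a hypothetical helper, and the numeric values of errno constants are platform-specific, so the lookup is only meaningful when run on Linux.

```python
import errno
import os


def describe_status(status):
    """Translate a negative kernel-style status into (errno name, message)."""
    code = -status
    # errno.errorcode maps numeric codes to names on the current platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)
```

On a Linux host, describe_status(-107) gives the name and message for the error that the mount attempt is returning.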
>
>
>
> So clearly the ocfs2 service thinks it cannot connect to the nodes, but
> nmap sees the port just fine, and web2 can see the port on web1 just
> fine, so there is no firewall blocking the connections.
>
> I think it might be that Fedora 12 uses 1.5.0 for the OCFS2 kernel module
> while CentOS 5.3/5.4 use 1.4.4-1. Am I correct in thinking this?
>
> David
> -----Original Message-----
> From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
> Sent: Thursday, March 25, 2010 6:46 PM
> To: David Murphy
> Cc: ocfs2-users at oss.oracle.com
> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>
> hmm.. o2cb_ctl makes no connections. It just reads the cluster.conf and
> populates configfs. AFAIK.
>
> David Murphy wrote:
>
>> We had 6 nodes running CentOS 5.4 using 1.4.3 ocfs2-tools.
>>
>> I decided to rebuild one node with FC12.
>>
>> Which is working fine; however,
>>
>> nmap 192.168.200.112 shows 7777 as open
>>
>> and
>>
>> o2cb_ctl is timing out when trying to connect to that node, which then
>> causes a 107 error. This happens with all nodes, and all nodes have 7777
>> open via nmap from the FC machine.
>>
>> Is there a way to further debug this to see what exactly o2cb_ctl is
>> seeing when trying to connect?
>>
>> David
>>
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Ocfs2-users mailing list
>> Ocfs2-users at oss.oracle.com
>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>
>
>