[Ocfs2-users] Odd error on FC12 with ocfs2
David Murphy
david at icewatermedia.com
Mon Mar 29 15:01:46 PDT 2010
Some additional data:
From web1 (new Fedora machine) to web2:
[root at web1 /etc/sysconfig/network-scripts]# nmap 192.168.102.141
Starting Nmap 5.21 ( http://nmap.org ) at 2010-03-29 16:56 CDT
Nmap scan report for 192.168.102.141
Host is up (0.000076s latency).
Not shown: 993 closed ports
PORT STATE SERVICE
22/tcp open ssh
80/tcp open http
81/tcp open hosts2-ns
111/tcp open rpcbind
5666/tcp open nrpe
7777/tcp open unknown
9102/tcp open jetdirect
MAC Address: 00:50:56:A3:58:5D (VMware)
Nmap done: 1 IP address (1 host up) scanned in 1.18 seconds
From web2 to web1 (new Fedora machine):
[root at web2 ~]# nmap 192.168.102.140
Starting Nmap 5.00 ( http://nmap.org ) at 2010-03-29 16:40 CDT
Interesting ports on 192.168.102.140:
Not shown: 994 closed ports
PORT STATE SERVICE
22/tcp open ssh
80/tcp open http
81/tcp open hosts2-ns
111/tcp open rpcbind
443/tcp open https
7777/tcp open unknown
MAC Address: 00:50:56:A3:14:62 (VMWare)
Nmap done: 1 IP address (1 host up) scanned in 1.31 seconds
cluster.conf:
cluster:
	node_count = 6
	name = appshare

node:
	ip_port = 7777
	ip_address = 192.168.102.140
	number = 1
	name = web1
	cluster = appshare

node:
	ip_port = 7777
	ip_address = 192.168.102.141
	number = 2
	name = web2
	cluster = appshare

node:
	ip_port = 7777
	ip_address = 192.168.102.142
	number = 3
	name = web3
	cluster = appshare

node:
	ip_port = 7777
	ip_address = 192.168.102.111
	number = 4
	name = rgapp1
	cluster = appshare

node:
	ip_port = 7777
	ip_address = 192.168.102.122
	number = 5
	name = deploy
	cluster = appshare

node:
	ip_port = 7777
	ip_address = 192.168.102.112
	number = 6
	name = app1
	cluster = appshare
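A quick way to rule out a typo in a file like this is to parse it and cross-check it. The Python sketch below assumes the stanza layout shown above ("cluster:" and "node:" headers followed by indented "key = value" lines). parse_cluster_conf and check_cluster_conf are hypothetical helpers, not part of ocfs2-tools, and the checks are only the obvious ones: node_count versus the number of node stanzas, unknown cluster names, and duplicate node numbers, names or addresses.

```python
from collections import defaultdict


def parse_cluster_conf(text):
    """Parse stanza-style cluster.conf text into (clusters, nodes).

    A line ending in ':' opens a "cluster" or "node" stanza; the indented
    "key = value" lines after it are stored in that stanza's dict.
    """
    clusters, nodes = [], []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith(":"):
            kind = line[:-1].strip()
            if kind not in ("cluster", "node"):
                raise ValueError(f"unknown stanza: {raw!r}")
            current = {}
            (clusters if kind == "cluster" else nodes).append(current)
            continue
        key, sep, value = line.partition("=")
        if not sep or current is None:
            raise ValueError(f"unexpected line: {raw!r}")
        current[key.strip()] = value.strip()
    return clusters, nodes


def check_cluster_conf(clusters, nodes):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    cluster_names = {c.get("name") for c in clusters}
    for c in clusters:
        members = [n for n in nodes if n.get("cluster") == c.get("name")]
        if int(c.get("node_count", -1)) != len(members):
            problems.append(
                f"cluster {c.get('name')}: node_count={c.get('node_count')} "
                f"but {len(members)} node stanzas")
    for n in nodes:
        if n.get("cluster") not in cluster_names:
            problems.append(f"node {n.get('name')}: unknown cluster {n.get('cluster')}")
    # In this sketch, each of these fields must identify exactly one node.
    for field in ("number", "name", "ip_address"):
        seen = defaultdict(list)
        for n in nodes:
            seen[n.get(field)].append(n.get("name"))
        for value, names in seen.items():
            if len(names) > 1:
                problems.append(f"duplicate {field} {value}: {names}")
    return problems
```

On a node you would feed it the contents of /etc/ocfs2/cluster.conf; for the file above it should return an empty list.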
dmesg on web1:
OCFS2 1.5.0
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 4 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
(1262,0):dlm_request_join:1035 ERROR: status = -107
(1262,0):dlm_try_to_join_domain:1209 ERROR: status = -107
(1262,0):dlm_join_domain:1487 ERROR: status = -107
(1262,0):dlm_register_domain:1753 ERROR: status = -107
(1262,0):o2cb_cluster_connect:313 ERROR: status = -107
(1262,0):ocfs2_dlm_init:2963 ERROR: status = -107
(1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
ocfs2: Unmounting device (253,1) on (node 0)
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
(1199,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
(1323,0):dlm_request_join:1035 ERROR: status = -107
(1323,0):dlm_try_to_join_domain:1209 ERROR: status = -107
(1323,0):dlm_join_domain:1487 ERROR: status = -107
(1323,0):dlm_register_domain:1753 ERROR: status = -107
(1323,0):o2cb_cluster_connect:313 ERROR: status = -107
(1323,0):ocfs2_dlm_init:2963 ERROR: status = -107
(1323,0):ocfs2_mount_volume:1788 ERROR: status = -107
ocfs2: Unmounting device (253,1) on (node 0)
VMCI: Major device number is: 249
VMware memory control driver initialized
vmmemctl: started kernel thread pid=1522
ocfs2: Unregistered cluster interface o2cb
OCFS2 Node Manager 1.5.0
OCFS2 DLM 1.5.0
ocfs2: Registered cluster interface o2cb
OCFS2 DLMFS 1.5.0
OCFS2 User DLM kernel interface loaded
OCFS2 1.5.0
(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 4 after 30.0 seconds, giving up and returning errors.
(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
(1810,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
(1839,0):dlm_request_join:1035 ERROR: status = -107
(1839,0):dlm_try_to_join_domain:1209 ERROR: status = -107
(1839,0):dlm_join_domain:1487 ERROR: status = -107
(1839,0):dlm_register_domain:1753 ERROR: status = -107
(1839,0):o2cb_cluster_connect:313 ERROR: status = -107
(1839,0):ocfs2_dlm_init:2963 ERROR: status = -107
(1839,0):ocfs2_mount_volume:1788 ERROR: status = -107
ocfs2: Unmounting device (253,1) on (node 0)
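The same -107 appears at every layer from dlm_request_join up to ocfs2_mount_volume. On Linux, errno 107 is ENOTCONN ("Transport endpoint is not connected"), which fits the o2net timeouts logged just before it. To compare the three mount attempts side by side, a small Python sketch can group the log by attempt. The regular expressions and function names are my own, matched only against the message text shown above rather than any documented log format.

```python
import errno
import re

# Matches the o2net timeout message once its email line-wrap is undone.
NODE_RE = re.compile(r"no connection established with node (\d+) after")
# Matches the "ERROR: status = -107" lines from the dlm/ocfs2 mount path.
STATUS_RE = re.compile(r"ERROR: status = (-?\d+)")


def summarize_attempts(log):
    """Split a dmesg excerpt into mount attempts.

    Each attempt ends at an "ocfs2: Unmounting device" line and records
    which node numbers o2net gave up on and which status codes were logged.
    """
    # Rejoin messages that were wrapped just before "with node N".
    text = re.sub(r"\s*\n\s*(?=with node)", " ", log)
    attempts = []
    nodes, statuses = [], set()
    for line in text.splitlines():
        line = line.strip()
        m = NODE_RE.search(line)
        if m:
            nodes.append(int(m.group(1)))
            continue
        m = STATUS_RE.search(line)
        if m:
            statuses.add(int(m.group(1)))
            continue
        if line.startswith("ocfs2: Unmounting device"):
            attempts.append({"unreachable": sorted(nodes),
                             "status": sorted(statuses)})
            nodes, statuses = [], set()
    return attempts


def errno_name(status):
    """Map a negative kernel status such as -107 to this platform's errno name."""
    code = abs(status)
    return errno.errorcode.get(code, f"errno {code}")
```

Run over the log above, it reports nodes [2, 3, 4, 5, 6] as unreachable in the first and third attempts and [2, 3, 5, 6] in the second, each with status [-107].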
So the ocfs2 service clearly thinks it cannot connect to the other nodes,
yet nmap sees port 7777 open just fine. And web2 can see the port on web1
too, so there is no firewall blocking the connections.
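Both nmap runs above were made as root, and as far as I know nmap then defaults to a SYN scan, which never completes the TCP handshake. A full connect() from the failing node is a slightly closer imitation of what o2net does, although it still skips the o2net handshake message that follows the connect. A minimal Python sketch; tcp_port_open and probe_nodes are hypothetical helper names, and the node list would be copied by hand from cluster.conf.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a full TCP connect() to host:port succeeds within timeout."""
    try:
        # create_connection completes the three-way handshake, unlike a SYN scan.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def probe_nodes(nodes, timeout=3.0):
    """nodes: iterable of (name, ip, port) tuples; returns {name: reachable}."""
    return {name: tcp_port_open(ip, port, timeout) for name, ip, port in nodes}
```

For example, run on web1, probe_nodes([("web2", "192.168.102.141", 7777)]) returns {"web2": True} if the connect succeeds within the timeout.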
I suspect the cause is that Fedora 12 uses version 1.5.0 of the OCFS2
kernel module while CentOS 5.3/5.4 use 1.4.4-1. Am I correct in thinking this?
David
-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
Sent: Thursday, March 25, 2010 6:46 PM
To: David Murphy
Cc: ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
Hmm... o2cb_ctl makes no connections. It just reads cluster.conf and
populates configfs, AFAIK.
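Since o2cb_ctl only writes the configuration and the kernel's o2net code makes the actual connections, it can help to check what the kernel was given. The Python sketch below only lists the configfs writes that a cluster.conf like the one in this thread should correspond to. It assumes configfs is mounted at /sys/kernel/config and that the per-node attribute files are num, ipv4_address, ipv4_port and local; comparing the list against the files under /sys/kernel/config/cluster/ on web1 is a cheap consistency check. planned_writes is a hypothetical helper and writes nothing.

```python
from pathlib import PurePosixPath

# Assumed configfs mount point; adjust if configfs is mounted elsewhere.
CONFIGFS_ROOT = PurePosixPath("/sys/kernel/config")


def planned_writes(cluster_name, nodes, local_name, root=CONFIGFS_ROOT):
    """Return (path, value) pairs describing `nodes` in configfs (dry run only).

    `nodes` are dicts with the cluster.conf keys name, number, ip_address
    and ip_port; `local_name` is the node this machine should act as.
    """
    writes = []
    base = root / "cluster" / cluster_name / "node"
    for node in nodes:
        d = base / node["name"]
        writes.append((d / "num", node["number"]))
        writes.append((d / "ipv4_address", node["ip_address"]))
        writes.append((d / "ipv4_port", node["ip_port"]))
        if node["name"] == local_name:
            # Written last: as I understand it, the kernel only accepts
            # "local" once the other attributes are set, and setting it
            # makes o2net start listening on this node.
            writes.append((d / "local", "1"))
    return writes
```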
David Murphy wrote:
>
> We had 6 nodes running CentOS 5.4 using ocfs2-tools 1.4.3.
>
> I decided to rebuild one node with FC12, which is working fine. However,
> nmap 192.168.200.112 shows port 7777 as open, yet o2cb_ctl is timing out
> when trying to connect to that node, which then causes a 107 error. This
> happens with all nodes, and all nodes show port 7777 open via nmap from
> the FC12 machine.
>
> Is there a way to debug this further and see exactly what o2cb_ctl is
> seeing when it tries to connect?
>
> David
>
> _______________________________________________
> Ocfs2-users mailing list
> Ocfs2-users at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/ocfs2-users