[Ocfs2-users] Odd error on FC12 with ocfs2
Angelo McComis
angelo at mccomis.com
Mon Mar 29 20:10:18 PDT 2010
Does it matter that the nodes are numbered 1-6 instead of 0-5?
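
One way to look at the numbering on its own, separately from the mount attempt, is to list what cluster.conf actually declares. The Python below is only a rough sketch: it assumes the stanza layout shown in David's mail (a "cluster:" or "node:" header followed by "key = value" lines), and the function names and the two-node sample are made up for illustration. It reports the node numbers in use, any duplicates, and whether node_count agrees with the number of node stanzas. It does not say whether starting at 1 is a problem.

```python
def parse_cluster_conf(text):
    """Split cluster.conf-style text into a list of stanza dicts."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=" in line:
            if current is None:
                continue  # key/value line before any stanza header: ignore
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
        elif line.endswith(":"):
            # A header such as "node:" or "cluster:" starts a new stanza.
            current = {"_type": line[:-1].strip()}
            stanzas.append(current)
    return stanzas


def summarize(stanzas):
    """Collect node numbers and compare node_count with the stanza count."""
    nodes = [s for s in stanzas if s["_type"] == "node"]
    clusters = [s for s in stanzas if s["_type"] == "cluster"]
    numbers = sorted(int(n["number"]) for n in nodes)
    declared = int(clusters[0]["node_count"]) if clusters else None
    return {
        "numbers": numbers,
        "declared_count": declared,
        "actual_count": len(nodes),
        "duplicates": sorted({n for n in numbers if numbers.count(n) > 1}),
    }


# Shortened, hypothetical sample in the same layout as the quoted file.
SAMPLE = """\
cluster:
    node_count = 2
    name = appshare

node:
    ip_port = 7777
    ip_address = 192.168.102.140
    number = 1
    name = web1
    cluster = appshare

node:
    ip_port = 7777
    ip_address = 192.168.102.141
    number = 2
    name = web2
    cluster = appshare
"""

print(summarize(parse_cluster_conf(SAMPLE)))
```

On a real node the text would come from /etc/ocfs2/cluster.conf instead of SAMPLE, and running it on every node makes it easy to compare the files against each other.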
On Mon, Mar 29, 2010 at 4:25 PM, Sunil Mushran <sunil.mushran at oracle.com> wrote:
> Enable some debugging.
>
> #debugfs.ocfs2 -l TCP allow
> ...do mount...
> #debugfs.ocfs2 -l TCP off
>
>
> David Murphy wrote:
>> [root@web2 ~]# nc -z 192.168.102.140 7777
>> Connection to 192.168.102.140 7777 port [tcp/cbt] succeeded!
>>
>> [root@web1 /etc/sysconfig/network-scripts]# nc -z 192.168.102.141 7777
>> Connection to 192.168.102.141 7777 port [tcp/cbt] succeeded!
>>
>> -----Original Message-----
>> From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
>> Sent: Monday, March 29, 2010 5:08 PM
>> To: David Murphy
>> Cc: ocfs2-users at oss.oracle.com
>> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>>
>> What happens when you use netcat to ping the node?
>> nc -z host.example.com 7777
>>
>> David Murphy wrote:
>>
>>> Some additional data:
>>> From web1 (new Fedora machine) to web2:
>>> [root@web1 /etc/sysconfig/network-scripts]# nmap 192.168.102.141
>>>
>>> Starting Nmap 5.21 ( http://nmap.org ) at 2010-03-29 16:56 CDT
>>> Nmap scan report for 192.168.102.141
>>> Host is up (0.000076s latency).
>>> Not shown: 993 closed ports
>>> PORT STATE SERVICE
>>> 22/tcp open ssh
>>> 80/tcp open http
>>> 81/tcp open hosts2-ns
>>> 111/tcp open rpcbind
>>> 5666/tcp open nrpe
>>> 7777/tcp open unknown
>>> 9102/tcp open jetdirect
>>> MAC Address: 00:50:56:A3:58:5D (VMware)
>>>
>>> Nmap done: 1 IP address (1 host up) scanned in 1.18 seconds
>>>
>>>
>>> From web2 to web1 (new Fedora machine):
>>> [root@web2 ~]# nmap 192.168.102.140
>>>
>>> Starting Nmap 5.00 ( http://nmap.org ) at 2010-03-29 16:40 CDT
>>> Interesting ports on 192.168.102.140:
>>> Not shown: 994 closed ports
>>> PORT STATE SERVICE
>>> 22/tcp open ssh
>>> 80/tcp open http
>>> 81/tcp open hosts2-ns
>>> 111/tcp open rpcbind
>>> 443/tcp open https
>>> 7777/tcp open unknown
>>> MAC Address: 00:50:56:A3:14:62 (VMWare)
>>>
>>> Nmap done: 1 IP address (1 host up) scanned in 1.31 seconds
>>>
>>>
>>> Cluster.conf:
>>> cluster:
>>> node_count = 6
>>> name = appshare
>>>
>>> node:
>>> ip_port = 7777
>>> ip_address = 192.168.102.140
>>> number = 1
>>> name = web1
>>> cluster = appshare
>>>
>>> node:
>>> ip_port = 7777
>>> ip_address = 192.168.102.141
>>> number = 2
>>> name = web2
>>> cluster = appshare
>>>
>>> node:
>>> ip_port = 7777
>>> ip_address = 192.168.102.142
>>> number = 3
>>> name = web3
>>> cluster = appshare
>>>
>>> node:
>>> ip_port = 7777
>>> ip_address = 192.168.102.111
>>> number = 4
>>> name = rgapp1
>>> cluster = appshare
>>>
>>> node:
>>> ip_port = 7777
>>> ip_address = 192.168.102.122
>>> number = 5
>>> name = deploy
>>> cluster = appshare
>>>
>>> node:
>>> ip_port = 7777
>>> ip_address = 192.168.102.112
>>> number = 6
>>> name = app1
>>> cluster = appshare
>>>
>>> DMESG on WEB1:
>>> OCFS2 1.5.0
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 2 after 30.0 seconds, giving up and returning errors.
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 3 after 30.0 seconds, giving up and returning errors.
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 4 after 30.0 seconds, giving up and returning errors.
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 5 after 30.0 seconds, giving up and returning errors.
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 6 after 30.0 seconds, giving up and returning errors.
>>> (1262,0):dlm_request_join:1035 ERROR: status = -107
>>> (1262,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>> (1262,0):dlm_join_domain:1487 ERROR: status = -107
>>> (1262,0):dlm_register_domain:1753 ERROR: status = -107
>>> (1262,0):o2cb_cluster_connect:313 ERROR: status = -107
>>> (1262,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>> (1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>> ocfs2: Unmounting device (253,1) on (node 0)
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 2 after 30.0 seconds, giving up and returning errors.
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 3 after 30.0 seconds, giving up and returning errors.
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 5 after 30.0 seconds, giving up and returning errors.
>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 6 after 30.0 seconds, giving up and returning errors.
>>> (1323,0):dlm_request_join:1035 ERROR: status = -107
>>> (1323,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>> (1323,0):dlm_join_domain:1487 ERROR: status = -107
>>> (1323,0):dlm_register_domain:1753 ERROR: status = -107
>>> (1323,0):o2cb_cluster_connect:313 ERROR: status = -107
>>> (1323,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>> (1323,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>> ocfs2: Unmounting device (253,1) on (node 0)
>>> VMCI: Major device number is: 249
>>> VMware memory control driver initialized
>>> vmmemctl: started kernel thread pid=1522
>>> ocfs2: Unregistered cluster interface o2cb
>>> OCFS2 Node Manager 1.5.0
>>> OCFS2 DLM 1.5.0
>>> ocfs2: Registered cluster interface o2cb
>>> OCFS2 DLMFS 1.5.0
>>> OCFS2 User DLM kernel interface loaded
>>> OCFS2 1.5.0
>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 4 after 30.0 seconds, giving up and returning errors.
>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 5 after 30.0 seconds, giving up and returning errors.
>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 6 after 30.0 seconds, giving up and returning errors.
>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 2 after 30.0 seconds, giving up and returning errors.
>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection established
>>> with node 3 after 30.0 seconds, giving up and returning errors.
>>> (1839,0):dlm_request_join:1035 ERROR: status = -107
>>> (1839,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>> (1839,0):dlm_join_domain:1487 ERROR: status = -107
>>> (1839,0):dlm_register_domain:1753 ERROR: status = -107
>>> (1839,0):o2cb_cluster_connect:313 ERROR: status = -107
>>> (1839,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>> (1839,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>> ocfs2: Unmounting device (253,1) on (node 0)
>>>
>>>
>>>
>>> So clearly the ocfs2 service thinks it cannot connect to the other nodes,
>>> but nmap sees port 7777 open on them just fine. And web2 can see the port
>>> on web1 just fine, so there is no firewall blocking the connections.
>>>
>>> I think it might be that Fedora 12 uses 1.5.0 for the OCFS2 kernel module
>>> and CentOS 5.3/5.4 use 1.4.4-1. Am I correct in thinking this?
>>>
>>> David
>>> -----Original Message-----
>>> From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
>>> Sent: Thursday, March 25, 2010 6:46 PM
>>> To: David Murphy
>>> Cc: ocfs2-users at oss.oracle.com
>>> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>>>
>>> hmm.. o2cb_ctl makes no connections. It just reads the cluster.conf and
>>> populates configfs. AFAIK.
>>>
>>> David Murphy wrote:
>>>
>>>
>>>> We had 6 nodes running CentOS 5.4 using 1.4.3 ocfs2-tools.
>>>>
>>>> I decided to rebuild one node with FC12.
>>>>
>>>> The rebuild itself is working fine; however, nmap 192.168.200.112 shows
>>>> 7777 as open, and o2cb_ctl is timing out when trying to connect to that
>>>> node, which then causes a 107 error. This happens with all nodes, and all
>>>> nodes have 7777 open via nmap from the FC machine.
>>>>
>>>> Is there a way to further debug this to see what exactly o2cb_ctl is
>>>> seeing when trying to connect?
>>>>
>>>> David
>>>>
>>>> ------------------------------------------------------------------------
>>>>
>>>> _______________________________________________
>>>> Ocfs2-users mailing list
>>>> Ocfs2-users at oss.oracle.com
>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>
>>>>
>>>
>>>
>>
>>
>
>
>
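
The nc -z checks quoted above were run by hand, and only between web1 and web2. A rough Python equivalent that covers all six nodes could look like the sketch below. The addresses and port are copied from the quoted cluster.conf; the helper names port_open and check_all are hypothetical. Note that it only shows whether the listener accepts a TCP connection, the same as nc -z, and says nothing about what happens after the connection is accepted.

```python
import socket

# Node names and addresses copied from the cluster.conf quoted above.
NODES = {
    "web1": "192.168.102.140",
    "web2": "192.168.102.141",
    "web3": "192.168.102.142",
    "rgapp1": "192.168.102.111",
    "deploy": "192.168.102.122",
    "app1": "192.168.102.112",
}
O2NET_PORT = 7777  # ip_port from the same file


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port is accepted in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False


def check_all(nodes, port=O2NET_PORT):
    """Map each node name to the result of port_open() for its address."""
    return {name: port_open(addr, port) for name, addr in nodes.items()}
```

Calling check_all(NODES) on each node in turn gives a full matrix of who can reach whom on port 7777, which is easier to compare than individual nmap runs.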