[Ocfs2-users] Odd error on FC12 with ocfs2
David Murphy
david at icewatermedia.com
Tue Mar 30 08:27:57 PDT 2010
[root at web1 /dev]# debugfs.ocfs2 -l TCP off /dev/mapper/OCFS2_200Gp1
[root at web1 /dev]# mount /dev/mapper/OCFS2_200Gp1 -v
device=/dev/mapper/OCFS2_200Gp1
mount.ocfs2: Transport endpoint is not connected while mounting
/dev/mapper/OCFS2_200Gp1 on /mnt/appshare. Check 'dmesg' for more
information on this error.
[root at web1 /dev]# dmesg
DMESG:
Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656 ERROR: no connection established with node 2 after 30.0 seconds, giving up and returning errors.
Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656 ERROR: no connection established with node 3 after 30.0 seconds, giving up and returning errors.
Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656 ERROR: no connection established with node 4 after 30.0 seconds, giving up and returning errors.
Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656 ERROR: no connection established with node 5 after 30.0 seconds, giving up and returning errors.
Mar 30 10:23:38 web1 kernel: (1236,0):o2net_connect_expired:1656 ERROR: no connection established with node 6 after 30.0 seconds, giving up and returning errors.
Mar 30 10:23:38 web1 kernel: (1740,0):dlm_request_join:1035 ERROR: status = -107
Mar 30 10:23:38 web1 kernel: (1740,0):dlm_try_to_join_domain:1209 ERROR: status = -107
Mar 30 10:23:38 web1 kernel: (1740,0):dlm_join_domain:1487 ERROR: status = -107
Mar 30 10:23:38 web1 kernel: (1740,0):dlm_register_domain:1753 ERROR: status = -107
Mar 30 10:23:38 web1 kernel: (1740,0):o2cb_cluster_connect:313 ERROR: status = -107
Mar 30 10:23:38 web1 kernel: (1740,0):ocfs2_dlm_init:2963 ERROR: status = -107
Mar 30 10:23:38 web1 kernel: (1740,0):ocfs2_mount_volume:1788 ERROR: status = -107
Mar 30 10:23:38 web1 kernel: ocfs2: Unmounting device (253,1) on (node 0)
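
For anyone decoding the log above: the "status = -107" values are negated Linux errno codes. On Linux, 107 is ENOTCONN, which lines up with the "Transport endpoint is not connected" message from mount.ocfs2. A minimal Python sketch of the lookup follows. The helper name describe_status is made up for illustration, and the number-to-name mapping only holds when this is run on Linux.

```python
import errno
import os


def describe_status(status: int) -> str:
    """Translate a negative kernel-style status into 'NAME: message'."""
    code = abs(status)
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # Assumption: run on Linux, where errno 107 is ENOTCONN.
    print(describe_status(-107))
```
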
DEBUGFS:
debugfs: curdev
/dev/mapper/OCFS2_200Gp1
debugfs: controld dump
controld: Unable to access cluster service while obtaining the debug
buffer
debugfs: slotmap
Slot# Node#
0 3
1 5
2 2
4 4
5 6
debugfs: stats
Revision: 0.90
Mount Count: 0 Max Mount Count: 20
State: 0 Errors: 0
Check Interval: 0 Last Check: Mon Mar 29 10:53:52 2010
Creator OS: 0
Feature Compat: 1 backup-super
Feature Incompat: 16 sparse
Tunefs Incomplete: 0
Feature RO compat: 1 unwritten
Root Blknum: 5 System Dir Blknum: 6
First Cluster Group Blknum: 3
Block Size Bits: 12 Cluster Size Bits: 12
Max Node Slots: 6
Extended Attributes Inline Size: 0
Label: OCFS2_APPSHARE_200G
UUID: D6E0DD0AAC8844ED94A4A459FBB6F7FF
UUID_hash: 0 (0x0)
Cluster stack: classic o2cb
Inode: 2 Mode: 00 Generation: 2428834932 (0x90c51474)
FS Generation: 2428834932 (0x90c51474)
CRC32: 00000000 ECC: 0000
Type: Unknown Attr: 0x0 Flags: Valid System Superblock
Dynamic Features: (0x0)
User: 0 (root) Group: 0 (root) Size: 0
Links: 0 Clusters: 52428119
ctime: 0x4a0b2372 -- Wed May 13 14:45:54 2009
atime: 0x0 -- Wed Dec 31 18:00:00 1969
mtime: 0x4a0b2372 -- Wed May 13 14:45:54 2009
dtime: 0x0 -- Wed Dec 31 18:00:00 1969
ctime_nsec: 0x00000000 -- 0
atime_nsec: 0x00000000 -- 0
mtime_nsec: 0x00000000 -- 0
Last Extblk: 0
Sub Alloc Slot: Global Sub Alloc Bit: 65535
It doesn't appear that any extra debug logging was actually created.
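
One possible explanation, judging only from the transcript above: "-l TCP off" was run before the mount, whereas the suggested sequence (quoted below) switches the TCP log mask on with "allow", performs the mount, and only then switches it "off". A rough Python sketch of that ordering follows. The function names and the dry_run flag are illustrative assumptions; the commands themselves are the ones quoted in this thread and would need to run as root on a node with ocfs2-tools installed.

```python
import subprocess
from typing import List


def debug_mount_commands(device: str) -> List[List[str]]:
    """Return the commands in the order suggested in the thread."""
    return [
        ["debugfs.ocfs2", "-l", "TCP", "allow"],  # enable TCP tracing first
        ["mount", device, "-v"],                  # reproduce the failure
        ["debugfs.ocfs2", "-l", "TCP", "off"],    # disable tracing afterwards
    ]


def run_debug_mount(device: str, dry_run: bool = True) -> None:
    """Print each step and, unless dry_run is set, execute it."""
    for cmd in debug_mount_commands(device):
        print("+", " ".join(cmd))
        if not dry_run:
            # check=False: a failing mount must not skip the final "off" step.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Dry run by default, so nothing is executed; the steps are only printed.
    run_debug_mount("/dev/mapper/OCFS2_200Gp1")
```

With dry_run=False the mount is still expected to fail; the point is only that the extra TCP messages should land in dmesg between the "allow" and "off" steps.
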
David
-----Original Message-----
From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
Sent: Monday, March 29, 2010 10:23 PM
To: Angelo McComis
Cc: David Murphy; ocfs2-users at oss.oracle.com
Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
No
On Mar 29, 2010, at 8:10 PM, Angelo McComis <angelo at mccomis.com> wrote:
> Does it matter that the nodes are numbered 1-6 instead of 0-5?
>
>
>
> On Mon, Mar 29, 2010 at 4:25 PM, Sunil Mushran <sunil.mushran at oracle.com> wrote:
>> Enable some debugging.
>>
>> #debugfs.ocfs2 -l TCP allow
>> ...do mount...
>> #debugfs.ocfs2 -l TCP off
>>
>>
>> David Murphy wrote:
>>> [root at web2 ~]# nc -z 192.168.102.140 7777
>>> Connection to 192.168.102.140 7777 port [tcp/cbt] succeeded!
>>>
>>> [root at web1 /etc/sysconfig/network-scripts]# nc -z 192.168.102.141 7777
>>> Connection to 192.168.102.141 7777 port [tcp/cbt] succeeded!
>>>
>>> -----Original Message-----
>>> From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
>>> Sent: Monday, March 29, 2010 5:08 PM
>>> To: David Murphy
>>> Cc: ocfs2-users at oss.oracle.com
>>> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>>>
>>> What happens when you use netcat to ping the node?
>>> nc -z host.example.com 7777
>>>
>>> David Murphy wrote:
>>>
>>>> Some additional data:
>>>> From Web1 ( New Fedora Machine) to Web2:
>>>> [root at web1 /etc/sysconfig/network-scripts]# nmap 192.168.102.141
>>>>
>>>> Starting Nmap 5.21 ( http://nmap.org ) at 2010-03-29 16:56 CDT
>>>> Nmap scan report for 192.168.102.141
>>>> Host is up (0.000076s latency).
>>>> Not shown: 993 closed ports
>>>> PORT STATE SERVICE
>>>> 22/tcp open ssh
>>>> 80/tcp open http
>>>> 81/tcp open hosts2-ns
>>>> 111/tcp open rpcbind
>>>> 5666/tcp open nrpe
>>>> 7777/tcp open unknown
>>>> 9102/tcp open jetdirect
>>>> MAC Address: 00:50:56:A3:58:5D (VMware)
>>>>
>>>> Nmap done: 1 IP address (1 host up) scanned in 1.18 seconds
>>>>
>>>>
>>>> From web2 -> web1 (new fedora machine)
>>>> [root at web2 ~]# nmap 192.168.102.140
>>>>
>>>> Starting Nmap 5.00 ( http://nmap.org ) at 2010-03-29 16:40 CDT
>>>> Interesting ports on 192.168.102.140:
>>>> Not shown: 994 closed ports
>>>> PORT STATE SERVICE
>>>> 22/tcp open ssh
>>>> 80/tcp open http
>>>> 81/tcp open hosts2-ns
>>>> 111/tcp open rpcbind
>>>> 443/tcp open https
>>>> 7777/tcp open unknown
>>>> MAC Address: 00:50:56:A3:14:62 (VMWare)
>>>>
>>>> Nmap done: 1 IP address (1 host up) scanned in 1.31 seconds
>>>>
>>>>
>>>> Cluster.conf:
>>>> cluster:
>>>>     node_count = 6
>>>>     name = appshare
>>>>
>>>> node:
>>>>     ip_port = 7777
>>>>     ip_address = 192.168.102.140
>>>>     number = 1
>>>>     name = web1
>>>>     cluster = appshare
>>>>
>>>> node:
>>>>     ip_port = 7777
>>>>     ip_address = 192.168.102.141
>>>>     number = 2
>>>>     name = web2
>>>>     cluster = appshare
>>>>
>>>> node:
>>>>     ip_port = 7777
>>>>     ip_address = 192.168.102.142
>>>>     number = 3
>>>>     name = web3
>>>>     cluster = appshare
>>>>
>>>> node:
>>>>     ip_port = 7777
>>>>     ip_address = 192.168.102.111
>>>>     number = 4
>>>>     name = rgapp1
>>>>     cluster = appshare
>>>>
>>>> node:
>>>>     ip_port = 7777
>>>>     ip_address = 192.168.102.122
>>>>     number = 5
>>>>     name = deploy
>>>>     cluster = appshare
>>>>
>>>> node:
>>>>     ip_port = 7777
>>>>     ip_address = 192.168.102.112
>>>>     number = 6
>>>>     name = app1
>>>>     cluster = appshare
>>>>
>>>> DMESG on WEB1:
>>>> OCFS2 1.5.0
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 2 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 3 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 4 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 5 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 6 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1262,0):dlm_request_join:1035 ERROR: status = -107
>>>> (1262,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>>> (1262,0):dlm_join_domain:1487 ERROR: status = -107
>>>> (1262,0):dlm_register_domain:1753 ERROR: status = -107
>>>> (1262,0):o2cb_cluster_connect:313 ERROR: status = -107
>>>> (1262,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>>> (1262,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>>> ocfs2: Unmounting device (253,1) on (node 0)
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 2 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 3 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 5 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1199,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 6 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1323,0):dlm_request_join:1035 ERROR: status = -107
>>>> (1323,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>>> (1323,0):dlm_join_domain:1487 ERROR: status = -107
>>>> (1323,0):dlm_register_domain:1753 ERROR: status = -107
>>>> (1323,0):o2cb_cluster_connect:313 ERROR: status = -107
>>>> (1323,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>>> (1323,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>>> ocfs2: Unmounting device (253,1) on (node 0)
>>>> VMCI: Major device number is: 249
>>>> VMware memory control driver initialized
>>>> vmmemctl: started kernel thread pid=1522
>>>> ocfs2: Unregistered cluster interface o2cb
>>>> OCFS2 Node Manager 1.5.0
>>>> OCFS2 DLM 1.5.0
>>>> ocfs2: Registered cluster interface o2cb
>>>> OCFS2 DLMFS 1.5.0
>>>> OCFS2 User DLM kernel interface loaded
>>>> OCFS2 1.5.0
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 4 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 5 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 6 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 2 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1810,0):o2net_connect_expired:1656 ERROR: no connection
>>>> established with node 3 after 30.0 seconds, giving up and returning
>>>> errors.
>>>> (1839,0):dlm_request_join:1035 ERROR: status = -107
>>>> (1839,0):dlm_try_to_join_domain:1209 ERROR: status = -107
>>>> (1839,0):dlm_join_domain:1487 ERROR: status = -107
>>>> (1839,0):dlm_register_domain:1753 ERROR: status = -107
>>>> (1839,0):o2cb_cluster_connect:313 ERROR: status = -107
>>>> (1839,0):ocfs2_dlm_init:2963 ERROR: status = -107
>>>> (1839,0):ocfs2_mount_volume:1788 ERROR: status = -107
>>>> ocfs2: Unmounting device (253,1) on (node 0)
>>>>
>>>>
>>>>
>>>> So clearly the ocfs2 service thinks it can't connect to the node,
>>>> but nmap sees the connection just fine. And Web2 can see the port
>>>> on web1 just fine, so there is no firewall blocking the connections.
>>>>
>>>> I think it might be that Fedora 12 uses 1.5.0 for the OCFS2 kernel
>>>> module and CentOS 5.3/5.4 use 1.4.4-1. Am I correct in thinking this?
>>>>
>>>> David
>>>> -----Original Message-----
>>>> From: Sunil Mushran [mailto:sunil.mushran at oracle.com]
>>>> Sent: Thursday, March 25, 2010 6:46 PM
>>>> To: David Murphy
>>>> Cc: ocfs2-users at oss.oracle.com
>>>> Subject: Re: [Ocfs2-users] Odd error on FC12 with ocfs2
>>>>
>>>> hmm.. o2cb_ctl makes no connections. It just reads the cluster.conf
>>>> and populates configfs. AFAIK.
>>>>
>>>> David Murphy wrote:
>>>>
>>>>
>>>>> We had 6 nodes running CentOS 5.4 using 1.4.3 ocfs2-tools.
>>>>>
>>>>> I decided to rebuild one node with FC12, which is working fine.
>>>>> However, nmap 192.168.200.112 shows 7777 as open, and o2cb_ctl is
>>>>> timing out when trying to connect to that node, which then causes
>>>>> a 107 error. This happens with all nodes, and all nodes have 7777
>>>>> open via nmap from the FC machine.
>>>>>
>>>>> Is there a way to further debug this to see what exactly o2cb_ctl
>>>>> is seeing when trying to connect?
>>>>>
>>>>> David
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-users mailing list
>>>>> Ocfs2-users at oss.oracle.com
>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
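
As a footnote to the cluster.conf quoted earlier in the thread, here is a minimal Python sketch that parses that stanza layout (a "cluster:" or "node:" header followed by "key = value" lines) and flags obvious inconsistencies, such as a node_count that disagrees with the number of node stanzas, or duplicate node numbers. The function names are invented for this example, the parser is a simplification and not the o2cb tools' own, and the sample is trimmed to two of the six nodes with node_count adjusted to match.

```python
from typing import Dict, List


def parse_cluster_conf(text: str) -> List[Dict[str, str]]:
    """Parse 'header:' stanzas with 'key = value' lines into dicts."""
    stanzas: List[Dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # New stanza; remember its type ("cluster" or "node").
            stanzas.append({"_type": line[:-1]})
        elif "=" in line and stanzas:
            key, _, value = line.partition("=")
            stanzas[-1][key.strip()] = value.strip()
    return stanzas


def check_cluster(stanzas: List[Dict[str, str]]) -> List[str]:
    """Return human-readable problems; an empty list means none found."""
    problems: List[str] = []
    nodes = [s for s in stanzas if s["_type"] == "node"]
    for cluster in (s for s in stanzas if s["_type"] == "cluster"):
        name = cluster.get("name")
        members = [n for n in nodes if n.get("cluster") == name]
        declared = int(cluster.get("node_count", "0"))
        if declared != len(members):
            problems.append(
                f"{name}: node_count = {declared} but {len(members)} node stanzas"
            )
        numbers = [n.get("number") for n in members]
        if len(set(numbers)) != len(numbers):
            problems.append(f"{name}: duplicate node numbers")
    return problems


# Trimmed sample: two of the six nodes from the thread, node_count adjusted.
SAMPLE = """\
cluster:
    node_count = 2
    name = appshare

node:
    ip_port = 7777
    ip_address = 192.168.102.140
    number = 1
    name = web1
    cluster = appshare

node:
    ip_port = 7777
    ip_address = 192.168.102.141
    number = 2
    name = web2
    cluster = appshare
"""

if __name__ == "__main__":
    print(check_cluster(parse_cluster_conf(SAMPLE)) or "no problems found")
```

Running the same check against the copy of cluster.conf on every node would be one cheap way to confirm that all six machines agree on names, numbers and addresses.
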