[Ocfs2-users] [Ocfs2-devel] Transport endpoint is not connected while mounting....

Thu Jan 15 07:12:16 PST 2009

So it looks like iptables is what was stopping it from working. After
disabling iptables completely for one minute and then trying to mount
on node 1, it worked fine.

So my new question is: why did
`iptables -A INPUT -p tcp --dport 7777 -j ACCEPT ; service iptables save`
not allow ocfs2 to talk? What do people add to their iptables rules?

-Bret
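A likely answer, assuming the stock Red Hat/CentOS 5 firewall that `service iptables save` manages: in that ruleset the INPUT chain jumps to an `RH-Firewall-1-INPUT` chain that ends in a catch-all REJECT. `iptables -A` appends, so the new ACCEPT rule lands after that jump and is never reached, whereas `iptables -I INPUT ...` inserts it at the top. Rules are evaluated in order and the first terminating match wins. The toy Python model below illustrates only that ordering; `Rule`, `evaluate` and the two-rule stand-in for the stock chain are simplifications invented for this example, not netfilter code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Rule:
    """Hypothetical, heavily simplified rule: match on protocol and port."""
    target: str                  # "ACCEPT", "REJECT", "DROP", or a chain name
    proto: Optional[str] = None  # None means "any protocol"
    dport: Optional[int] = None  # None means "any destination port"

    def matches(self, proto: str, dport: int) -> bool:
        return self.proto in (None, proto) and self.dport in (None, dport)


TERMINAL = {"ACCEPT", "REJECT", "DROP"}


def evaluate(chains: Dict[str, List[Rule]], chain: str,
             proto: str, dport: int) -> Optional[str]:
    """Walk a chain top to bottom; the first matching terminal target wins.
    A non-terminal target is a jump into another chain; if that chain ends
    without a verdict, evaluation continues in the calling chain."""
    for rule in chains[chain]:
        if not rule.matches(proto, dport):
            continue
        if rule.target in TERMINAL:
            return rule.target
        verdict = evaluate(chains, rule.target, proto, dport)
        if verdict is not None:
            return verdict
    return None  # fell off the end (real iptables would apply the policy)


def stock_rules() -> Dict[str, List[Rule]]:
    # Assumed, simplified shape of a default RHEL/CentOS 5 ruleset.
    return {
        "INPUT": [Rule("RH-Firewall-1-INPUT")],
        "RH-Firewall-1-INPUT": [
            Rule("ACCEPT", proto="tcp", dport=22),  # e.g. ssh
            Rule("REJECT"),                         # catch-all at the end
        ],
    }


# Like `iptables -A INPUT ...`: the rule goes AFTER the jump to the RH chain.
appended = stock_rules()
appended["INPUT"].append(Rule("ACCEPT", proto="tcp", dport=7777))

# Like `iptables -I INPUT ...`: the rule goes to position 1, BEFORE the jump.
inserted = stock_rules()
inserted["INPUT"].insert(0, Rule("ACCEPT", proto="tcp", dport=7777))

print("with -A:", evaluate(appended, "INPUT", "tcp", 7777))
print("with -I:", evaluate(inserted, "INPUT", "tcp", 7777))
```

On a real host the equivalent would be something like `iptables -I INPUT -p tcp --dport 7777 -j ACCEPT` followed by `service iptables save`; `iptables -L INPUT -n --line-numbers` shows where a rule actually ended up. Since any node may need to accept o2net connections from its peers, opening the port on every node is the straightforward choice.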

On Jan 14, 2009, at 4:50 PM, Sunil Mushran wrote:

> It's part and parcel of the fs. If you want mainline Linux,
> go to http://kernel.org.
>
> Bret Palsson wrote:
>> Can I get the source for DLM 1.5.0 and build it on my other machines?
>> If so, where do I grab it?
>>
>> Thanks,
>>
>> Bret
>>
>> On Jan 14, 2009, at 4:28 PM, Sunil Mushran wrote:
>>
>>> I hate cut-and-pastes because I have no idea whether I can trust them
>>> or not. A mistyped 0 or 1 makes a whole world of difference.
>>>
>>> But the following seems to indicate that the configuration is bad.
>>>
>>> (3130,1):o2net_connect_expired:1659 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
>>> (4670,1):dlm_request_join:1033 ERROR: status = -107
>>> (4670,1):dlm_try_to_join_domain:1207 ERROR: status = -107
>>> (4670,1):dlm_join_domain:1485 ERROR: status = -107
>>> (4670,1):dlm_register_domain:1732 ERROR: status = -107
>>> (4670,1):o2cb_cluster_connect:302 ERROR: status = -107
>>> (4670,1):ocfs2_dlm_init:2753 ERROR: status = -107
>>> (4670,1):ocfs2_mount_volume:1274 ERROR: status = -107
>>> ocfs2: Unmounting device (253,2) on (node 0)
>>>
>>> Why is the mount failing on node 0? I thought it was mounted on
>>> node 0?
>>>
>>> Maybe best if you file a bugzilla and attach the /var/log/messages
>>> of both nodes. Indicate the time you did the mount.
>>>
>>> Sunil
>>>
>>> Bret Palsson wrote:
>>>> Output of Node 0 {
>>>>
>>>> OCFS2 Node Manager 1.4.1 Tue Dec 16 19:18:05 PST 2008 (build 0f78045c75c0174e50e4cf0934bf9eae)
>>>> OCFS2 DLM 1.4.1 Tue Dec 16 19:18:05 PST 2008 (build 4ce8fae327880c466761f40fb7619490)
>>>> OCFS2 DLMFS 1.4.1 Tue Dec 16 19:18:05 PST 2008 (build 4ce8fae327880c466761f40fb7619490)
>>>> OCFS2 User DLM kernel interface loaded
>>>> SELinux: initialized (dev ocfs2_dlmfs, type ocfs2_dlmfs), not configured for labeling
>>>> eth3: no IPv6 routers present
>>>> OCFS2 1.4.1 Tue Dec 16 19:18:02 PST 2008 (build 3fc82af4b5669945497b322b6aabd031)
>>>> ocfs2_dlm: Nodes in domain ("8B2CCF82F1BA4A70B587580B23D9D7F7"): 0
>>>> kjournald starting.  Commit interval 5 seconds
>>>> ocfs2: Mounting device (253,3) on (node 0, slot 0) with ordered data mode.
>>>> SELinux: initialized (dev dm-3, type ocfs2), not configured for labeling
>>>> ocfs2_dlm: Nodes in domain ("222B65A090D6477481AD30DE9FCE7961"): 0
>>>> kjournald starting.  Commit interval 5 seconds
>>>> ocfs2: Mounting device (253,2) on (node 0, slot 0) with ordered data mode.
>>>> SELinux: initialized (dev dm-2, type ocfs2), not configured for labeling
>>>> ocfs2_dlm: Nodes in domain ("0425C0367AF547E989864A46F3DBD6E6"): 0
>>>> kjournald starting.  Commit interval 5 seconds
>>>> ocfs2: Mounting device (253,4) on (node 0, slot 0) with ordered data mode.
>>>> SELinux: initialized (dev dm-4, type ocfs2), not configured for labeling
>>>> }
>>>>
>>>> Output of Node 1 {
>>>> OCFS2 Node Manager 1.5.0
>>>> OCFS2 DLM 1.5.0
>>>> ocfs2: Registered cluster interface o2cb
>>>> OCFS2 DLMFS 1.5.0
>>>> OCFS2 User DLM kernel interface loaded
>>>> device eth0 entered promiscuous mode
>>>> OCFS2 1.5.0
>>>> }
>>>>
>>>>
>>>> On Jan 14, 2009, at 3:58 PM, Sunil Mushran wrote:
>>>>
>>>>> What about the dmesg on node 1?
>>>>>
>>>>> Now ideally we want the fs versions to be the same on all nodes.
>>>>> However, as we have not changed the protocol since 1.4.1, this
>>>>> should still work.
>>>>>
>>>>> Bret Palsson wrote:
>>>>>> node 0 (and FS): OCFS2 1.4.1, kernel 2.6.18-92.1.22.el5xen
>>>>>> node 1: OCFS2 1.5, kernel 2.6.28-vs2.3.0.36.4
>>>>>>
>>>>>> Output of Node 1 {
>>>>>> OCFS2 Node Manager 1.5.0
>>>>>> OCFS2 DLM 1.5.0
>>>>>> ocfs2: Registered cluster interface o2cb
>>>>>> OCFS2 DLMFS 1.5.0
>>>>>> OCFS2 User DLM kernel interface loaded
>>>>>> device eth0 entered promiscuous mode
>>>>>> OCFS2 1.5.0
>>>>>> }
>>>>>> On Jan 14, 2009, at 1:41 PM, Sunil Mushran wrote:
>>>>>>
>>>>>>
>>>>>>> versions? kernel and fs.
>>>>>>>
>>>>>>> Bret Palsson wrote:
>>>>>>>
>>>>>>>> Does anyone have any idea what to try next? Here are the steps I
>>>>>>>> have taken and the problem. (I wanted to put my question on the
>>>>>>>> first line before explaining the problem and what I have tried.)
>>>>>>>>
>>>>>>>> ----------
>>>>>>>>
>>>>>>>> Node 0 has the file system mounted just fine and works great.
>>>>>>>>
>>>>>>>> When trying to mount on Node 1 with
>>>>>>>> `mount.ocfs2 /dev/mapper/data /cluster/data`, I get this error
>>>>>>>> after about 30 seconds:
>>>>>>>> mount.ocfs2: Transport endpoint is not connected while mounting /dev/mapper/data on /cluster/data. Check 'dmesg' for more information on this error.
>>>>>>>>
>>>>>>>>
>>>>>>>> Here is the output of dmesg:
>>>>>>>> (3130,1):o2net_connect_expired:1659 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
>>>>>>>> (4670,1):dlm_request_join:1033 ERROR: status = -107
>>>>>>>> (4670,1):dlm_try_to_join_domain:1207 ERROR: status = -107
>>>>>>>> (4670,1):dlm_join_domain:1485 ERROR: status = -107
>>>>>>>> (4670,1):dlm_register_domain:1732 ERROR: status = -107
>>>>>>>> (4670,1):o2cb_cluster_connect:302 ERROR: status = -107
>>>>>>>> (4670,1):ocfs2_dlm_init:2753 ERROR: status = -107
>>>>>>>> (4670,1):ocfs2_mount_volume:1274 ERROR: status = -107
>>>>>>>> ocfs2: Unmounting device (253,2) on (node 0)
>>>>>>>> (3130,0):o2net_connect_expired:1659 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
>>>>>>>> (5558,1):dlm_request_join:1033 ERROR: status = -107
>>>>>>>> (5558,1):dlm_try_to_join_domain:1207 ERROR: status = -107
>>>>>>>> (5558,1):dlm_join_domain:1485 ERROR: status = -107
>>>>>>>> (5558,1):dlm_register_domain:1732 ERROR: status = -107
>>>>>>>> (5558,1):o2cb_cluster_connect:302 ERROR: status = -107
>>>>>>>> (5558,1):ocfs2_dlm_init:2753 ERROR: status = -107
>>>>>>>> (5558,1):ocfs2_mount_volume:1274 ERROR: status = -107
>>>>>>>> ocfs2: Unmounting device (253,2) on (node 0)
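Each `status = -107` line appears to be the same error propagating up the mount path: the kernel logs a negated errno, and on Linux errno 107 is `ENOTCONN`, whose message is exactly the "Transport endpoint is not connected" that mount.ocfs2 printed. The first line (o2net giving up on node 0 after 30 seconds) is where it originates. A small Python sketch for decoding such lines; `decode_status` and its regex are hypothetical helpers written against the log format shown here, not part of any ocfs2 tool.

```python
import errno
import os
import re

# Shape of the kernel lines quoted above, e.g.
#   (4670,1):dlm_request_join:1033 ERROR: status = -107
# i.e. "(n,n):function:line ERROR: status = <negative errno>".
STATUS_RE = re.compile(r"^\((\d+),(\d+)\):(\w+):(\d+) ERROR: status = (-?\d+)$")


def decode_status(line):
    """Hypothetical helper: return (function, errno name, message) for an
    ocfs2 'status = -N' line, or None if the line has a different shape."""
    m = STATUS_RE.match(line.strip())
    if m is None:
        return None
    func, status = m.group(3), int(m.group(5))
    code = -status  # the kernel logs a negated errno value
    name = errno.errorcode.get(code, "UNKNOWN")
    return func, name, os.strerror(code)


log = """\
(4670,1):dlm_request_join:1033 ERROR: status = -107
(4670,1):ocfs2_mount_volume:1274 ERROR: status = -107
ocfs2: Unmounting device (253,2) on (node 0)"""

for line in log.splitlines():
    decoded = decode_status(line)
    if decoded is not None:
        print(decoded)
```

Errno numbers are platform specific, so the decoding is only meaningful when run on Linux, the platform that produced the log.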
>>>>>>>>
>>>>>>>>
>>>>>>>> So I figured that it must be a firewall issue. I first disabled
>>>>>>>> iptables on both machines and got the same results, so I started
>>>>>>>> iptables again and added an exception on both machines:
>>>>>>>> `iptables -A INPUT -p tcp --dport 7777 -j ACCEPT ; service iptables save`
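Whether such a rule actually lets o2net through can be checked independently of mount: ping only exercises ICMP, while o2net needs a TCP connection to each peer's `ip_port` (7777 here). A minimal Python sketch of a direct check, meant to be run from each node against the other's `ip_address`; `probe` is a hypothetical helper, not an ocfs2 tool, and it assumes the peer's o2cb cluster is online so that something is listening on the port.

```python
import socket


def probe(host, port=7777, timeout=5.0):
    """Hypothetical helper: attempt one TCP connect and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # something answered the SYN with a refusal
    except socket.timeout:
        return "timeout"   # no answer to the SYNs within `timeout` seconds
    except OSError as exc:
        return f"error: {exc.strerror or exc}"
```

For example, `probe("10.128.255.3")` on node 1 tests the path to node 0. Anything other than `"open"` points at the network path or a firewall rather than at ocfs2 itself.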
>>>>>>>>
>>>>>>>> The machines can ping each other, and they have the exact same
>>>>>>>> config:
>>>>>>>> cluster:
>>>>>>>>   node_count = 2
>>>>>>>>   name = ocfs2
>>>>>>>> node:
>>>>>>>>   ip_port = 7777
>>>>>>>>   ip_address = 10.128.255.3
>>>>>>>>   number = 0
>>>>>>>>   name = m3.c12.jiveip.net
>>>>>>>>   cluster = ocfs2
>>>>>>>> node:
>>>>>>>>   ip_port = 7777
>>>>>>>>   ip_address = 10.128.7.33
>>>>>>>>   number = 1
>>>>>>>>   name = pbx_33.c12.jiveip.net
>>>>>>>>   cluster = ocfs2
>>>>>>>>
>>>>>>>>
>>>>>>>> I then decided to use tcpdump to see what's up (on both machines):
>>>>>>>> `tcpdump -i eth0 port 7777 -v`
>>>>>>>>
>>>>>>>> Here is a tcpdump capture showing that port 7777 is not blocked (I
>>>>>>>> added an exception in iptables):
>>>>>>>> (Node 0)
>>>>>>>> 13:13:11.711539 IP (tos 0x0, ttl  64, id 18286, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S, cksum 0xd272 (correct), 3820380795:3820380795(0) win 5840 <mss 1460,sackOK,timestamp 4294911253 0,nop,wscale 6>
>>>>>>>> 13:13:14.710703 IP (tos 0x0, ttl  64, id 18287, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S, cksum 0xc6ba (correct), 3820380795:3820380795(0) win 5840 <mss 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>
>>>>>>>> 13:13:14.711213 IP (tos 0x0, ttl  64, id 2241, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.54763 > 10.128.255.3.cbt: S, cksum 0xd2ae (correct), 3862378508:3862378508(0) win 5840 <mss 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>
>>>>>>>>
>>>>>>>> (Node 1)
>>>>>>>> 13:13:09.956999 IP (tos 0x0, ttl  64, id 18286, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S, cksum 0xd272 (correct), 3820380795:3820380795(0) win 5840 <mss 1460,sackOK,timestamp 4294911253 0,nop,wscale 6>
>>>>>>>> 13:13:12.956999 IP (tos 0x0, ttl  64, id 18287, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.47601 > 10.128.255.3.cbt: S, cksum 0xc6ba (correct), 3820380795:3820380795(0) win 5840 <mss 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>
>>>>>>>> 13:13:12.956999 IP (tos 0x0, ttl  64, id 2241, offset 0, flags [DF], proto: TCP (6), length: 60) 10.128.7.33.54763 > 10.128.255.3.cbt: S, cksum 0xd2ae (correct), 3862378508:3862378508(0) win 5840 <mss 1460,sackOK,timestamp 4294914253 0,nop,wscale 6>
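The capture says more than "not blocked": every packet is a SYN from node 1 (10.128.7.33) to node 0's port 7777 (tcpdump prints that port by its services name, `cbt`), the SYN on source port 47601 is retransmitted about 3 seconds later, and nothing ever comes back. tcpdump sees incoming packets before iptables' INPUT rules are applied, so SYNs showing up on node 0 do not prove the port is open; an unanswered handshake is what one would expect if a rule on node 0 discards or rejects them. A rough Python sketch that flags such flows in tcpdump text of the format shown above; `unanswered_syns` and its regex are hypothetical and will not match newer tcpdump output styles such as `Flags [S]`.

```python
import re
from collections import defaultdict

# Matches the "src.port > dst.port: FLAGS" part of the tcpdump lines above,
# e.g. "10.128.7.33.47601 > 10.128.255.3.cbt: S,". Ports may be printed
# by service name ("cbt" is how tcpdump rendered port 7777 here).
FLOW_RE = re.compile(
    r"(\d+\.\d+\.\d+\.\d+)\.(\S+) > (\d+\.\d+\.\d+\.\d+)\.(\S+): ([A-Z.]+)[, ]"
)


def unanswered_syns(lines):
    """Return {(src, sport, dst, dport): syn_count} for every flow that sent
    SYNs while no packet at all was captured in the reverse direction."""
    syns = defaultdict(int)
    seen = set()
    for line in lines:
        m = FLOW_RE.search(line)
        if m is None:
            continue  # not a TCP line in this format
        src, sport, dst, dport, flags = m.groups()
        flow = (src, sport, dst, dport)
        seen.add(flow)
        if "S" in flags:
            syns[flow] += 1
    return {
        flow: count
        for flow, count in syns.items()
        if (flow[2], flow[3], flow[0], flow[1]) not in seen
    }
```

Fed the three node 0 lines above, it would report both client ports (47601 with two SYNs, 54763 with one) as unanswered.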
>>>>>>>>
>>>>
>>>
>>
>
