[Ocfs2-users] Transport endpoint is not connected while mounting?
quanta
quanta.linux at gmail.com
Fri Dec 21 19:04:43 PST 2012
I have replaced a dead node that was running DRBD in dual-primary mode with
OCFS2. All the steps work:
`/proc/drbd`
version: 8.3.13 (api:88/proto:86-96)
GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by
mockbuild at builder10.centos.org, 2012-05-07 11:56:36
1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
ns:81 nr:407832 dw:106657970 dr:266340 al:179 bm:6551 lo:0 pe:0
ua:0 ap:0 ep:1 wo:b oos:0
until I try to mount the volume:
mount -t ocfs2 /dev/drbd1 /data/webroot/
mount.ocfs2: Transport endpoint is not connected while mounting
/dev/drbd1 on /data/webroot/. Check 'dmesg' for more information on this
error.
`/var/log/kern.log`
kernel: (o2net,11427,1):o2net_connect_expired:1664 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
kernel: (mount.ocfs2,12037,1):dlm_request_join:1036 ERROR: status = -107
kernel: (mount.ocfs2,12037,1):dlm_try_to_join_domain:1210 ERROR: status = -107
kernel: (mount.ocfs2,12037,1):dlm_join_domain:1488 ERROR: status = -107
kernel: (mount.ocfs2,12037,1):dlm_register_domain:1754 ERROR: status = -107
kernel: (mount.ocfs2,12037,1):ocfs2_dlm_init:2808 ERROR: status = -107
kernel: (mount.ocfs2,12037,1):ocfs2_mount_volume:1447 ERROR: status = -107
kernel: ocfs2: Unmounting device (147,1) on (node 1)
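For reference, the `status = -107` values in the kernel log are negated errno codes. On Linux, errno 107 is `ENOTCONN`, whose message is the same "Transport endpoint is not connected" text that `mount.ocfs2` prints. A minimal Python sketch, assuming it runs on a Linux host (errno numbers differ on other systems); the helper names `parse_status_lines` and `describe` are made up for illustration:

```python
import errno
import os
import re

# Kernel log lines copied from the failed mount (rejoined where the mail wrapped them).
KERN_LOG = """\
kernel: (mount.ocfs2,12037,1):dlm_request_join:1036 ERROR: status = -107
kernel: (mount.ocfs2,12037,1):dlm_try_to_join_domain:1210 ERROR: status = -107
kernel: (mount.ocfs2,12037,1):dlm_join_domain:1488 ERROR: status = -107
kernel: (mount.ocfs2,12037,1):dlm_register_domain:1754 ERROR: status = -107
kernel: (mount.ocfs2,12037,1):ocfs2_dlm_init:2808 ERROR: status = -107
kernel: (mount.ocfs2,12037,1):ocfs2_mount_volume:1447 ERROR: status = -107
"""

# Matches "):<function>:<source line> ERROR: status = <negated errno>".
STATUS_RE = re.compile(r"\):(\w+):(\d+) ERROR: status = (-?\d+)")


def parse_status_lines(text):
    """Return (function, status) pairs in the order they were logged."""
    return [(m.group(1), int(m.group(3))) for m in STATUS_RE.finditer(text)]


def describe(status):
    """Map a negated errno such as -107 to its symbolic name and message."""
    code = abs(status)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


calls = parse_status_lines(KERN_LOG)
for func, status in calls:
    print(f"{func}: {status} -> {describe(status)}")
```

Every function in the chain reports the same code, so the failure is most likely just propagating upward from the first DLM join request rather than being six separate problems.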
I'm sure `/etc/ocfs2/cluster.conf` is identical on both nodes:
`/etc/ocfs2/cluster.conf`
node:
	ip_port = 7777
	ip_address = 192.168.3.145
	number = 0
	name = SVR233NTC-3145.localdomain
	cluster = cpc

node:
	ip_port = 7777
	ip_address = 192.168.2.93
	number = 1
	name = SVR022-293.localdomain
	cluster = cpc

cluster:
	node_count = 2
	name = cpc
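The stanza layout above (a `type:` header followed by `key = value` lines) is simple enough to sanity-check mechanically. The following is only a sketch under that assumption, not the parser O2CB itself uses; `parse_cluster_conf` and `check_cluster_conf` are hypothetical helper names:

```python
# The cluster.conf contents from this post, embedded so the sketch is self-contained.
CLUSTER_CONF = """\
node:
    ip_port = 7777
    ip_address = 192.168.3.145
    number = 0
    name = SVR233NTC-3145.localdomain
    cluster = cpc

node:
    ip_port = 7777
    ip_address = 192.168.2.93
    number = 1
    name = SVR022-293.localdomain
    cluster = cpc

cluster:
    node_count = 2
    name = cpc
"""


def parse_cluster_conf(text):
    """Parse 'type:' headers followed by 'key = value' lines into dicts."""
    stanzas = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.endswith(":") and "=" not in line:
            current = {"_type": line[:-1]}
            stanzas.append(current)
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return stanzas


def check_cluster_conf(stanzas):
    """Return a list of human-readable inconsistencies (empty if none found)."""
    problems = []
    nodes = [s for s in stanzas if s["_type"] == "node"]
    clusters = [s for s in stanzas if s["_type"] == "cluster"]
    if len(clusters) != 1:
        return [f"expected exactly one cluster stanza, found {len(clusters)}"]
    cluster = clusters[0]
    if int(cluster.get("node_count", -1)) != len(nodes):
        problems.append("node_count does not match the number of node stanzas")
    numbers = [n.get("number") for n in nodes]
    if len(set(numbers)) != len(numbers):
        problems.append("duplicate node numbers")
    for n in nodes:
        if n.get("cluster") != cluster.get("name"):
            problems.append(f"node {n.get('name')} refers to another cluster")
    return problems


stanzas = parse_cluster_conf(CLUSTER_CONF)
problems = check_cluster_conf(stanzas)
print(problems or "no inconsistencies found")
```

Running the same check against the file from each node, and comparing the parsed stanzas, would confirm that the two copies really are equivalent rather than relying on a visual comparison.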
and the TCP connection between them works:
# nc -z 192.168.3.145 7777
Connection to 192.168.3.145 7777 port [tcp/cbt] succeeded!
but the O2CB heartbeat is not active on the new node (192.168.2.93):
`/etc/init.d/o2cb status`
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster cpc: Online
Heartbeat dead threshold = 31
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Not active
Here are the results of running `tcpdump` on node 0 while starting
`ocfs2` on node 1:
1 0.000000 192.168.2.93 -> 192.168.3.145 TCP 70 55274 > cbt [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSval=690432180 TSecr=0
2 0.000008 192.168.3.145 -> 192.168.2.93 TCP 70 cbt > 55274 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSval=707657223 TSecr=690432180
3 0.000223 192.168.2.93 -> 192.168.3.145 TCP 66 55274 > cbt [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSval=690432181 TSecr=707657223
4 0.000286 192.168.2.93 -> 192.168.3.145 TCP 98 55274 > cbt [PSH, ACK] Seq=1 Ack=1 Win=5840 Len=32 TSval=690432181 TSecr=707657223
5 0.000292 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181
6 0.000324 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [RST, ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181
The sequence repeats: every sixth packet is an `RST` sent by node 0 (192.168.3.145).
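To make the capture easier to read: the three-way handshake completes, node 1 sends 32 bytes, node 0 acknowledges them and then resets the connection. A rough Python sketch that extracts this from the six lines above; the regular expression assumes exactly this one-line-per-packet text layout, and `parse_capture` is a hypothetical helper name:

```python
import re

# The six packets from the capture, one per line.
CAPTURE = """\
1 0.000000 192.168.2.93 -> 192.168.3.145 TCP 70 55274 > cbt [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSval=690432180 TSecr=0
2 0.000008 192.168.3.145 -> 192.168.2.93 TCP 70 cbt > 55274 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSval=707657223 TSecr=690432180
3 0.000223 192.168.2.93 -> 192.168.3.145 TCP 66 55274 > cbt [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSval=690432181 TSecr=707657223
4 0.000286 192.168.2.93 -> 192.168.3.145 TCP 98 55274 > cbt [PSH, ACK] Seq=1 Ack=1 Win=5840 Len=32 TSval=690432181 TSecr=707657223
5 0.000292 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181
6 0.000324 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [RST, ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181
"""

# Packet number, timestamp, source, destination, TCP flags and payload length.
PACKET_RE = re.compile(
    r"^\s*(?P<no>\d+)\s+(?P<time>[\d.]+)\s+(?P<src>\S+) -> (?P<dst>\S+)\s+TCP\s+\d+\s+"
    r"\S+ > \S+\s+\[(?P<flags>[^\]]+)\].*?\bLen=(?P<len>\d+)",
    re.MULTILINE,
)


def parse_capture(text):
    """Return one dict per packet with src, dst, a set of flags and payload length."""
    packets = []
    for m in PACKET_RE.finditer(text):
        packets.append({
            "no": int(m.group("no")),
            "src": m.group("src"),
            "dst": m.group("dst"),
            "flags": set(m.group("flags").split(", ")),
            "len": int(m.group("len")),
        })
    return packets


packets = parse_capture(CAPTURE)

# The first three packets should form a complete TCP handshake.
handshake_done = [p["flags"] for p in packets[:3]] == [{"SYN"}, {"SYN", "ACK"}, {"ACK"}]
# Payload bytes exchanged before the reset, and who sent the reset.
payload_before_rst = sum(p["len"] for p in packets if "RST" not in p["flags"])
rst_senders = {p["src"] for p in packets if "RST" in p["flags"]}

print("handshake completed:", handshake_done)
print("payload bytes before reset:", payload_before_rst)
print("reset sent by:", sorted(rst_senders))
```

One possible reading, which is an interpretation and not something the logs confirm, is that node 0 accepts the TCP connection but drops it after reading the first 32-byte message from node 1. That would also explain why `nc -z` succeeds while the OCFS2 connection does not.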
What else can I do to debug this?
PS:
OCFS2 versions on node 0:
- ocfs2-tools-1.4.4-1.el5
- ocfs2-2.6.18-274.12.1.el5-1.4.7-1.el5
OCFS2 versions on node 1:
- ocfs2-tools-1.4.4-1.el5
- ocfs2-2.6.18-308.el5-1.4.7-1.el5