[Ocfs2-users] Transport endpoint is not connected while mounting?

quanta quanta.linux at gmail.com
Fri Dec 21 19:04:43 PST 2012


I have replaced a dead node in a DRBD dual-primary setup running OCFS2. 
Every step works:

`/proc/drbd`

     version: 8.3.13 (api:88/proto:86-96)
     GIT-hash: 83ca112086600faacab2f157bc5a9324f7bd7f77 build by mockbuild@builder10.centos.org, 2012-05-07 11:56:36

      1: cs:Connected ro:Primary/Primary ds:UpToDate/UpToDate C r-----
         ns:81 nr:407832 dw:106657970 dr:266340 al:179 bm:6551 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0

until I try to mount the volume:

     mount -t ocfs2 /dev/drbd1 /data/webroot/
     mount.ocfs2: Transport endpoint is not connected while mounting /dev/drbd1 on /data/webroot/. Check 'dmesg' for more information on this error.

`/var/log/kern.log`

     kernel: (o2net,11427,1):o2net_connect_expired:1664 ERROR: no connection established with node 0 after 30.0 seconds, giving up and returning errors.
     kernel: (mount.ocfs2,12037,1):dlm_request_join:1036 ERROR: status = -107
     kernel: (mount.ocfs2,12037,1):dlm_try_to_join_domain:1210 ERROR: status = -107
     kernel: (mount.ocfs2,12037,1):dlm_join_domain:1488 ERROR: status = -107
     kernel: (mount.ocfs2,12037,1):dlm_register_domain:1754 ERROR: status = -107
     kernel: (mount.ocfs2,12037,1):ocfs2_dlm_init:2808 ERROR: status = -107
     kernel: (mount.ocfs2,12037,1):ocfs2_mount_volume:1447 ERROR: status = -107
     kernel: ocfs2: Unmounting device (147,1) on (node 1)
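
For reference, the `status = -107` values in the log are negative errno codes, which is the usual kernel convention. A minimal sketch, assuming a Linux host with Python available, that maps such a value back to its symbolic name and message (the helper name `describe_kernel_status` is made up for illustration):

```python
import errno
import os


def describe_kernel_status(status: int) -> str:
    """Map a negative kernel status (e.g. -107) to 'NAME: message'."""
    code = abs(status)
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name}: {os.strerror(code)}"


if __name__ == "__main__":
    # On Linux, 107 is ENOTCONN, the same condition mount.ocfs2 reports.
    print(describe_kernel_status(-107))
```

Every DLM error in the chain carries the same code, so they all appear to stem from the single o2net connection failure on the first log line rather than from separate faults.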

I'm sure `/etc/ocfs2/cluster.conf` is identical on both nodes:

`/etc/ocfs2/cluster.conf`

     node:
         ip_port = 7777
         ip_address = 192.168.3.145
         number = 0
         name = SVR233NTC-3145.localdomain
         cluster = cpc

     node:
         ip_port = 7777
         ip_address = 192.168.2.93
         number = 1
         name = SVR022-293.localdomain
         cluster = cpc

     cluster:
         node_count = 2
         name = cpc

and port 7777 on node 0 accepts connections:

     # nc -z 192.168.3.145 7777
     Connection to 192.168.3.145 7777 port [tcp/cbt] succeeded!

but the O2CB heartbeat is not active on the new node (192.168.2.93):

`/etc/init.d/o2cb status`

     Driver for "configfs": Loaded
     Filesystem "configfs": Mounted
     Driver for "ocfs2_dlmfs": Loaded
     Filesystem "ocfs2_dlmfs": Mounted
     Checking O2CB cluster cpc: Online
     Heartbeat dead threshold = 31
       Network idle timeout: 30000
       Network keepalive delay: 2000
       Network reconnect delay: 2000
     Checking O2CB heartbeat: Not active
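
A byte-for-byte identical `cluster.conf` can be confirmed mechanically, and as far as I know the `name` in a node stanza is also expected to match that machine's hostname. The sketch below is only an illustration under those assumptions: it parses the stanza format shown above, reports stanzas that differ between two copies of the file, and looks up the stanza for a given hostname. The function names are hypothetical and not part of ocfs2-tools.

```python
def parse_cluster_conf(text):
    """Parse cluster.conf stanzas into a list of (kind, settings) tuples."""
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # Stanza header such as "node:" or "cluster:".
            stanzas.append((line[:-1], {}))
        elif "=" in line and stanzas:
            key, _, value = line.partition("=")
            stanzas[-1][1][key.strip()] = value.strip()
    return stanzas


def diff_configs(text_a, text_b):
    """Return stanzas that appear in only one of the two configs."""
    def as_set(text):
        return {(kind, tuple(sorted(s.items())))
                for kind, s in parse_cluster_conf(text)}
    return as_set(text_a) ^ as_set(text_b)


def find_local_node(text, hostname):
    """Return the node stanza whose name equals hostname, or None."""
    for kind, settings in parse_cluster_conf(text):
        if kind == "node" and settings.get("name") == hostname:
            return settings
    return None


SAMPLE = """
node:
    ip_port = 7777
    ip_address = 192.168.3.145
    number = 0
    name = SVR233NTC-3145.localdomain
    cluster = cpc

node:
    ip_port = 7777
    ip_address = 192.168.2.93
    number = 1
    name = SVR022-293.localdomain
    cluster = cpc

cluster:
    node_count = 2
    name = cpc
"""

if __name__ == "__main__":
    print(find_local_node(SAMPLE, "SVR022-293.localdomain"))
```

On a real node, the text would come from reading `/etc/ocfs2/cluster.conf` on each machine and the hostname from `socket.gethostname()`. A `None` result from `find_local_node` would suggest the stanza name and the hostname disagree.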

Here are the results of running `tcpdump` on node 0 while starting 
`ocfs2` on node 1:

       1   0.000000 192.168.2.93 -> 192.168.3.145 TCP 70 55274 > cbt [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSval=690432180 TSecr=0
       2   0.000008 192.168.3.145 -> 192.168.2.93 TCP 70 cbt > 55274 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSval=707657223 TSecr=690432180
       3   0.000223 192.168.2.93 -> 192.168.3.145 TCP 66 55274 > cbt [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSval=690432181 TSecr=707657223
       4   0.000286 192.168.2.93 -> 192.168.3.145 TCP 98 55274 > cbt [PSH, ACK] Seq=1 Ack=1 Win=5840 Len=32 TSval=690432181 TSecr=707657223
       5   0.000292 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181
       6   0.000324 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [RST, ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181

The pattern repeats: every sixth packet is an `RST` from node 0.
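
Read packet by packet, the capture shows the TCP handshake completing, node 1 sending 32 bytes (presumably the o2net handshake message), and node 0 acknowledging them and then resetting the connection. A rough sketch that extracts this from the text output above, assuming each packet sits on a single line; the regular expression and function names are illustrative only:

```python
import re

# Matches one line of the tshark-style text output shown above.
LINE_RE = re.compile(
    r"^\s*\d+\s+[\d.]+\s+(?P<src>\S+)\s+->\s+(?P<dst>\S+)\s+"
    r"TCP\s+\d+\s+\S+\s+>\s+\S+\s+\[(?P<flags>[^\]]+)\].*?\bLen=(?P<len>\d+)"
)

CAPTURE = """
  1   0.000000 192.168.2.93 -> 192.168.3.145 TCP 70 55274 > cbt [SYN] Seq=0 Win=5840 Len=0 MSS=1460 TSval=690432180 TSecr=0
  2   0.000008 192.168.3.145 -> 192.168.2.93 TCP 70 cbt > 55274 [SYN, ACK] Seq=0 Ack=1 Win=5792 Len=0 MSS=1460 TSval=707657223 TSecr=690432180
  3   0.000223 192.168.2.93 -> 192.168.3.145 TCP 66 55274 > cbt [ACK] Seq=1 Ack=1 Win=5840 Len=0 TSval=690432181 TSecr=707657223
  4   0.000286 192.168.2.93 -> 192.168.3.145 TCP 98 55274 > cbt [PSH, ACK] Seq=1 Ack=1 Win=5840 Len=32 TSval=690432181 TSecr=707657223
  5   0.000292 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181
  6   0.000324 192.168.3.145 -> 192.168.2.93 TCP 66 cbt > 55274 [RST, ACK] Seq=1 Ack=33 Win=5792 Len=0 TSval=707657223 TSecr=690432181
"""


def parse_capture(text):
    """Turn capture text into a list of dicts with src, flags and payload length."""
    packets = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            packets.append({
                "src": m["src"],
                "flags": {f.strip() for f in m["flags"].split(",")},
                "len": int(m["len"]),
            })
    return packets


def summarize_reset(packets):
    """Return (reset_sender, bytes_the_peer_sent_before_the_reset) or None."""
    sent = {}
    for p in packets:
        if "RST" in p["flags"]:
            peer_bytes = sum(n for src, n in sent.items() if src != p["src"])
            return p["src"], peer_bytes
        sent[p["src"]] = sent.get(p["src"], 0) + p["len"]
    return None


if __name__ == "__main__":
    print(summarize_reset(parse_capture(CAPTURE)))  # ('192.168.3.145', 32)
```

The result points at node 0 as the side that tears the connection down after receiving the first payload, so its kernel log at the moment of the connection attempt is the natural next place to look.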

What else can I do to debug this?

PS:

OCFS2 versions on node 0:

  - ocfs2-tools-1.4.4-1.el5
  - ocfs2-2.6.18-274.12.1.el5-1.4.7-1.el5

OCFS2 versions on node 1:

  - ocfs2-tools-1.4.4-1.el5
  - ocfs2-2.6.18-308.el5-1.4.7-1.el5
