[Ocfs2-users] Transport endpoint not connected after crash of one node

Sebastian Reitenbach sebastia at l00-bugdead-prods.de
Thu Aug 23 04:26:59 PDT 2007


Hi,

I am on SLES 10 SP1, x86_64, running the distribution RPMs of ocfs2:
ocfs2console-1.2.3-0.7
ocfs2-tools-1.2.3-0.7

I have a two-node ocfs2 cluster configured. One node died (manual reset),
and the second node immediately started having problems accessing the file
systems. The logs give the reason as: Transport endpoint not connected.

Running mounted.ocfs2 on the surviving machine showed that both machines had
the filesystems mounted. After I unmounted all the filesystems, the surviving
node was still listed as having some of the ocfs2 partitions mounted:


ppsdb101:~ # mounted.ocfs2 -f
Device                FS     Nodes
/dev/sda1             ocfs2  ppsdb102
/dev/sdb1             ocfs2  ppsdb102
/dev/sdc1             ocfs2  ppsdb102
/dev/sdd1             ocfs2  ppsdb102
/dev/sde1             ocfs2  ppsdb102
/dev/sdf1             ocfs2  ppsdb102
/dev/sdg1             ocfs2  ppsdb102
/dev/sdh1             ocfs2  ppsdb102
/dev/sdi1             ocfs2  ppsdb102
/dev/sdj1             ocfs2  ppsdb102
/dev/sdk1             ocfs2  ppsdb102
/dev/sdl1             ocfs2  ppsdb102, ppsdb101
/dev/sdm1             ocfs2  ppsdb102
/dev/sdn1             ocfs2  ppsdb102
/dev/sdo1             ocfs2  ppsdb102
/dev/sdp1             ocfs2  ppsdb102, ppsdb101
/dev/sdq1             ocfs2  ppsdb102, ppsdb101
/dev/sdr1             ocfs2  ppsdb102, ppsdb101
/dev/sds1             ocfs2  ppsdb102, ppsdb101
/dev/sdt1             ocfs2  ppsdb102
/dev/sdu1             ocfs2  ppsdb102

In the listing above, ppsdb102 is the dead machine and ppsdb101 is the one
that is still alive. An ordinary mount command shows that none of the
partitions listed above are mounted, but mounted.ocfs2 still reports some of
them as mounted on ppsdb101.
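
To make the mismatch easier to spot, the two outputs could be compared with
a small script. The following is only a sketch: the function names are
hypothetical, and the parsing assumes the column layout shown above for
mounted.ocfs2 -f and the usual "device on mountpoint type fstype (options)"
layout for mount -t ocfs2, with no spaces in mount points.

```python
def parse_mounted_ocfs2(text):
    """Map each device to the set of node names listed by `mounted.ocfs2 -f`.

    Assumes the three-column layout shown in the listing: Device, FS, Nodes.
    """
    result = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        # Skip the header, shell prompt lines, and anything that is not ocfs2.
        if len(parts) < 2 or parts[1] != "ocfs2":
            continue
        nodes = parts[2] if len(parts) > 2 else ""
        result[parts[0]] = {n.strip() for n in nodes.split(",") if n.strip()}
    return result


def parse_local_ocfs2_mounts(text):
    """Return the set of devices that `mount -t ocfs2` reports as mounted.

    Assumes lines of the form '/dev/sdl1 on /mnt/x type ocfs2 (rw)'.
    """
    devices = set()
    for line in text.splitlines():
        fields = line.split()
        if (len(fields) >= 5 and fields[1] == "on"
                and fields[3] == "type" and fields[4] == "ocfs2"):
            devices.add(fields[0])
    return devices


def stale_local_entries(mounted_text, mount_text, local_node):
    """Devices listed for local_node by mounted.ocfs2 but not mounted locally."""
    listed = parse_mounted_ocfs2(mounted_text)
    local = parse_local_ocfs2_mounts(mount_text)
    return sorted(dev for dev, nodes in listed.items()
                  if local_node in nodes and dev not in local)
```

Fed with the listing above, an empty mount -t ocfs2 output, and ppsdb101 as
the local node, this would return the five devices that still list ppsdb101.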

O2CB was configured like this (output of o2cb configure):
Load O2CB driver on boot (y/n) [y]:
Cluster to start on boot (Enter "none" to clear) [ppscluster]:
Specify heartbeat dead threshold (>=7) [61]:
Use user-space driven heartbeat? (y/n) [n]:
Cluster keepalive delay (ms) [5000]:
Cluster reconnect dealy (ms) [2000]:
Cluster idle timeout (ms) [10000]:
Writing O2CB configuration: OK
O2CB cluster ppscluster already online


Two questions:
1. Shouldn't the surviving machine recognize the death of the other node
after 61 seconds?
2. Shouldn't mounted.ocfs2 show the same locally mounted ocfs2 partitions as
mount -t ocfs2 does?
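
Regarding question 1, the threshold may not be a value in seconds at all.
As far as I understand the OCFS2 FAQ, the heartbeat dead threshold counts
2-second heartbeat iterations, with threshold = (timeout in seconds / 2) + 1.
The sketch below just does that arithmetic; the 2-second interval and the
function names are assumptions on my side, not something read from the
running system.

```python
# Assumed disk heartbeat interval in seconds (taken from the OCFS2 FAQ
# formula, not measured on the cluster).
HEARTBEAT_INTERVAL_SECONDS = 2


def threshold_to_seconds(threshold):
    """Convert a heartbeat dead threshold into the implied timeout in seconds."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECONDS


def seconds_to_threshold(seconds):
    """Convert a desired timeout in seconds into a heartbeat dead threshold."""
    return seconds // HEARTBEAT_INTERVAL_SECONDS + 1


print(threshold_to_seconds(61))
```

If that reading is correct, the configured value of 61 would correspond to a
disk heartbeat timeout of 120 seconds rather than 61.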

Kind regards,
Sebastian








