[Ocfs2-users] Transport endpoint not connected after crash of one node
Sunil Mushran
Sunil.Mushran at oracle.com
Fri Aug 24 14:01:45 PDT 2007
You could be encountering Novell bugzilla 296606. It is specific
to SLES10 (and SP1). Novell owns the bug.
Sebastian Reitenbach wrote:
> Hi,
>
> I am on SLES 10 SP1, x86_64, running the distribution RPMs of ocfs2:
> ocfs2console-1.2.3-0.7
> ocfs2-tools-1.2.3-0.7
>
> I have a two-node ocfs2 cluster configured. One node died (manual reset),
> and the second immediately started having problems accessing the file
> system. The reason given in the logs was: Transport endpoint not
> connected.
>
> Running mounted.ocfs2 on the surviving machine showed that both machines
> had the filesystems mounted. After a umount of all the filesystems, the
> second node was still listed as having some of the ocfs2 partitions mounted:
>
>
> ppsdb101:~ # mounted.ocfs2 -f
> Device FS Nodes
> /dev/sda1 ocfs2 ppsdb102
> /dev/sdb1 ocfs2 ppsdb102
> /dev/sdc1 ocfs2 ppsdb102
> /dev/sdd1 ocfs2 ppsdb102
> /dev/sde1 ocfs2 ppsdb102
> /dev/sdf1 ocfs2 ppsdb102
> /dev/sdg1 ocfs2 ppsdb102
> /dev/sdh1 ocfs2 ppsdb102
> /dev/sdi1 ocfs2 ppsdb102
> /dev/sdj1 ocfs2 ppsdb102
> /dev/sdk1 ocfs2 ppsdb102
> /dev/sdl1 ocfs2 ppsdb102, ppsdb101
> /dev/sdm1 ocfs2 ppsdb102
> /dev/sdn1 ocfs2 ppsdb102
> /dev/sdo1 ocfs2 ppsdb102
> /dev/sdp1 ocfs2 ppsdb102, ppsdb101
> /dev/sdq1 ocfs2 ppsdb102, ppsdb101
> /dev/sdr1 ocfs2 ppsdb102, ppsdb101
> /dev/sds1 ocfs2 ppsdb102, ppsdb101
> /dev/sdt1 ocfs2 ppsdb102
> /dev/sdu1 ocfs2 ppsdb102
>
> In the above output, ppsdb102 is the dead machine and ppsdb101 is the one
> that is still alive. An ordinary mount command shows that none of the
> partitions listed above are mounted, but mounted.ocfs2 still reports that
> some of them are.
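
The mismatch described above can be checked mechanically. What follows is only a minimal sketch in Python, assuming the three-column "Device FS Nodes" layout shown in the listing; the function names devices_listing_node and stale_entries are hypothetical helpers and not part of ocfs2-tools. The text passed in would be captured from mounted.ocfs2 -f, and the set of locally mounted devices from mount -t ocfs2.

```python
def devices_listing_node(mounted_output, node):
    """Return the devices whose Nodes column names `node`.

    `mounted_output` is text in the "Device FS Nodes" layout shown above.
    """
    devices = []
    for line in mounted_output.splitlines():
        fields = line.split(None, 2)  # device, fs type, comma-separated nodes
        if len(fields) < 3 or fields[1] != "ocfs2":
            continue  # skips the header line and anything malformed
        nodes = [name.strip() for name in fields[2].split(",")]
        if node in nodes:
            devices.append(fields[0])
    return devices


def stale_entries(mounted_output, node, locally_mounted):
    """Devices listed for `node` that are not in the `locally_mounted` set."""
    listed = devices_listing_node(mounted_output, node)
    return [dev for dev in listed if dev not in locally_mounted]


# Three lines taken from the listing above; nothing is mounted locally.
sample = """\
Device FS Nodes
/dev/sdk1 ocfs2 ppsdb102
/dev/sdl1 ocfs2 ppsdb102, ppsdb101
/dev/sdp1 ocfs2 ppsdb102, ppsdb101
"""
print(stale_entries(sample, "ppsdb101", set()))
```

For the sample above this prints ['/dev/sdl1', '/dev/sdp1'], the devices that are still listed for ppsdb101 although nothing is mounted there.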
>
> o2cb was configured like this:
> Load O2CB driver on boot (y/n) [y]:
> Cluster to start on boot (Enter "none" to clear) [ppscluster]:
> Specify heartbeat dead threshold (>=7) [61]:
> Use user-space driven heartbeat? (y/n) [n]:
> Cluster keepalive delay (ms) [5000]:
> Cluster reconnect dealy (ms) [2000]:
> Cluster idle timeout (ms) [10000]:
> Writing O2CB configuration: OK
> O2CB cluster ppscluster already online
>
>
> Two questions:
> 1. shouldn't the still living machine recognize the death of the other node
> after 61 seconds?
> 2. shouldn't mounted.ocfs2 show the same locally mounted ocfs2 partitions as
> mount -t ocfs2 does?
>
> kind regards
> Sebastian
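
A note on the 61 in question 1: as far as the OCFS2 documentation describes it, the heartbeat dead threshold is a count of disk heartbeat iterations and not a number of seconds, with the timeout given by (threshold - 1) * 2 seconds. The sketch below only restates that arithmetic; the 2-second interval is an assumption taken from that description, and the function name is hypothetical.

```python
HEARTBEAT_INTERVAL_SECS = 2  # assumed o2cb disk heartbeat interval


def heartbeat_dead_timeout_secs(threshold, interval=HEARTBEAT_INTERVAL_SECS):
    """Seconds without a disk heartbeat before a node is declared dead."""
    if threshold < 7:
        # the o2cb configure prompt above only accepts values >= 7
        raise ValueError("heartbeat dead threshold must be >= 7")
    return (threshold - 1) * interval


print(heartbeat_dead_timeout_secs(61))  # 120
```

Under that assumption, a threshold of 61 corresponds to about 120 seconds, so the surviving node would not be expected to declare the other node dead after 61 seconds.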