[Ocfs2-users] one node rejects connection from new node

Sunil Mushran sunil.mushran at oracle.com
Mon Feb 2 10:05:02 PST 2009


The o2cb_ctl command should have added the new node to both
cluster.conf and configfs (/sys/kernel/config). If wilson1 does not
recognize the new node, something went wrong when adding it to
configfs on that node.
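
A quick way to see which of the two places is missing the node is to
check both for the node name. This is only a rough sketch: it assumes
the default /etc/ocfs2/cluster.conf path and the usual configfs layout
of cluster/<cluster>/node/<name>. The function name check_o2cb_node and
its arguments are made up for illustration and are not part of
ocfs2-tools.

```shell
# check_o2cb_node NAME [CLUSTER] [CONF] [CFGROOT]
# Reports whether NAME appears in cluster.conf and in configfs.
# All defaults below are assumptions; override them if your setup differs.
check_o2cb_node() {
    name="$1"
    cluster="${2:-ocfs2}"
    conf="${3:-/etc/ocfs2/cluster.conf}"
    cfgroot="${4:-/sys/kernel/config/cluster}"

    # cluster.conf keeps each node as a stanza containing "name = <node>".
    # This match is loose: a cluster with the same name would also match.
    if grep -Eq "^[[:space:]]*name[[:space:]]*=[[:space:]]*${name}[[:space:]]*\$" "$conf" 2>/dev/null; then
        echo "cluster.conf: $name present"
    else
        echo "cluster.conf: $name MISSING"
    fi

    # In configfs every registered node is a directory under <cluster>/node.
    if [ -d "$cfgroot/$cluster/node/$name" ]; then
        echo "configfs: $name present"
    else
        echo "configfs: $name MISSING"
    fi
}
```

Running "check_o2cb_node gladstone" as root on wilson1 would then show
whether the node made it into the file, into configfs, or both.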

Run ls -lR /sys/kernel/config/cluster on each node. The output should
be the same on all nodes. What does it show on wilson1?
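
If the raw listing is hard to compare by eye, something like the
following may help. It is a sketch, not a supported tool: it assumes
each node directory in configfs exposes the num, ipv4_address and
ipv4_port attribute files, and the function name dump_o2cb_nodes is
hypothetical. The root directory is an argument only so the function
can be pointed at a copy of the tree.

```shell
# dump_o2cb_nodes [CFGROOT]
# Prints one sorted line per node registered in configfs:
#   <cluster> <node> num=<n> addr=<ip> port=<port>
dump_o2cb_nodes() {
    root="${1:-/sys/kernel/config/cluster}"
    for dir in "$root"/*/node/*/; do
        # Skip the unexpanded pattern when nothing is registered.
        [ -d "$dir" ] || continue
        node=$(basename "$dir")
        cluster=$(basename "$(dirname "$(dirname "$dir")")")
        printf '%s %s num=%s addr=%s port=%s\n' \
            "$cluster" "$node" \
            "$(cat "$dir/num")" \
            "$(cat "$dir/ipv4_address")" \
            "$(cat "$dir/ipv4_port")"
    done | sort
}
```

Save the output on each node (for example with
"dump_o2cb_nodes > /tmp/o2cb-nodes.$(hostname)"), copy the files to one
machine and diff them. A node that is missing or has a different number,
address or port on wilson1 should stand out.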

Carl J. Benson wrote:
> Sunil,
>
> I read the user's guide, and added node "gladstone" by entering
> the following command, as root on each of the four nodes:
>
> o2cb_ctl -C -i -n gladstone -t node -a number=3 -a
> ip_address=140.107.170.108 -a ip_port=7778 -a cluster=ocfs2
>
> I copied/pasted the command, so it was identical on all nodes.
>
> On gladstone, /etc/init.d/o2cb status shows:
>
> Driver for "configfs": Loaded
> Filesystem "configfs": Mounted
> Stack glue driver: Loaded
> Stack plugin "o2cb": Loaded
> Driver for "ocfs2_dlmfs": Loaded
> Filesystem "ocfs2_dlmfs": Mounted
> Checking O2CB cluster ocfs2: Online
> Heartbeat dead threshold = 31
>   Network idle timeout: 30000
>   Network keepalive delay: 2000
>   Network reconnect delay: 2000
> Checking O2CB heartbeat: Not active
>
> So I attempt to mount the filesystem with mount /mnt/cpb_clust.
>
> Merlot1 likes it:
> Feb  2 09:40:33 merlot1 kernel: o2net: accepted connection from node
> gladstone (num 3) at 140.107.170.108:7778
>
> Merlot2 likes it:
> Feb  2 09:40:33 merlot2 kernel: o2net: accepted connection from node
> gladstone (num 3) at 140.107.170.108:7777
>
> But wilson1 does not:
> Feb  2 09:40:33 wilson1 kernel: (4447,3):o2net_accept_one:1795 attempt
> to connect from unknown node at 140.107.170.108:35267
> <...>
> Feb  2 09:41:00 wilson1 kernel: (4447,3):o2net_connect_expired:1659
> ERROR: no connection established with node 3 after 30.0 seconds, giving
> up and returning errors.
>
> On the new node, gladstone, I see:
> Feb  2 09:40:33 gladstone kernel: o2net: connected to node merlot2 (num
> 1) at 140.107.158.54:7777
> Feb  2 09:40:33 gladstone kernel: o2net: connected to node merlot1 (num
> 0) at 140.107.170.116:7777
> Feb  2 09:41:03 gladstone kernel: (7347,2):o2net_connect_expired:1659
> ERROR: no connection established with node 2 after 30.0 seconds, giving
> up and returning errors.
> Feb  2 09:41:03 gladstone kernel: (24118,2):dlm_request_join:1033 ERROR:
> status = -107
> Feb  2 09:41:03 gladstone kernel: (24118,2):dlm_try_to_join_domain:1207
> ERROR: status = -107
> Feb  2 09:41:03 gladstone kernel: (24118,2):dlm_join_domain:1485 ERROR:
> status = -107
> Feb  2 09:41:03 gladstone kernel: (24118,2):dlm_register_domain:1732
> ERROR: status = -107
> Feb  2 09:41:03 gladstone kernel: (24118,2):o2cb_cluster_connect:302
> ERROR: status = -107
> Feb  2 09:41:03 gladstone kernel: (24118,2):ocfs2_dlm_init:2786 ERROR:
> status = -107
> Feb  2 09:41:03 gladstone kernel: (24118,2):ocfs2_mount_volume:1560
> ERROR: status = -107
> Feb  2 09:41:03 gladstone kernel: ocfs2: Unmounting device (8,17) on
> (node 0)
> Feb  2 09:41:03 gladstone kernel: o2net: no longer connected to node
> merlot1 (num 0) at 140.107.170.116:7777
> Feb  2 09:41:03 gladstone kernel: o2net: no longer connected to node
> merlot2 (num 1) at 140.107.158.54:7777
>
> Can you help me figure out where the problem is?
>