[Ocfs2-users] 2 node cluster with shared LUN via FC
Manuel Bogner
manuel.bogner at geizhals.at
Thu Nov 4 07:14:51 PDT 2010
Hi Sérgio,
Thanks for your quick answer.
Such lines do show up in dmesg after waiting a little while, but the
behavior is still the same.
[ 2063.720211] o2net: connected to node xen02a (num 0) at 10.0.0.168:7777
[ 1979.611076] o2net: accepted connection from node xen02b (num 1) at 10.0.0.102:7777
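For anyone comparing these messages against /etc/ocfs2/cluster.conf, the peer name, node number, address and port can be pulled out of the o2net lines mechanically. The following is only a minimal Python sketch: the function name o2net_peers and the "outgoing"/"incoming" labels are made up for illustration, and the pattern is assumed from the two lines quoted above rather than from any o2net log specification.

```python
import re

# Pattern inferred from the two dmesg lines in this mail (an assumption,
# not an official o2net log format).
O2NET_RE = re.compile(
    r"o2net: (connected to|accepted connection from) node (\S+) "
    r"\(num (\d+)\) at (\d+\.\d+\.\d+\.\d+):(\d+)"
)


def o2net_peers(dmesg_text):
    """Return (direction, node_name, node_number, ip, port) per o2net line."""
    peers = []
    for match in O2NET_RE.finditer(dmesg_text):
        # "connected to" means this side opened the connection.
        direction = "outgoing" if match.group(1) == "connected to" else "incoming"
        peers.append(
            (direction, match.group(2), int(match.group(3)),
             match.group(4), int(match.group(5)))
        )
    return peers


SAMPLE = """\
[ 2063.720211] o2net: connected to node xen02a (num 0) at 10.0.0.168:7777
[ 1979.611076] o2net: accepted connection from node xen02b (num 1) at 10.0.0.102:7777
"""

if __name__ == "__main__":
    for peer in o2net_peers(SAMPLE):
        print(peer)
```

The extracted names, numbers and addresses can then be compared by hand with the node stanzas in cluster.conf.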
xen02a:~# lsmod | egrep 'jbd|ocfs2|configfs'
ocfs2 395816 1
ocfs2_dlmfs 23696 1
ocfs2_stack_o2cb 9088 1
ocfs2_dlm 197824 2 ocfs2_dlmfs,ocfs2_stack_o2cb
ocfs2_nodemanager 208744 8 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
ocfs2_stackglue 16432 2 ocfs2,ocfs2_stack_o2cb
configfs 29736 2 ocfs2_nodemanager
jbd 54696 2 ocfs2,ext3
xen02a:~# netstat -an | grep 7777
tcp 0 0 10.0.0.168:7777 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.168:7777 10.0.0.102:47547 ESTABLISHED
xen02b:~# lsmod | egrep 'jbd|ocfs2|configfs'
ocfs2 395816 1
ocfs2_dlmfs 23696 1
ocfs2_stack_o2cb 9088 1
ocfs2_dlm 197824 2 ocfs2_dlmfs,ocfs2_stack_o2cb
ocfs2_nodemanager 208744 8 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
ocfs2_stackglue 16432 2 ocfs2,ocfs2_stack_o2cb
configfs 29736 2 ocfs2_nodemanager
jbd 54696 2 ocfs2,ext3
xen02b:~# netstat -an | grep 7777
tcp 0 0 10.0.0.102:7777 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.102:47547 10.0.0.168:7777 ESTABLISHED
There are no iptables entries on either node, as they are just test servers.
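The module and port checks above can also be scripted. This is a minimal Python sketch, assuming the output of lsmod and netstat -an has already been captured as text. The function names missing_modules and port_states are hypothetical, and the module list is the one suggested in the quoted reply below.

```python
# Modules named in the quoted reply; treat this list as an assumption,
# other kernel versions may need a different set.
REQUIRED_MODULES = {
    "ocfs2", "jbd", "ocfs2_dlm", "ocfs2_dlmfs", "ocfs2_nodemanager", "configfs",
}


def missing_modules(lsmod_text):
    """Return the required modules that do not appear in lsmod output."""
    loaded = {line.split()[0] for line in lsmod_text.splitlines() if line.split()}
    return REQUIRED_MODULES - loaded


def port_states(netstat_text, port=7777):
    """Return the TCP states of sockets whose local or remote port is `port`."""
    states = set()
    suffix = ":%d" % port
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local, remote, state
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            local, remote, state = fields[3], fields[4], fields[5]
            if local.endswith(suffix) or remote.endswith(suffix):
                states.add(state)
    return states


LSMOD_XEN02A = """\
ocfs2 395816 1
ocfs2_dlmfs 23696 1
ocfs2_stack_o2cb 9088 1
ocfs2_dlm 197824 2 ocfs2_dlmfs,ocfs2_stack_o2cb
ocfs2_nodemanager 208744 8 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
ocfs2_stackglue 16432 2 ocfs2,ocfs2_stack_o2cb
configfs 29736 2 ocfs2_nodemanager
jbd 54696 2 ocfs2,ext3
"""

NETSTAT_XEN02A = """\
tcp 0 0 10.0.0.168:7777 0.0.0.0:* LISTEN
tcp 0 0 10.0.0.168:7777 10.0.0.102:47547 ESTABLISHED
"""

if __name__ == "__main__":
    print("missing modules:", sorted(missing_modules(LSMOD_XEN02A)))
    print("port 7777 states:", sorted(port_states(NETSTAT_XEN02A)))
```

An empty set of missing modules, together with both LISTEN and ESTABLISHED on port 7777, matches what the transcripts above show for xen02a.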
xen02a:~# uname -a
Linux xen02a 2.6.26-2-xen-amd64 #1 SMP Thu Sep 16 16:32:15 UTC 2010 x86_64 GNU/Linux
xen02b:~# uname -a
Linux xen02b 2.6.26-2-xen-amd64 #1 SMP Thu Sep 16 16:32:15 UTC 2010 x86_64 GNU/Linux
xen02b:~# cat /etc/default/o2cb
#
# This is a configuration file for automatic startup of the O2CB
# driver. It is generated by running /etc/init.d/o2cb configure.
# On Debian based systems the preferred method is running
# 'dpkg-reconfigure ocfs2-tools'.
#
# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true
# O2CB_STACK: The name of the cluster stack backing O2CB.
O2CB_STACK=o2cb
# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2
# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=31
# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000
# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS=2000
# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS=2000
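To put these settings into seconds: the heartbeat threshold counts iterations, not time. The sketch below assumes the commonly documented o2cb rule of one disk heartbeat every 2 seconds, so that the dead time is (threshold - 1) * 2 seconds. The function names parse_defaults and heartbeat_dead_seconds are made up for illustration.

```python
def parse_defaults(text):
    """Parse KEY=VALUE lines from an /etc/default/o2cb style file."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def heartbeat_dead_seconds(threshold, interval_s=2):
    """Seconds without a heartbeat before a node is declared dead.

    Assumes a 2-second heartbeat interval (the usual o2cb value).
    """
    return (threshold - 1) * interval_s


O2CB_DEFAULTS = """\
O2CB_ENABLED=true
O2CB_STACK=o2cb
O2CB_BOOTCLUSTER=ocfs2
O2CB_HEARTBEAT_THRESHOLD=31
O2CB_IDLE_TIMEOUT_MS=30000
O2CB_KEEPALIVE_DELAY_MS=2000
O2CB_RECONNECT_DELAY_MS=2000
"""

if __name__ == "__main__":
    cfg = parse_defaults(O2CB_DEFAULTS)
    threshold = int(cfg["O2CB_HEARTBEAT_THRESHOLD"])
    print("disk heartbeat dead time (s):", heartbeat_dead_seconds(threshold))
    print("network idle timeout (s):", int(cfg["O2CB_IDLE_TIMEOUT_MS"]) / 1000)
```

Under that assumption the values above mean a node is declared dead after 60 seconds without a disk heartbeat, and a network connection after 30 seconds of silence.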
xen02b:~# mount
/dev/sda1 on / type ext3 (rw,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
procbususb on /proc/bus/usb type usbfs (rw)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
configfs on /sys/kernel/config type configfs (rw)
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
regards,
Manuel
On 2010-11-04 15:03, Sérgio Surkamp wrote:
> It seems that o2net (the network stack) is not running; otherwise you
> would see its network messages in dmesg. Something like:
>
> xen02a kernel: o2net: connected to node xen02b (num 0) at 10.0.0.102:7777
>
> Check your firewall and network configuration. Also check that the
> [o2net] kernel thread is running and that TCP port 7777 is listening on
> both nodes. If the thread is not running, check that all the needed
> kernel modules are loaded:
>
> ocfs2
> jbd
> ocfs2_dlm
> ocfs2_dlmfs
> ocfs2_nodemanager
> configfs
>
> Regards,
> Sérgio
>
> On Thu, 04 Nov 2010 14:12:11 +0100
> Manuel Bogner <manuel.bogner at geizhals.at> wrote:
>
>> Sorry for the repost, but I just saw that I mixed German and English.
>> Here is the corrected version:
>>
>>
>>
>> Hi,
>>
>> I'm trying to create a cluster out of 2 nodes. Both systems share the
>> same LUN via FC and see it as /dev/sdd.
>>
>> /dev/sdd has one partition
>>
>> Disk /dev/sdd: 21.4 GB, 21474836480 bytes
>> 64 heads, 32 sectors/track, 20480 cylinders
>> Units = cylinders of 2048 * 512 = 1048576 bytes
>> Disk identifier: 0xc29cb93d
>>
>> Device Boot Start End Blocks Id System
>> /dev/sdd1 1 20480 20971504 83 Linux
>>
>> which is formatted with
>>
>> mkfs.ocfs2 -L ocfs2 /dev/sdd1
>>
>>
>> Here is my /etc/ocfs2/cluster.conf
>>
>> node:
>>         ip_port = 7777
>>         ip_address = 10.0.0.168
>>         number = 0
>>         name = xen02a
>>         cluster = ocfs2
>>
>> node:
>>         ip_port = 7777
>>         ip_address = 10.0.0.102
>>         number = 1
>>         name = xen02b
>>         cluster = ocfs2
>>
>> cluster:
>>         node_count = 2
>>         name = ocfs2
>>
>>
>> Everything seems to be fine:
>>
>> xen02a:~# /etc/init.d/o2cb status
>> Driver for "configfs": Loaded
>> Filesystem "configfs": Mounted
>> Stack glue driver: Loaded
>> Stack plugin "o2cb": Loaded
>> Driver for "ocfs2_dlmfs": Loaded
>> Filesystem "ocfs2_dlmfs": Mounted
>> Checking O2CB cluster ocfs2: Online
>> Heartbeat dead threshold = 31
>> Network idle timeout: 30000
>> Network keepalive delay: 2000
>> Network reconnect delay: 2000
>> Checking O2CB heartbeat: Active
>>
>> And mounting the fs on each node works fine:
>>
>> /dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
>>
>> Both nodes can ping each other.
>>
>>
>> xen02a:~# mounted.ocfs2 -d
>> Device     FS     UUID                                  Label
>> /dev/sdd1  ocfs2  55a9d0b0-050c-484f-9725-7788a3b9dde0  ocfs2
>>
>> xen02b:~# mounted.ocfs2 -d
>> Device     FS     UUID                                  Label
>> /dev/sdd1  ocfs2  55a9d0b0-050c-484f-9725-7788a3b9dde0  ocfs2
>>
>>
>> Now the problem:
>>
>> I first mount the device on the first node (xen02a):
>>
>> xen02a:~# mount -L ocfs2 /shared/
>> => /dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
>> without any errors.
>>
>> dmesg says:
>>
>> [ 97.244054] ocfs2_dlm: Nodes in domain ("55A9D0B0050C484F97257788A3B9DDE0"): 0
>> [ 97.245869] kjournald starting. Commit interval 5 seconds
>> [ 97.247045] ocfs2: Mounting device (8,49) on (node 0, slot 0) with ordered data mode.
>>
>> xen02a:~# mounted.ocfs2 -f
>> Device FS Nodes
>> /dev/sdd1 ocfs2 xen02a
>>
>> xen02a:~# echo "slotmap" | debugfs.ocfs2 -n /dev/sdd1
>> Slot# Node#
>> 0 0
>>
>>
>> Now I mount the device on the second node:
>>
>> xen02b:~# mount -L ocfs2 /shared/
>> => /dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
>>
>> [ 269.741168] OCFS2 1.5.0
>> [ 269.765171] ocfs2_dlm: Nodes in domain ("55A9D0B0050C484F97257788A3B9DDE0"): 1
>> [ 269.779620] kjournald starting. Commit interval 5 seconds
>> [ 269.779620] ocfs2: Mounting device (8,49) on (node 1, slot 1) with ordered data mode.
>> [ 269.779620] (2953,0):ocfs2_replay_journal:1149 Recovering node 0 from slot 0 on device (8,49)
>> [ 270.950540] kjournald starting. Commit interval 5 seconds
>>
>> xen02b:~# echo "slotmap" | debugfs.ocfs2 -n /dev/sdd1
>> Slot# Node#
>> 1 1
>>
>> xen02b:~# mounted.ocfs2 -f
>> Device FS Nodes
>> /dev/sdd1 ocfs2 xen02b
>>
>>
>> So the first node's mount seems to be gone, and changes made to the
>> filesystem on that node do not show up on the other node.
>>
>> At the moment I have no idea what this could be. I hope someone can
>> help me.
>>
>> regards,
>> Manuel
>>