[Ocfs2-users] 2 node cluster with shared LUN via FC

Manuel Bogner manuel.bogner at geizhals.at
Thu Nov 4 07:14:51 PDT 2010


Hi Sérgio,

thanks for your quick answer.

Those lines do show up in dmesg after waiting a little, but the behavior
is still the same.

[ 2063.720211] o2net: connected to node xen02a (num 0) at 10.0.0.168:7777

[ 1979.611076] o2net: accepted connection from node xen02b (num 1) at 10.0.0.102:7777


xen02a:~# lsmod | egrep 'jbd|ocfs2|configfs'
ocfs2                 395816  1
ocfs2_dlmfs            23696  1
ocfs2_stack_o2cb        9088  1
ocfs2_dlm             197824  2 ocfs2_dlmfs,ocfs2_stack_o2cb
ocfs2_nodemanager     208744  8 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
ocfs2_stackglue        16432  2 ocfs2,ocfs2_stack_o2cb
configfs               29736  2 ocfs2_nodemanager
jbd                    54696  2 ocfs2,ext3

xen02a:~# netstat -an | grep 7777
tcp        0      0 10.0.0.168:7777         0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.168:7777         10.0.0.102:47547        ESTABLISHED

xen02b:~# lsmod | egrep 'jbd|ocfs2|configfs'
ocfs2                 395816  1
ocfs2_dlmfs            23696  1
ocfs2_stack_o2cb        9088  1
ocfs2_dlm             197824  2 ocfs2_dlmfs,ocfs2_stack_o2cb
ocfs2_nodemanager     208744  8 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
ocfs2_stackglue        16432  2 ocfs2,ocfs2_stack_o2cb
configfs               29736  2 ocfs2_nodemanager
jbd                    54696  2 ocfs2,ext3

xen02b:~# netstat -an | grep 7777
tcp        0      0 10.0.0.102:7777         0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.102:47547        10.0.0.168:7777         ESTABLISHED

There are no iptables rules on either node, as they are just test servers.
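
The module check above can also be scripted. What follows is only a rough
sketch: missing_modules is a made-up helper name, and the module list is
simply the one from Sérgio's mail, so it may not be complete for other
kernels or cluster stacks.

```shell
#!/bin/sh
# Hypothetical helper (not part of ocfs2-tools): reads lsmod-style output on
# stdin and prints every module from the list that is not loaded.
missing_modules() {
    # First column of lsmod is the module name; skip the header line.
    loaded=$(awk '$1 != "Module" { print $1 }')
    # Module list taken from Sérgio's reply; an assumption for other setups.
    for m in ocfs2 jbd ocfs2_dlm ocfs2_dlmfs ocfs2_nodemanager configfs; do
        printf '%s\n' "$loaded" | grep -qxF -- "$m" || echo "$m"
    done
}
```

Run as "lsmod | missing_modules" on each node; no output means every module
in the list is loaded.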

xen02a:~# uname -a
Linux xen02a 2.6.26-2-xen-amd64 #1 SMP Thu Sep 16 16:32:15 UTC 2010 x86_64 GNU/Linux

xen02b:~# uname -a
Linux xen02b 2.6.26-2-xen-amd64 #1 SMP Thu Sep 16 16:32:15 UTC 2010 x86_64 GNU/Linux

xen02b:~# cat /etc/default/o2cb
#
# This is a configuration file for automatic startup of the O2CB
# driver.  It is generated by running /etc/init.d/o2cb configure.
# On Debian based systems the preferred method is running
# 'dpkg-reconfigure ocfs2-tools'.
#

# O2CB_ENABLED: 'true' means to load the driver on boot.
O2CB_ENABLED=true

# O2CB_STACK: The name of the cluster stack backing O2CB.
O2CB_STACK=o2cb

# O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
O2CB_BOOTCLUSTER=ocfs2

# O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
O2CB_HEARTBEAT_THRESHOLD=31

# O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
O2CB_IDLE_TIMEOUT_MS=30000

# O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
O2CB_KEEPALIVE_DELAY_MS=2000

# O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
O2CB_RECONNECT_DELAY_MS=2000
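
As a side note on the values above: assuming the usual 2-second o2cb disk
heartbeat interval (an assumption taken from the OCFS2 documentation, not
something visible in this output), the dead threshold can be converted to
seconds roughly as sketched here. heartbeat_timeout_secs is a hypothetical
helper name.

```shell
#!/bin/sh
# Assumption: the disk heartbeat runs every 2 seconds, so a node is declared
# dead after (threshold - 1) * 2 seconds without a heartbeat.
heartbeat_timeout_secs() {
    echo $(( ($1 - 1) * 2 ))
}
```

Under that assumption, "heartbeat_timeout_secs 31" gives 60, which puts the
disk heartbeat timeout above the 30000 ms network idle timeout shown above.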


xen02b:~# mount
/dev/sda1 on / type ext3 (rw,errors=remount-ro)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
procbususb on /proc/bus/usb type usbfs (rw)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
configfs on /sys/kernel/config type configfs (rw)
ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
/dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)


regards,
Manuel

On 2010-11-04 15:03, Sérgio Surkamp wrote:
> It seems that o2net (the network stack) is not running; otherwise you
> would see its network messages in dmesg. Something like:
> 
> xen02a kernel: o2net: connected to node xen02b (num 0) at 10.0.0.102:7777
> 
> Check your firewall and network configuration. Also check that the
> [o2net] kernel thread is running and that TCP port 7777 is listening on
> both nodes. If the thread is not running, check that all the needed
> kernel modules are loaded:
> 
> ocfs2
> jbd
> ocfs2_dlm
> ocfs2_dlmfs
> ocfs2_nodemanager
> configfs
> 
> Regards,
> Sérgio
> 
> On Thu, 04 Nov 2010 14:12:11 +0100
> Manuel Bogner <manuel.bogner at geizhals.at> wrote:
> 
>> Sorry for the repost, but I just saw that I mixed German and English...
>> Here is the corrected version:
>>
>>
>>
>> Hi,
>>
>> I'm trying to create a cluster out of 2 nodes. Both systems share the
>> same LUN via FC and see it as /dev/sdd.
>>
>> /dev/sdd has one partition
>>
>> Disk /dev/sdd: 21.4 GB, 21474836480 bytes
>> 64 heads, 32 sectors/track, 20480 cylinders
>> Units = cylinders of 2048 * 512 = 1048576 bytes
>> Disk identifier: 0xc29cb93d
>>
>>    Device Boot      Start         End      Blocks   Id  System
>> /dev/sdd1               1       20480    20971504   83  Linux
>>
>> which is formatted with
>>
>>   mkfs.ocfs2 -L ocfs2 /dev/sdd1
>>
>>
>> Here is my /etc/ocfs2/cluster.conf
>>
>> node:
>>     ip_port = 7777
>>     ip_address = 10.0.0.168
>>     number = 0
>>     name = xen02a
>>     cluster = ocfs2
>>
>> node:
>>     ip_port = 7777
>>     ip_address = 10.0.0.102
>>     number = 1
>>     name = xen02b
>>     cluster = ocfs2
>>
>> cluster:
>>     node_count = 2
>>     name = ocfs2
>>
>>
>> Everything seems to be fine:
>>
>> xen02a:~# /etc/init.d/o2cb status
>> Driver for "configfs": Loaded
>> Filesystem "configfs": Mounted
>> Stack glue driver: Loaded
>> Stack plugin "o2cb": Loaded
>> Driver for "ocfs2_dlmfs": Loaded
>> Filesystem "ocfs2_dlmfs": Mounted
>> Checking O2CB cluster ocfs2: Online
>> Heartbeat dead threshold = 31
>>   Network idle timeout: 30000
>>   Network keepalive delay: 2000
>>   Network reconnect delay: 2000
>> Checking O2CB heartbeat: Active
>>
>> And mounting the fs on each node works fine:
>>
>> /dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
>>
>> Both nodes can ping each other.
>>
>>
>> xen02a:~# mounted.ocfs2 -d
>> Device                FS     UUID                                  Label
>> /dev/sdd1             ocfs2  55a9d0b0-050c-484f-9725-7788a3b9dde0  ocfs2
>>
>> xen02b:~# mounted.ocfs2 -d
>> Device                FS     UUID                                  Label
>> /dev/sdd1             ocfs2  55a9d0b0-050c-484f-9725-7788a3b9dde0  ocfs2
>>
>>
>> Now the problem:
>>
>> First I mount the device on node xen02a:
>>
>>  xen02a:~# mount -L ocfs2 /shared/
>> => /dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
>> without any errors.
>>
>> dmesg says:
>>
>> [   97.244054] ocfs2_dlm: Nodes in domain ("55A9D0B0050C484F97257788A3B9DDE0"): 0
>> [   97.245869] kjournald starting.  Commit interval 5 seconds
>> [   97.247045] ocfs2: Mounting device (8,49) on (node 0, slot 0) with ordered data mode.
>>
>> xen02a:~# mounted.ocfs2 -f
>> Device                FS     Nodes
>> /dev/sdd1             ocfs2  xen02a
>>
>> xen02a:~# echo "slotmap" | debugfs.ocfs2 -n /dev/sdd1
>> 	Slot#   Node#
>> 	    0       0
>>
>>
>> Now I mount the device on the second node:
>>
>> xen02b:~# mount -L ocfs2 /shared/
>> => /dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
>>
>> [  269.741168] OCFS2 1.5.0
>> [  269.765171] ocfs2_dlm: Nodes in domain ("55A9D0B0050C484F97257788A3B9DDE0"): 1
>> [  269.779620] kjournald starting.  Commit interval 5 seconds
>> [  269.779620] ocfs2: Mounting device (8,49) on (node 1, slot 1) with ordered data mode.
>> [  269.779620] (2953,0):ocfs2_replay_journal:1149 Recovering node 0 from slot 0 on device (8,49)
>> [  270.950540] kjournald starting.  Commit interval 5 seconds
>>
>> xen02b:~# echo "slotmap" | debugfs.ocfs2 -n /dev/sdd1
>> 	Slot#   Node#
>> 	    1       1
>>
>> xen02b:~# mounted.ocfs2 -f
>> Device                FS     Nodes
>> /dev/sdd1             ocfs2  xen02b
>>
>>
>> So the first node's mount seems to be gone from the slot map, and
>> changes made to the fs on one node are not propagated to the other.
>>
>> At the moment I have no idea what this could be. I hope someone can
>> help me.
>>
>> regards,
>> Manuel
>>


