[Ocfs2-users] 2 node cluster with shared LUN via FC

Thu Nov 4 07:49:02 PDT 2010

Hi,

This may also be of interest. I ran mount /dev/sdd1 /shared/ on both
nodes at the same time and got the following log output (first block
from xen02a, second from xen02b):

[  331.158166] OCFS2 1.5.0
[  336.155577] ocfs2_dlm: Nodes in domain ("55A9D0B0050C484F97257788A3B9DDE0"): 0
[  336.166327] kjournald starting.  Commit interval 5 seconds
[  336.166327] ocfs2: Mounting device (8,49) on (node 0, slot 1) with ordered data mode.
[  336.166664] (3239,0):ocfs2_replay_journal:1149 Recovering node 1 from slot 0 on device (8,49)
[  337.350942] kjournald starting.  Commit interval 5 seconds
[  351.142229] o2net: accepted connection from node xen02b (num 1) at 10.0.0.102:7777
[  495.059065] o2net: no longer connected to node xen02b (num 1) at 10.0.0.102:7777

[ 4841.036991] ocfs2_dlm: Nodes in domain ("55A9D0B0050C484F97257788A3B9DDE0"): 1
[ 4841.039225] kjournald starting.  Commit interval 5 seconds
[ 4841.039997] ocfs2: Mounting device (8,49) on (node 1, slot 0) with ordered data mode.
[ 4862.033837] o2net: connected to node xen02a (num 0) at 10.0.0.168:7777
[ 5005.996422] o2net: no longer connected to node xen02a (num 0) at 10.0.0.168:7777
[ 5005.998393] ocfs2: Unmounting device (8,49) on (node 1)

In the end, xen02a was the only node that still had it mounted.
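
To make the two logs easier to compare, here is a minimal Python sketch
that extracts the (node, slot) pair from each mount message and from any
journal-recovery message. The function name summarize and both regular
expressions are assumptions modelled only on the dmesg lines quoted in
this thread; other OCFS2 versions may word these messages differently.

```python
import re

# Patterns modelled on the dmesg lines quoted in this thread (assumption:
# other kernel versions may format these messages differently).
MOUNT_RE = re.compile(
    r"ocfs2: Mounting device \(\d+,\d+\) on \(node (\d+), slot (\d+)\)"
)
RECOVER_RE = re.compile(
    r"ocfs2_replay_journal:\d+ Recovering node (\d+) from slot (\d+)"
)


def summarize(log_text):
    """Return (mounts, recoveries), each a list of (node, slot) tuples."""
    # Collapse all whitespace so that a message wrapped across two lines
    # by a mail client still matches as a single message.
    text = " ".join(log_text.split())
    mounts = [(int(n), int(s)) for n, s in MOUNT_RE.findall(text)]
    recoveries = [(int(n), int(s)) for n, s in RECOVER_RE.findall(text)]
    return mounts, recoveries


XEN02A_LOG = """
[  336.166327] ocfs2: Mounting device (8,49) on (node 0, slot 1) with
ordered data mode.
[  336.166664] (3239,0):ocfs2_replay_journal:1149 Recovering node 1 from
slot 0 on device (8,49)
"""

XEN02B_LOG = """
[ 4841.039997] ocfs2: Mounting device (8,49) on (node 1, slot 0) with
ordered data mode.
"""

if __name__ == "__main__":
    print("xen02a:", summarize(XEN02A_LOG))
    print("xen02b:", summarize(XEN02B_LOG))
```

On the excerpts above, this reports that xen02a (node 0) mounted in
slot 1 and replayed the journal of node 1 from slot 0, while xen02b
(node 1) mounted in slot 0 and logged no recovery.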

regards,
Manuel

On 2010-11-04 15:14, Manuel Bogner wrote:
> Hi Sérgio,
> 
> Thanks for your quick answer.
> 
> Such lines do appear after waiting a little, but the behavior is still
> the same.
> 
> [ 2063.720211] o2net: connected to node xen02a (num 0) at 10.0.0.168:7777
> 
> [ 1979.611076] o2net: accepted connection from node xen02b (num 1) at 10.0.0.102:7777
> 
> 
> xen02a:~# lsmod | egrep 'jbd|ocfs2|configfs'
> ocfs2                 395816  1
> ocfs2_dlmfs            23696  1
> ocfs2_stack_o2cb        9088  1
> ocfs2_dlm             197824  2 ocfs2_dlmfs,ocfs2_stack_o2cb
> ocfs2_nodemanager     208744  8 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
> ocfs2_stackglue        16432  2 ocfs2,ocfs2_stack_o2cb
> configfs               29736  2 ocfs2_nodemanager
> jbd                    54696  2 ocfs2,ext3
> 
> xen02a:~# netstat -an | grep 7777
> tcp        0      0 10.0.0.168:7777         0.0.0.0:*               LISTEN
> tcp        0      0 10.0.0.168:7777         10.0.0.102:47547        ESTABLISHED
> 
> xen02b:~# lsmod | egrep 'jbd|ocfs2|configfs'
> ocfs2                 395816  1
> ocfs2_dlmfs            23696  1
> ocfs2_stack_o2cb        9088  1
> ocfs2_dlm             197824  2 ocfs2_dlmfs,ocfs2_stack_o2cb
> ocfs2_nodemanager     208744  8 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
> ocfs2_stackglue        16432  2 ocfs2,ocfs2_stack_o2cb
> configfs               29736  2 ocfs2_nodemanager
> jbd                    54696  2 ocfs2,ext3
> 
> xen02b:~# netstat -an | grep 7777
> tcp        0      0 10.0.0.102:7777         0.0.0.0:*               LISTEN
> tcp        0      0 10.0.0.102:47547        10.0.0.168:7777         ESTABLISHED
> 
> There are no iptables rules on either node, as they are just test servers.
> 
> xen02a:~# uname -a
> Linux xen02a 2.6.26-2-xen-amd64 #1 SMP Thu Sep 16 16:32:15 UTC 2010 x86_64 GNU/Linux
> 
> xen02b:~# uname -a
> Linux xen02b 2.6.26-2-xen-amd64 #1 SMP Thu Sep 16 16:32:15 UTC 2010 x86_64 GNU/Linux
> 
> xen02b:~# cat /etc/default/o2cb
> #
> # This is a configuration file for automatic startup of the O2CB
> # driver.  It is generated by running /etc/init.d/o2cb configure.
> # On Debian based systems the preferred method is running
> # 'dpkg-reconfigure ocfs2-tools'.
> #
> 
> # O2CB_ENABLED: 'true' means to load the driver on boot.
> O2CB_ENABLED=true
> 
> # O2CB_STACK: The name of the cluster stack backing O2CB.
> O2CB_STACK=o2cb
> 
> # O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
> O2CB_BOOTCLUSTER=ocfs2
> 
> # O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
> O2CB_HEARTBEAT_THRESHOLD=31
> 
> # O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
> O2CB_IDLE_TIMEOUT_MS=30000
> 
> # O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
> O2CB_KEEPALIVE_DELAY_MS=2000
> 
> # O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
> O2CB_RECONNECT_DELAY_MS=2000
> 
> 
> xen02b:~# mount
> /dev/sda1 on / type ext3 (rw,errors=remount-ro)
> tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
> proc on /proc type proc (rw,noexec,nosuid,nodev)
> sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
> procbususb on /proc/bus/usb type usbfs (rw)
> udev on /dev type tmpfs (rw,mode=0755)
> tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
> devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
> configfs on /sys/kernel/config type configfs (rw)
> ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
> /dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
> 
> 
> regards,
> Manuel
> 
> On 2010-11-04 15:03, Sérgio Surkamp wrote:
>> It seems that o2net (the network stack) is not running; otherwise you
>> would see its network messages in dmesg. Something like:
>>
>> xen02a kernel: o2net: connected to node xen02b (num 1) at 10.0.0.102:7777
>>
>> Check your firewall and network configuration. Also check whether the
>> [o2net] kernel thread is running and TCP port 7777 is listening on both
>> nodes. If the thread is not running, check that all the needed kernel
>> modules are loaded:
>>
>> ocfs2
>> jbd
>> ocfs2_dlm
>> ocfs2_dlmfs
>> ocfs2_nodemanager
>> configfs
>>
>> Regards,
>> Sérgio
>>
>> On Thu, 04 Nov 2010 14:12:11 +0100
>> Manuel Bogner <manuel.bogner at geizhals.at> wrote:
>>
>>> Sorry for the repost, but I just saw that I mixed German and English.
>>> Here is the corrected version:
>>>
>>>
>>>
>>> Hi,
>>>
>>> I'm trying to create a cluster out of 2 nodes. Both systems share the
>>> same LUN via FC and see it as /dev/sdd.
>>>
>>> /dev/sdd has one partition
>>>
>>> Disk /dev/sdd: 21.4 GB, 21474836480 bytes
>>> 64 heads, 32 sectors/track, 20480 cylinders
>>> Units = cylinders of 2048 * 512 = 1048576 bytes
>>> Disk identifier: 0xc29cb93d
>>>
>>>    Device Boot      Start         End      Blocks   Id  System
>>> /dev/sdd1               1       20480    20971504   83  Linux
>>>
>>> which is formatted with
>>>
>>>   mkfs.ocfs2 -L ocfs2 /dev/sdd1
>>>
>>>
>>> Here is my /etc/ocfs2/cluster.conf
>>>
>>> node:
>>>     ip_port = 7777
>>>     ip_address = 10.0.0.168
>>>     number = 0
>>>     name = xen02a
>>>     cluster = ocfs2
>>>
>>> node:
>>>     ip_port = 7777
>>>     ip_address = 10.0.0.102
>>>     number = 1
>>>     name = xen02b
>>>     cluster = ocfs2
>>>
>>> cluster:
>>>     node_count = 2
>>>     name = ocfs2
>>>
>>>
>>> Everything seems to be fine:
>>>
>>> xen02a:~# /etc/init.d/o2cb status
>>> Driver for "configfs": Loaded
>>> Filesystem "configfs": Mounted
>>> Stack glue driver: Loaded
>>> Stack plugin "o2cb": Loaded
>>> Driver for "ocfs2_dlmfs": Loaded
>>> Filesystem "ocfs2_dlmfs": Mounted
>>> Checking O2CB cluster ocfs2: Online
>>> Heartbeat dead threshold = 31
>>>   Network idle timeout: 30000
>>>   Network keepalive delay: 2000
>>>   Network reconnect delay: 2000
>>> Checking O2CB heartbeat: Active
>>>
>>> And mounting the fs on each node works fine:
>>>
>>> /dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
>>>
>>> Both nodes can ping each other.
>>>
>>>
>>> xen02a:~# mounted.ocfs2 -d
>>> Device                FS     UUID                                  Label
>>> /dev/sdd1             ocfs2  55a9d0b0-050c-484f-9725-7788a3b9dde0  ocfs2
>>>
>>> xen02b:~# mounted.ocfs2 -d
>>> Device                FS     UUID                                  Label
>>> /dev/sdd1             ocfs2  55a9d0b0-050c-484f-9725-7788a3b9dde0  ocfs2
>>>
>>>
>>> Now the problem:
>>>
>>> I first mount the device on the first node (xen02a):
>>>
>>>  xen02a:~# mount -L ocfs2 /shared/
>>> => /dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
>>> without any errors.
>>>
>>> dmesg says:
>>>
>>> [   97.244054] ocfs2_dlm: Nodes in domain ("55A9D0B0050C484F97257788A3B9DDE0"): 0
>>> [   97.245869] kjournald starting.  Commit interval 5 seconds
>>> [   97.247045] ocfs2: Mounting device (8,49) on (node 0, slot 0) with ordered data mode.
>>>
>>> xen02a:~# mounted.ocfs2 -f
>>> Device                FS     Nodes
>>> /dev/sdd1             ocfs2  xen02a
>>>
>>> xen02a:~# echo "slotmap" | debugfs.ocfs2 -n /dev/sdd1
>>> 	Slot#   Node#
>>> 	    0       0
>>>
>>>
>>> Now I mount the device on the second node:
>>>
>>> xen02b:~# mount -L ocfs2 /shared/
>>> => /dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
>>>
>>> [  269.741168] OCFS2 1.5.0
>>> [  269.765171] ocfs2_dlm: Nodes in domain ("55A9D0B0050C484F97257788A3B9DDE0"): 1
>>> [  269.779620] kjournald starting.  Commit interval 5 seconds
>>> [  269.779620] ocfs2: Mounting device (8,49) on (node 1, slot 1) with ordered data mode.
>>> [  269.779620] (2953,0):ocfs2_replay_journal:1149 Recovering node 0 from slot 0 on device (8,49)
>>> [  270.950540] kjournald starting.  Commit interval 5 seconds
>>>
>>> xen02b:~# echo "slotmap" | debugfs.ocfs2 -n /dev/sdd1
>>> 	Slot#   Node#
>>> 	    1       1
>>>
>>> xen02b:~# mounted.ocfs2 -f
>>> Device                FS     Nodes
>>> /dev/sdd1             ocfs2  xen02b
>>>
>>>
>>> So the first node's mount seems to be gone, and changes made to the fs
>>> on that node are not propagated to the other node.
>>>
>>> At the moment I have no idea what this could be. I hope someone can
>>> help me.
>>>
>>> regards,
>>> Manuel
>>>
>>> _______________________________________________
>>> Ocfs2-users mailing list
>>> Ocfs2-users at oss.oracle.com
>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>
>>