[Ocfs2-users] 2 node cluster with shared LUN via FC

Manuel Bogner manuel.bogner at geizhals.at
Thu Nov 4 09:31:41 PDT 2010


Hi,

I just upgraded to a bpo kernel 2.6.32-bpo.5-amd64 and now it logs the
following:

Nov  4 17:27:37 localhost kernel: [  487.098196] ocfs2_dlm: Nodes in
domain ("8CEAFACAAE3B4A9BB6AAC6A7664EE094"): 0
Nov  4 17:27:37 localhost kernel: [  487.105327] ocfs2: Mounting device
(8,49) on (node 0, slot 0) with ordered data mode.
Nov  4 17:28:11 localhost kernel: [  521.163897] o2net: accepted
connection from node xen02b (num 1) at 192.168.100.101:7777


Nov  4 17:27:59 localhost kernel: [  577.338311] ocfs2_dlm: Nodes in
domain ("8CEAFACAAE3B4A9BB6AAC6A7664EE094"): 1
Nov  4 17:27:59 localhost kernel: [  577.351868] ocfs2: Mounting device
(8,49) on (node 1, slot 1) with ordered data mode.
Nov  4 17:27:59 localhost kernel: [  577.352241]
(2287,2):ocfs2_replay_journal:1607 Recovering node 0 from slot 0 on
device (8,49)
Nov  4 17:28:00 localhost kernel: [  578.505783]
(2287,0):ocfs2_begin_quota_recovery:376 Beginning quota recovery in slot 0
Nov  4 17:28:00 localhost kernel: [  578.569121]
(2241,0):ocfs2_finish_quota_recovery:569 Finishing quota recovery in slot 0
Nov  4 17:28:11 localhost kernel: [  589.359996] o2net: connected to
node xen02a (num 0) at 192.168.100.100:7777

Sequence that produced the log above:

node1: mount
node2: mount

The behavior is still the same, but the second node now also logs quota
recovery for slot 0.
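
The symptom in both sets of logs is the same: each node lists only itself
after "Nodes in domain", and the second node replays the journal of the
first as if that node were dead. A rough way to check a saved kernel log
for this is sketched below in Python. It is only an illustration: the
function names and the `expected` node list are made up for this example,
and it assumes that a healthy two-node mount would list both node numbers
(for example "0 1") on the node that mounts second, which is not shown
anywhere in this thread.

```python
import re

# Matches the line ocfs2_dlm logs after joining a domain, for example:
#   ocfs2_dlm: Nodes in domain ("8CEAFACAAE3B4A9BB6AAC6A7664EE094"): 0
# \s+ between the words tolerates the line wrapping seen in this mail; the
# node list itself must stay on one line ([ \t]+), so the timestamp of the
# next log line is never mistaken for a node number.
DOMAIN_RE = re.compile(
    r'ocfs2_dlm:\s+Nodes\s+in\s+domain\s+\("([0-9A-Fa-f]+)"\):((?:[ \t]+\d+)+)'
)


def nodes_in_domain(log_text):
    """Return [(domain_uuid, [node numbers])] for every domain line found."""
    return [
        (m.group(1), [int(n) for n in m.group(2).split()])
        for m in DOMAIN_RE.finditer(log_text)
    ]


def missing_peers(log_text, expected):
    """For each domain line, list the expected node numbers that are absent.

    `expected` is a hypothetical parameter: the node numbers that should
    have joined, e.g. [0, 1] for the two-node cluster in this thread.
    """
    return [
        (uuid, sorted(set(expected) - set(nodes)))
        for uuid, nodes in nodes_in_domain(log_text)
    ]
```

Given the second node's log above and expected = [0, 1], missing_peers()
reports node 0 as absent from the domain.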

(I also changed the network port used for the cluster traffic. The two
nodes are now directly attached to each other.)
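
Because the interconnect addresses changed (192.168.100.100 and
192.168.100.101 in the o2net lines above, versus 10.0.0.168 and 10.0.0.102
in the cluster.conf quoted further down), one thing worth ruling out is a
cluster.conf that still carries the old addresses on one of the nodes. The
Python sketch below compares the node stanzas of a cluster.conf with the
peers named in o2net log lines. All function names are hypothetical, and
the parser only assumes the simple "stanza:" / "key = value" layout visible
in this thread.

```python
import re


def parse_cluster_conf(text):
    """Parse cluster.conf stanzas into a list of (kind, {key: value})."""
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":") and "=" not in line:
            # "node:" or "cluster:" opens a new stanza.
            stanzas.append((line[:-1], {}))
        elif "=" in line and stanzas:
            key, _, value = line.partition("=")
            stanzas[-1][1][key.strip()] = value.strip()
    return stanzas


def conf_nodes(text):
    """Map node name -> (ip_address, ip_port, number) from cluster.conf."""
    return {
        s["name"]: (s["ip_address"], int(s["ip_port"]), int(s["number"]))
        for kind, s in parse_cluster_conf(text)
        if kind == "node"
    }


# Matches "o2net: connected to node ..." and "o2net: accepted connection
# from node ..." lines; \s+ tolerates the line wrapping seen in this mail.
# "no longer connected to" lines are deliberately not matched.
O2NET_RE = re.compile(
    r"o2net:\s+(?:accepted\s+connection\s+from|connected\s+to)\s+node\s+"
    r"(\S+)\s+\(num\s+(\d+)\)\s+at\s+(\d+(?:\.\d+){3}):(\d+)"
)


def o2net_peers(log_text):
    """Map node name -> (ip_address, ip_port, number) from o2net log lines."""
    return {
        m.group(1): (m.group(3), int(m.group(4)), int(m.group(2)))
        for m in O2NET_RE.finditer(log_text)
    }


def address_mismatches(conf_text, log_text):
    """Return {name: (configured, logged)} for peers whose data disagree."""
    configured = conf_nodes(conf_text)
    return {
        name: (configured.get(name), logged)
        for name, logged in o2net_peers(log_text).items()
        if configured.get(name) != logged
    }
```

An empty result means that every peer seen by o2net matches its stanza. Run
against the cluster.conf as originally posted, the new log lines would be
flagged, which is expected if that file predates the recabling.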

regards,
Manuel


On 2010-11-04 15:49, Manuel Bogner wrote:
> Hi,
> 
> this could also be interesting. I tried mount /dev/sdd1 /shared/ on both
> nodes at the same time with the following log result:
> 
> [  331.158166] OCFS2 1.5.0
> [  336.155577] ocfs2_dlm: Nodes in domain
> ("55A9D0B0050C484F97257788A3B9DDE0"): 0
> [  336.166327] kjournald starting.  Commit interval 5 seconds
> [  336.166327] ocfs2: Mounting device (8,49) on (node 0, slot 1) with
> ordered data mode.
> [  336.166664] (3239,0):ocfs2_replay_journal:1149 Recovering node 1 from
> slot 0 on device (8,49)
> [  337.350942] kjournald starting.  Commit interval 5 seconds
> [  351.142229] o2net: accepted connection from node xen02b (num 1) at
> 10.0.0.102:7777
> [  495.059065] o2net: no longer connected to node xen02b (num 1) at
> 10.0.0.102:7777
> 
> 
> [ 4841.036991] ocfs2_dlm: Nodes in domain
> ("55A9D0B0050C484F97257788A3B9DDE0"): 1
> [ 4841.039225] kjournald starting.  Commit interval 5 seconds
> [ 4841.039997] ocfs2: Mounting device (8,49) on (node 1, slot 0) with
> ordered data mode.
> [ 4862.033837] o2net: connected to node xen02a (num 0) at 10.0.0.168:7777
> [ 5005.996422] o2net: no longer connected to node xen02a (num 0) at
> 10.0.0.168:7777
> [ 5005.998393] ocfs2: Unmounting device (8,49) on (node 1)
> 
> 
> In the end, xen02a was the only node that still had it mounted.
> 
> regards,
> Manuel
> 
> 
> On 2010-11-04 15:14, Manuel Bogner wrote:
>> Hi Sérgio,
>>
>> thanks for your quick answer.
>>
>> There are such lines after waiting a little bit, but still the same
>> behavior.
>>
>> [ 2063.720211] o2net: connected to node xen02a (num 0) at 10.0.0.168:7777
>>
>> [ 1979.611076] o2net: accepted connection from node xen02b (num 1) at
>> 10.0.0.102:7777
>>
>>
>> xen02a:~# lsmod | egrep 'jbd|ocfs2|configfs'
>> ocfs2                 395816  1
>> ocfs2_dlmfs            23696  1
>> ocfs2_stack_o2cb        9088  1
>> ocfs2_dlm             197824  2 ocfs2_dlmfs,ocfs2_stack_o2cb
>> ocfs2_nodemanager     208744  8 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
>> ocfs2_stackglue        16432  2 ocfs2,ocfs2_stack_o2cb
>> configfs               29736  2 ocfs2_nodemanager
>> jbd                    54696  2 ocfs2,ext3
>>
>> xen02a:~# netstat -an | grep 7777
>> tcp        0      0 10.0.0.168:7777         0.0.0.0:*               LISTEN
>> tcp        0      0 10.0.0.168:7777         10.0.0.102:47547        ESTABLISHED
>>
>> xen02b:~# lsmod | egrep 'jbd|ocfs2|configfs'
>> ocfs2                 395816  1
>> ocfs2_dlmfs            23696  1
>> ocfs2_stack_o2cb        9088  1
>> ocfs2_dlm             197824  2 ocfs2_dlmfs,ocfs2_stack_o2cb
>> ocfs2_nodemanager     208744  8 ocfs2,ocfs2_dlmfs,ocfs2_stack_o2cb,ocfs2_dlm
>> ocfs2_stackglue        16432  2 ocfs2,ocfs2_stack_o2cb
>> configfs               29736  2 ocfs2_nodemanager
>> jbd                    54696  2 ocfs2,ext3
>>
>> xen02b:~# netstat -an | grep 7777
>> tcp        0      0 10.0.0.102:7777         0.0.0.0:*               LISTEN
>> tcp        0      0 10.0.0.102:47547        10.0.0.168:7777         ESTABLISHED
>>
>> There are no iptables entries on either node, as they are just test servers.
>>
>> xen02a:~# uname -a
>> Linux xen02a 2.6.26-2-xen-amd64 #1 SMP Thu Sep 16 16:32:15 UTC 2010
>> x86_64 GNU/Linux
>>
>> xen02b:~# uname -a
>> Linux xen02b 2.6.26-2-xen-amd64 #1 SMP Thu Sep 16 16:32:15 UTC 2010
>> x86_64 GNU/Linux
>>
>> xen02b:~# cat /etc/default/o2cb
>> #
>> # This is a configuration file for automatic startup of the O2CB
>> # driver.  It is generated by running /etc/init.d/o2cb configure.
>> # On Debian based systems the preferred method is running
>> # 'dpkg-reconfigure ocfs2-tools'.
>> #
>>
>> # O2CB_ENABLED: 'true' means to load the driver on boot.
>> O2CB_ENABLED=true
>>
>> # O2CB_STACK: The name of the cluster stack backing O2CB.
>> O2CB_STACK=o2cb
>>
>> # O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
>> O2CB_BOOTCLUSTER=ocfs2
>>
>> # O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
>> O2CB_HEARTBEAT_THRESHOLD=31
>>
>> # O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is considered dead.
>> O2CB_IDLE_TIMEOUT_MS=30000
>>
>> # O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is sent
>> O2CB_KEEPALIVE_DELAY_MS=2000
>>
>> # O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
>> O2CB_RECONNECT_DELAY_MS=2000
>>
>>
>> xen02b:~# mount
>> /dev/sda1 on / type ext3 (rw,errors=remount-ro)
>> tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
>> proc on /proc type proc (rw,noexec,nosuid,nodev)
>> sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
>> procbususb on /proc/bus/usb type usbfs (rw)
>> udev on /dev type tmpfs (rw,mode=0755)
>> tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
>> devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
>> configfs on /sys/kernel/config type configfs (rw)
>> ocfs2_dlmfs on /dlm type ocfs2_dlmfs (rw)
>> /dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
>>
>>
>> regards,
>> Manuel
>>
>> On 2010-11-04 15:03, Sérgio Surkamp wrote:
>>> It seems that o2net (the network stack) is not running; otherwise you
>>> would see its connection messages in dmesg. Something like:
>>>
>>> xen02a kernel: o2net: connected to node xen02b (num 1) at
>>> 10.0.0.102:7777
>>>
>>> Check your firewall and network configuration. Also check that the
>>> [o2net] kernel thread is running and that TCP port 7777 is listening on
>>> both nodes. If the thread is not running, check that all the required
>>> kernel modules are loaded:
>>>
>>> ocfs2
>>> jbd
>>> ocfs2_dlm
>>> ocfs2_dlmfs
>>> ocfs2_nodemanager
>>> configfs
>>>
>>> Regards,
>>> Sérgio
>>>
>>> On Thu, 04 Nov 2010 14:12:11 +0100,
>>> Manuel Bogner <manuel.bogner at geizhals.at> wrote:
>>>
>>>> Sorry for the repost, but I just saw that I mixed German and English.
>>>> Here is the corrected version:
>>>>
>>>>
>>>>
>>>> Hi,
>>>>
>>>> I'm trying to create a cluster out of 2 nodes. Both systems share the
>>>> same LUN via FC and see it as /dev/sdd.
>>>>
>>>> /dev/sdd has one partition
>>>>
>>>> Disk /dev/sdd: 21.4 GB, 21474836480 bytes
>>>> 64 heads, 32 sectors/track, 20480 cylinders
>>>> Units = cylinders of 2048 * 512 = 1048576 bytes
>>>> Disk identifier: 0xc29cb93d
>>>>
>>>>    Device Boot      Start         End      Blocks   Id  System
>>>> /dev/sdd1               1       20480    20971504   83  Linux
>>>>
>>>> which is formatted with
>>>>
>>>>   mkfs.ocfs2 -L ocfs2 /dev/sdd1
>>>>
>>>>
>>>> Here is my /etc/ocfs2/cluster.conf
>>>>
>>>> node:
>>>>     ip_port = 7777
>>>>     ip_address = 10.0.0.168
>>>>     number = 0
>>>>     name = xen02a
>>>>     cluster = ocfs2
>>>>
>>>> node:
>>>>     ip_port = 7777
>>>>     ip_address = 10.0.0.102
>>>>     number = 1
>>>>     name = xen02b
>>>>     cluster = ocfs2
>>>>
>>>> cluster:
>>>>     node_count = 2
>>>>     name = ocfs2
>>>>
>>>>
>>>> Everything seems to be fine:
>>>>
>>>> xen02a:~# /etc/init.d/o2cb status
>>>> Driver for "configfs": Loaded
>>>> Filesystem "configfs": Mounted
>>>> Stack glue driver: Loaded
>>>> Stack plugin "o2cb": Loaded
>>>> Driver for "ocfs2_dlmfs": Loaded
>>>> Filesystem "ocfs2_dlmfs": Mounted
>>>> Checking O2CB cluster ocfs2: Online
>>>> Heartbeat dead threshold = 31
>>>>   Network idle timeout: 30000
>>>>   Network keepalive delay: 2000
>>>>   Network reconnect delay: 2000
>>>> Checking O2CB heartbeat: Active
>>>>
>>>> And mounting the fs on each node works fine:
>>>>
>>>> /dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
>>>>
>>>> Both nodes can ping each other.
>>>>
>>>>
>>>> xen02a:~# mounted.ocfs2 -d
>>>> Device                FS     UUID                                  Label
>>>> /dev/sdd1             ocfs2  55a9d0b0-050c-484f-9725-7788a3b9dde0  ocfs2
>>>>
>>>> xen02b:~# mounted.ocfs2 -d
>>>> Device                FS     UUID                                  Label
>>>> /dev/sdd1             ocfs2  55a9d0b0-050c-484f-9725-7788a3b9dde0  ocfs2
>>>>
>>>>
>>>> Now the problem:
>>>>
>>>> I first mount the device on node1:
>>>>
>>>>  xen02a:~# mount -L ocfs2 /shared/
>>>> => /dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
>>>> without any errors.
>>>>
>>>> dmesg says:
>>>>
>>>> [   97.244054] ocfs2_dlm: Nodes in domain
>>>> ("55A9D0B0050C484F97257788A3B9DDE0"): 0
>>>> [   97.245869] kjournald starting.  Commit interval 5 seconds
>>>> [   97.247045] ocfs2: Mounting device (8,49) on (node 0, slot 0) with
>>>> ordered data mode.
>>>>
>>>> xen02a:~# mounted.ocfs2 -f
>>>> Device                FS     Nodes
>>>> /dev/sdd1             ocfs2  xen02a
>>>>
>>>> xen02a:~# echo "slotmap" | debugfs.ocfs2 -n /dev/sdd1
>>>> 	Slot#   Node#
>>>> 	    0       0
>>>>
>>>>
>>>> Now I mount the device on the second node:
>>>>
>>>> xen02b:~# mount -L ocfs2 /shared/
>>>> => /dev/sdd1 on /shared type ocfs2 (rw,_netdev,heartbeat=local)
>>>>
>>>> [  269.741168] OCFS2 1.5.0
>>>> [  269.765171] ocfs2_dlm: Nodes in domain
>>>> ("55A9D0B0050C484F97257788A3B9DDE0"): 1
>>>> [  269.779620] kjournald starting.  Commit interval 5 seconds
>>>> [  269.779620] ocfs2: Mounting device (8,49) on (node 1, slot 1) with
>>>> ordered data mode.
>>>> [  269.779620] (2953,0):ocfs2_replay_journal:1149 Recovering node 0
>>>> from slot 0 on device (8,49)
>>>> [  270.950540] kjournald starting.  Commit interval 5 seconds
>>>>
>>>> xen02b:~# echo "slotmap" | debugfs.ocfs2 -n /dev/sdd1
>>>> 	Slot#   Node#
>>>> 	    1       1
>>>>
>>>> xen02b:~# mounted.ocfs2 -f
>>>> Device                FS     Nodes
>>>> /dev/sdd1             ocfs2  xen02b
>>>>
>>>>
>>>> So the first mount seems to be gone and any changes on the fs on that
>>>> node are not distributed.
>>>>
>>>> At the moment I have no idea what this could be. I hope someone can
>>>> help me.
>>>>
>>>> regards,
>>>> Manuel
>>>>
>>>> _______________________________________________
>>>> Ocfs2-users mailing list
>>>> Ocfs2-users at oss.oracle.com
>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-users
>>>
>>>
>>
>>
> 
> 


