[Ocfs2-users] ocfs2 fencing with multipath and dual channel HBA
florian.engelmann at bt.com
Sun Jun 7 23:57:44 PDT 2009
> Florian,
> the problem here seems to be with the network. The nodes are running
> into a network heartbeat timeout and hence the second node is getting
> fenced. Do you see the o2net thread consuming 100% CPU on any node? If
> not, then probably check your network.
> thanks,
> --Srini
I forgot to post my /etc/ocfs2/cluster.conf
node:
        ip_port = 7777
        ip_address = 192.168.0.101
        number = 0
        name = defr1elcbtd01
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.0.102
        number = 1
        name = defr1elcbtd02
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2
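
(Side note: as far as I understand, the running values can be
double-checked against the configfs tree that o2cb populates. This
assumes configfs is mounted under /sys/kernel/config; adjust the path
if yours differs:)

ls /sys/kernel/config/cluster/ocfs2/node/
cat /sys/kernel/config/cluster/ocfs2/node/defr1elcbtd01/ipv4_address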
192.168.0.10x is eth3 on both nodes, connected with a crossover cable.
No active network component is involved here.
defr1elcbtd02:~# traceroute 192.168.0.101
traceroute to 192.168.0.101 (192.168.0.101), 30 hops max, 52 byte packets
1 node1 (192.168.0.101) 0.220 ms 0.142 ms 0.223 ms
defr1elcbtd02:~#
The error message looks like a network problem, but why should there be
a network problem if I shut down an FC port?! I tested it about 20 times
and got about 16 kernel panics, all starting with the same error message:
kernel: o2net: no longer connected to node defr1elcbtd01 (num 0) at 192.168.0.101:7777
The cluster is running fine if there is no problem with the SAN
connection.
How do I enable verbose logging with ocfs2?
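
The closest thing I have found so far is the log-mask interface of
debugfs.ocfs2, but I am not sure it covers the o2net/o2hb messages I am
after (this is only what I pieced together from the 1.2 documentation,
not something I have verified on these nodes):

# list the available log masks and their current state
debugfs.ocfs2 -l
# enable tracing for the network and disk heartbeat code
debugfs.ocfs2 -l TCP HEARTBEAT allow
# and turn it off again
debugfs.ocfs2 -l TCP HEARTBEAT off
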
Regards,
Florian
>
> florian.engelmann at bt.com wrote:
> > Hello,
> > our Debian etch cluster nodes are panicking because ocfs2 fences them
> > when one SAN path fails.
> >
> > modinfo ocfs2
> > filename: /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/ocfs2.ko
> > author: Oracle
> > license: GPL
> > description: OCFS2 1.3.3
> > version: 1.3.3
> > vermagic: 2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends: ocfs2_dlm,ocfs2_nodemanager,jbd
> > srcversion: 0798424846E68F10172C203
> >
> > modinfo ocfs2_dlmfs
> > filename: /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/dlm/ocfs2_dlmfs.ko
> > author: Oracle
> > license: GPL
> > description: OCFS2 DLMFS 1.3.3
> > version: 1.3.3
> > vermagic: 2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends: ocfs2_dlm,ocfs2_nodemanager
> > srcversion: E3780E12396118282B3C1AD
> >
> > defr1elcbtd02:~# modinfo ocfs2_dlm
> > filename: /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/dlm/ocfs2_dlm.ko
> > author: Oracle
> > license: GPL
> > description: OCFS2 DLM 1.3.3
> > version: 1.3.3
> > vermagic: 2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends: ocfs2_nodemanager
> > srcversion: 7DC395EA08AE4CE826C5B92
> >
> > modinfo ocfs2_nodemanager
> > filename: /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/cluster/ocfs2_nodemanager.ko
> > author: Oracle
> > license: GPL
> > description: OCFS2 Node Manager 1.3.3
> > version: 1.3.3
> > vermagic: 2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends: configfs
> > srcversion: C4C9871302E1910B78DAE40
> >
> > modinfo qla2xxx
> > filename: /lib/modules/2.6.18-6-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> > author: QLogic Corporation
> > description: QLogic Fibre Channel HBA Driver
> > license: GPL
> > version: 8.01.07-k1
> > vermagic: 2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends: scsi_mod,scsi_transport_fc,firmware_class
> > alias: pci:v00001077d00002100sv*sd*bc*sc*i*
> > alias: pci:v00001077d00002200sv*sd*bc*sc*i*
> > alias: pci:v00001077d00002300sv*sd*bc*sc*i*
> > alias: pci:v00001077d00002312sv*sd*bc*sc*i*
> > alias: pci:v00001077d00002322sv*sd*bc*sc*i*
> > alias: pci:v00001077d00006312sv*sd*bc*sc*i*
> > alias: pci:v00001077d00006322sv*sd*bc*sc*i*
> > alias: pci:v00001077d00002422sv*sd*bc*sc*i*
> > alias: pci:v00001077d00002432sv*sd*bc*sc*i*
> > alias: pci:v00001077d00005422sv*sd*bc*sc*i*
> > alias: pci:v00001077d00005432sv*sd*bc*sc*i*
> > srcversion: B8E1608E257391DCAFD9224
> > parm: ql2xfdmienable:Enables FDMI registratons Default is 0 - no FDMI. 1 - perfom FDMI. (int)
> > parm: extended_error_logging:Option to enable extended error logging, Default is 0 - no logging. 1 - log errors. (int)
> > parm: ql2xallocfwdump:Option to enable allocation of memory for a firmware dump during HBA initialization. Memory allocation requirements vary by ISP type. Default is 1 - allocate memory. (int)
> > parm: ql2xloginretrycount:Specify an alternate value for the NVRAM login retry count. (int)
> > parm: ql2xplogiabsentdevice:Option to enable PLOGI to devices that are not present after a Fabric scan. This is needed for several broken switches. Default is 0 - no PLOGI. 1 - perfom PLOGI. (int)
> > parm: qlport_down_retry:Maximum number of command retries to a port that returns a PORT-DOWN status. (int)
> > parm: ql2xlogintimeout:Login timeout value in seconds. (int)
> >
> > modinfo dm_multipath
> > filename: /lib/modules/2.6.18-6-amd64/kernel/drivers/md/dm-multipath.ko
> > description: device-mapper multipath target
> > author: Sistina Software <dm-devel at redhat.com>
> > license: GPL
> > vermagic: 2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends: dm-mod
> >
> > modinfo dm_mod
> > filename: /lib/modules/2.6.18-6-amd64/kernel/drivers/md/dm-mod.ko
> > description: device-mapper driver
> > author: Joe Thornber <dm-devel at redhat.com>
> > license: GPL
> > vermagic: 2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends:
> > parm: major:The major number of the device mapper (uint)
> >
> > modinfo dm_round_robin
> > filename: /lib/modules/2.6.18-6-amd64/kernel/drivers/md/dm-round-robin.ko
> > description: device-mapper round-robin multipath path selector
> > author: Sistina Software <dm-devel at redhat.com>
> > license: GPL
> > vermagic: 2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends: dm-multipath
> >
> > There is no self-compiled software; only packages from the official
> > repository are used.
> > The nodes are connected to our two independent SANs. The storage
> > systems are an EMC Clariion CX3-20f and an EMC Clariion CX500.
> >
> > multipath.conf:
> > defaults {
> >         rr_min_io               1000
> >         polling_interval        2
> >         no_path_retry           5
> >         user_friendly_names     yes
> > }
> >
> > blacklist {
> >         devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
> >         devnode "^hd[a-z][[0-9]*]"
> >         devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
> >         device {
> >                 vendor  "DGC"
> >                 product "LUNZ"               # EMC management LUN
> >         }
> >         device {
> >                 vendor  "ATA"                # no multipathing for local drives
> >                 product "*"
> >         }
> >         device {
> >                 vendor  "AMI"                # no multipathing for SUN virtual devices
> >                 product "*"
> >         }
> >         device {
> >                 vendor  "HITACHI"            # no multipathing for local SCSI disks
> >                 product "H101414SCSUN146G"
> >         }
> > }
> >
> > devices {
> >         ## Device attributes for EMC CLARiiON
> >         device {
> >                 vendor                  "DGC"
> >                 product                 "*"
> >                 path_grouping_policy    group_by_prio
> >                 getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
> >                 prio_callout            "/sbin/mpath_prio_emc /dev/%n"
> >                 hardware_handler        "1 emc"
> >                 features                "1 queue_if_no_path"
> >                 no_path_retry           fail
> >                 path_checker            emc_clariion
> >                 path_selector           "round-robin 0"
> >                 failback                immediate
> >                 user_friendly_names     yes
> >         }
> > }
> >
> > multipaths {
> >         multipath {
> >                 wwid    3600601603ac511001c7c92fec775dd11
> >                 alias   stosan01_lun070
> >         }
> > }
> >
> > multipath -ll:
> > stosan01_lun070 (3600601603ac511001c7c92fec775dd11) dm-7 DGC,RAID 5
> > [size=133G][features=0][hwhandler=1 emc]
> > \_ round-robin 0 [prio=2][active]
> > \_ 0:0:1:1 sdd 8:48 [active][ready]
> > \_ 1:0:1:1 sdh 8:112 [active][ready]
> > \_ round-robin 0 [prio=0][enabled]
> > \_ 0:0:0:1 sdb 8:16 [active][ready]
> > \_ 1:0:0:1 sdf 8:80 [active][ready]
> >
> >
> > As we use LVM2, we added /dev/sd* to the filter:
> > filter = [ "r|/dev/cdrom|", "r|/dev/sd.*|" ]
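> >
> > For context, that filter sits in the devices section of
> > /etc/lvm/lvm.conf roughly like this (the surrounding lines are only a
> > sketch, the filter line itself is the one from our config):
> >
> > devices {
> >         # reject the CD-ROM and the raw /dev/sd* paths so LVM only
> >         # scans the multipathed devices (/dev/mapper/*, /dev/dm-*)
> >         filter = [ "r|/dev/cdrom|", "r|/dev/sd.*|" ]
> > }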
> >
> > Here is what happened and what we did to reconstruct the situation to
> > find a solution:
> >
> > On 02.06.2009 we did something wrong with the zoning on one of our two
> > SANs and all servers (about 40) lost one path to the SAN. Only two
> > servers crashed: the two Debian etch heartbeat cluster nodes described
> > above.
> > The console showed a kernel panic because ocfs2 was fencing both
> > nodes.
> >
> > This was the message:
> > o2hb_write_timeout:165 ERROR: Heartbeat write timeout to device dm-7
> > after 12000 milliseconds
> >
> > So we decided to change the o2cb settings to:
> > O2CB_HEARTBEAT_THRESHOLD=31
> > O2CB_IDLE_TIMEOUT_MS=30000
> > O2CB_KEEPALIVE_DELAY_MS=2000
> > O2CB_RECONNECT_DELAY_MS=2000
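> >
> > (If I understand the threshold correctly, the disk heartbeat timeout
> > is (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds, so the old default of
> > 7 matches the 12000 ms in the message above and 31 gives roughly 60
> > seconds; please correct me if that formula is wrong. On Debian these
> > values live in /etc/default/o2cb as far as I can tell, and can be
> > re-applied with the filesystem unmounted, roughly:)
> >
> > grep ^O2CB_ /etc/default/o2cb
> > /etc/init.d/o2cb restart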
> >
> > We switched all cluster resources to the first node to test the new
> > settings on the second node. We removed the second node from the
> > zoning (we also tested shutting down the port, with the same result)
> > and got a different error, but it still ended in a kernel panic:
> >
> > Jun 4 16:41:05 defr1elcbtd02 kernel: o2net: no longer connected to node defr1elcbtd01 (num 0) at 192.168.0.101:7777
> > Jun 4 16:41:27 defr1elcbtd02 kernel: rport-0:0-0: blocked FC remote port time out: removing target and saving binding
> > Jun 4 16:41:27 defr1elcbtd02 kernel: rport-0:0-1: blocked FC remote port time out: removing target and saving binding
> > Jun 4 16:41:27 defr1elcbtd02 kernel: sd 0:0:1:1: SCSI error: return code = 0x00010000
> > Jun 4 16:41:27 defr1elcbtd02 kernel: end_request: I/O error, dev sdd, sector 1672
> > Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath: Failing path 8:48.
> > Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath: Failing path 8:16.
> > Jun 4 16:41:27 defr1elcbtd02 kernel: scsi 0:0:1:1: rejecting I/O to device being removed
> > Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc: long trespass command will be send
> > Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc: honor reservation bit will not be set (default)
> > Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: table: 253:7: multipath: error getting device
> > Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: ioctl: error adding target to table
> > Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc: long trespass command will be send
> > Jun 4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc: honor reservation bit will not be set (default)
> > Jun 4 16:41:29 defr1elcbtd02 kernel: device-mapper: multipath emc: emc_pg_init: sending switch-over command
> > Jun 4 16:42:01 defr1elcbtd02 kernel: (10751,1):dlm_send_remote_convert_request:395 ERROR: status = -107
> > Jun 4 16:42:01 defr1elcbtd02 kernel: (10751,1):dlm_wait_for_node_death:374 5EE89BC01EFC405E9197C198DEEAE678: waiting 5000ms for notification of death of node 0
> > Jun 4 16:42:07 defr1elcbtd02 kernel: (10751,1):dlm_send_remote_convert_request:395 ERROR: status = -107
> > Jun 4 16:42:07 defr1elcbtd02 kernel: (10751,1):dlm_wait_for_node_death:374 5EE89BC01EFC405E9197C198DEEAE678: waiting 5000ms for notification of death of node 0
> > [...]
> > After 60 seconds:
> >
> > (8,0):o2quo_make_decision:143 ERROR: fencing this node because it is
> > connected to a half-quorum of 1 out of 2 nodes which doesn't include
> > the lowest active node 0
> >
> >
> > multipath -ll changed to:
> > stosan01_lun070 (3600601603ac511001c7c92fec775dd11) dm-7 DGC,RAID 5
> > [size=133G][features=0][hwhandler=1 emc]
> > \_ round-robin 0 [prio=1][active]
> > \_ 0:0:1:1 sdd 8:48 [active][ready]
> > \_ round-robin 0 [prio=0][enabled]
> > \_ 0:0:0:1 sdb 8:16 [active][ready]
> >
> > The ocfs2 filesystem is still mounted and writable. Even if I
> > re-enable the zoning (or the FC port) within those 60 seconds, ocfs2
> > does not reconnect to node 1 and panics the kernel after 60 seconds,
> > even though multipath -ll shows both paths again.
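> >
> > (Next time I reproduce this I also want to check on both nodes
> > whether the o2net TCP connection on port 7777 actually comes back
> > once the path is restored, something along the lines of:)
> >
> > # look for the established o2net connection between the nodes
> > netstat -tn | grep :7777
> > # and watch the interconnect while the FC port is down
> > tcpdump -ni eth3 tcp port 7777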
> >
> > I do not understand at all what the Ethernet heartbeat connection of
> > ocfs2 has to do with the SAN connection.
> >
> > The strangest thing of all is that this does not always happen! After
> > some reboots the system keeps running stable even if I shut down an FC
> > port and enable it again many times. There is no consistent
> > behaviour... It happens most of the time, but in about 10% of the
> > tests it does not, and everything works as intended.
> >
> > Any explanations or ideas about what causes this behaviour?
> >
> > I will test this on Debian lenny to see if the Debian version makes a
> > difference.
> >
> > Best regards,
> > Florian
> >
> > _______________________________________________
> > Ocfs2-users mailing list
> > Ocfs2-users at oss.oracle.com
> > http://oss.oracle.com/mailman/listinfo/ocfs2-users
> >