[Ocfs2-users] ocfs2 fencing with multipath and dual channel HBA

florian.engelmann at bt.com
Sun Jun 7 23:57:44 PDT 2009


> Florian,
> the problem here seems to be with network. The nodes are running into
> network heartbeat timeout and hence second node is getting fenced. Do
> you see o2net thread consuming 100% cpu on any node? if not then
> probably check your network
> thanks,
> --Srini

I forgot to post my /etc/ocfs2/cluster.conf
node:
        ip_port = 7777
        ip_address = 192.168.0.101
        number = 0
        name = defr1elcbtd01
        cluster = ocfs2

node:
        ip_port = 7777
        ip_address = 192.168.0.102
        number = 1
        name = defr1elcbtd02
        cluster = ocfs2

cluster:
        node_count = 2
        name = ocfs2


192.168.0.10x is eth3 on both nodes and connected with a cross over
cable. No active network component is involved here.

defr1elcbtd02:~# traceroute 192.168.0.101
traceroute to 192.168.0.101 (192.168.0.101), 30 hops max, 52 byte
packets
 1  node1 (192.168.0.101)  0.220 ms  0.142 ms  0.223 ms
defr1elcbtd02:~#

The error message looks like a network problem, but why should there be a
network problem if I shut down an FC port?! I tested it about 20 times and
got about 16 kernel panics, all starting with the same error message:

kernel: o2net: no longer connected to node defr1elcbtd01 (num 0) at
192.168.0.101:7777 

The cluster is running fine if there is no problem with the SAN
connection.
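One thing I noticed while re-reading the o2cb docs (please correct me if I
got this wrong): the disk heartbeat threshold is counted in 2-second
iterations, so a node should only be declared dead after roughly
(O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds. With the threshold of 31 we
configured, that works out to the same 60 seconds after which the panic
fires:

```shell
# O2CB_HEARTBEAT_THRESHOLD counts 2-second disk heartbeat iterations;
# a node is considered dead after (threshold - 1) * 2 seconds.
THRESHOLD=31
echo "disk heartbeat timeout: $(( (THRESHOLD - 1) * 2 )) seconds"
```

So the 60-second delay before the panic may simply be the disk heartbeat
threshold expiring, not anything network related.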

How can I enable verbose logging in ocfs2?
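The closest I have found so far is debugfs.ocfs2 from ocfs2-tools, which
seems to be able to list and toggle the kernel log masks. I am not sure the
mask names below are correct for 1.3.3, so please treat this as a guess and
correct me:

```shell
# List the available ocfs2 log masks and their current state
debugfs.ocfs2 -l

# Tentatively enable verbose logging for the network and heartbeat code,
# then watch the kernel log while reproducing the FC port failure
debugfs.ocfs2 -l TCP allow
debugfs.ocfs2 -l HEARTBEAT allow
tail -f /var/log/kern.log

# Switch the extra logging off again afterwards
debugfs.ocfs2 -l TCP off
debugfs.ocfs2 -l HEARTBEAT off
```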

Regards,
Florian

> 
> florian.engelmann at bt.com wrote:
> > Hello,
> > our Debian etch cluster nodes are panicking because of ocfs2 fencing if
> > one SAN path fails.
> >
> > modinfo ocfs2
> > filename:       /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/ocfs2.ko
> > author:         Oracle
> > license:        GPL
> > description:    OCFS2 1.3.3
> > version:        1.3.3
> > vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends:        ocfs2_dlm,ocfs2_nodemanager,jbd
> > srcversion:     0798424846E68F10172C203
> >
> > modinfo ocfs2_dlmfs
> > filename:
> > /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/dlm/ocfs2_dlmfs.ko
> > author:         Oracle
> > license:        GPL
> > description:    OCFS2 DLMFS 1.3.3
> > version:        1.3.3
> > vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends:        ocfs2_dlm,ocfs2_nodemanager
> > srcversion:     E3780E12396118282B3C1AD
> >
> > defr1elcbtd02:~# modinfo ocfs2_dlm
> > filename:
> > /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/dlm/ocfs2_dlm.ko
> > author:         Oracle
> > license:        GPL
> > description:    OCFS2 DLM 1.3.3
> > version:        1.3.3
> > vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends:        ocfs2_nodemanager
> > srcversion:     7DC395EA08AE4CE826C5B92
> >
> > modinfo ocfs2_nodemanager
> > filename:
> > /lib/modules/2.6.18-6-amd64/kernel/fs/ocfs2/cluster/ocfs2_nodemanager.ko
> > author:         Oracle
> > license:        GPL
> > description:    OCFS2 Node Manager 1.3.3
> > version:        1.3.3
> > vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends:        configfs
> > srcversion:     C4C9871302E1910B78DAE40
> >
> > modinfo qla2xxx
> > filename:
> > /lib/modules/2.6.18-6-amd64/kernel/drivers/scsi/qla2xxx/qla2xxx.ko
> > author:         QLogic Corporation
> > description:    QLogic Fibre Channel HBA Driver
> > license:        GPL
> > version:        8.01.07-k1
> > vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends:        scsi_mod,scsi_transport_fc,firmware_class
> > alias:          pci:v00001077d00002100sv*sd*bc*sc*i*
> > alias:          pci:v00001077d00002200sv*sd*bc*sc*i*
> > alias:          pci:v00001077d00002300sv*sd*bc*sc*i*
> > alias:          pci:v00001077d00002312sv*sd*bc*sc*i*
> > alias:          pci:v00001077d00002322sv*sd*bc*sc*i*
> > alias:          pci:v00001077d00006312sv*sd*bc*sc*i*
> > alias:          pci:v00001077d00006322sv*sd*bc*sc*i*
> > alias:          pci:v00001077d00002422sv*sd*bc*sc*i*
> > alias:          pci:v00001077d00002432sv*sd*bc*sc*i*
> > alias:          pci:v00001077d00005422sv*sd*bc*sc*i*
> > alias:          pci:v00001077d00005432sv*sd*bc*sc*i*
> > srcversion:     B8E1608E257391DCAFD9224
> > parm:           ql2xfdmienable:Enables FDMI registratons Default is 0 -
> > no FDMI. 1 - perfom FDMI. (int)
> > parm:           extended_error_logging:Option to enable extended error
> > logging, Default is 0 - no logging. 1 - log errors. (int)
> > parm:           ql2xallocfwdump:Option to enable allocation of memory
> > for a firmware dump during HBA initialization.  Memory allocation
> > requirements vary by ISP type.  Default is 1 - allocate memory. (int)
> > parm:           ql2xloginretrycount:Specify an alternate value for the
> > NVRAM login retry count. (int)
> > parm:           ql2xplogiabsentdevice:Option to enable PLOGI to devices
> > that are not present after a Fabric scan.  This is needed for several
> > broken switches. Default is 0 - no PLOGI. 1 - perfom PLOGI. (int)
> > parm:           qlport_down_retry:Maximum number of command retries to a
> > port that returns a PORT-DOWN status. (int)
> > parm:           ql2xlogintimeout:Login timeout value in seconds. (int)
> >
> > modinfo dm_multipath
> > filename:
> > /lib/modules/2.6.18-6-amd64/kernel/drivers/md/dm-multipath.ko
> > description:    device-mapper multipath target
> > author:         Sistina Software <dm-devel at redhat.com>
> > license:        GPL
> > vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends:        dm-mod
> >
> > modinfo dm_mod
> > filename:
> > /lib/modules/2.6.18-6-amd64/kernel/drivers/md/dm-mod.ko
> > description:    device-mapper driver
> > author:         Joe Thornber <dm-devel at redhat.com>
> > license:        GPL
> > vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends:
> > parm:           major:The major number of the device mapper (uint)
> >
> > modinfo dm_round_robin
> > filename:
> > /lib/modules/2.6.18-6-amd64/kernel/drivers/md/dm-round-robin.ko
> > description:    device-mapper round-robin multipath path selector
> > author:         Sistina Software <dm-devel at redhat.com>
> > license:        GPL
> > vermagic:       2.6.18-6-amd64 SMP mod_unload gcc-4.1
> > depends:        dm-multipath
> >
> > No self-compiled software is involved; only packages from the official
> > repository were used.
> > The nodes are connected to our two independent SANs. The storage
> > systems are EMC Clariion CX3-20f and EMC Clariion CX500.
> >
> > multipath.conf:
> > defaults {
> >         rr_min_io                       1000
> >         polling_interval                2
> >         no_path_retry                   5
> >         user_friendly_names             yes
> > }
> >
> > blacklist {
> >         devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
> >         devnode "^hd[a-z][[0-9]*]"
> >         devnode "^cciss!c[0-9]d[0-9]*[p[0-9]*]"
> >         device {
> >                 vendor "DGC"
> >                 product "LUNZ" # EMC Management LUN
> >         }
> >         device {
> >                 vendor "ATA" # We do not need multipathing for local drives
> >                 product "*"
> >         }
> >         device {
> >                 vendor "AMI" # No multipathing for SUN Virtual devices
> >                 product "*"
> >         }
> >         device {
> >                 vendor "HITACHI" # No multipathing for local scsi disks
> >                 product "H101414SCSUN146G"
> >         }
> > }
> >
> > devices {
> >         ## Device attributes for EMC CLARiiON
> >         device {
> >                 vendor                  "DGC"
> >                 product                 "*"
> >                 path_grouping_policy    group_by_prio
> >                 getuid_callout          "/sbin/scsi_id -g -u -s /block/%n"
> >                 prio_callout            "/sbin/mpath_prio_emc /dev/%n"
> >                 hardware_handler        "1 emc"
> >                 features                "1 queue_if_no_path"
> >                 no_path_retry           fail
> >                 path_checker            emc_clariion
> >                 path_selector           "round-robin 0"
> >                 failback                immediate
> >                 user_friendly_names     yes
> >         }
> > }
> >
> > multipaths {
> >         multipath {
> >                 wwid                    3600601603ac511001c7c92fec775dd11
> >                 alias                   stosan01_lun070
> >         }
> > }
> >
> > multipath -ll:
> > stosan01_lun070 (3600601603ac511001c7c92fec775dd11) dm-7 DGC,RAID 5
> > [size=133G][features=0][hwhandler=1 emc]
> > \_ round-robin 0 [prio=2][active]
> >  \_ 0:0:1:1 sdd 8:48  [active][ready]
> >  \_ 1:0:1:1 sdh 8:112 [active][ready]
> > \_ round-robin 0 [prio=0][enabled]
> >  \_ 0:0:0:1 sdb 8:16  [active][ready]
> >  \_ 1:0:0:1 sdf 8:80  [active][ready]
> >
> >
> > As we use lvm2 we added /dev/sd* to the filter:
> > filter = [ "r|/dev/cdrom|", "r|/dev/sd.*|" ]
> >
> > Here is what happened and what we did to reconstruct the situation to
> > find a solution:
> >
> > On 02.06.2009 we did something wrong with the zoning on one of our two
> > SANs and all servers (about 40) lost one path to the SAN. Only two
> > servers crashed: our Debian etch heartbeat cluster nodes described
> > above.
> > The console showed a kernel panic because ocfs2 was fencing both
> > nodes.
> >
> > This was the message:
> > o2hb_write_timeout: 165 ERROR: Heartbeat write timeout to device dm-7
> > after 12000 milliseconds
> >
> > So we decided to change the o2cb settings to:
> > O2CB_HEARTBEAT_THRESHOLD=31
> > O2CB_IDLE_TIMEOUT_MS=30000
> > O2CB_KEEPALIVE_DELAY_MS=2000
> > O2CB_RECONNECT_DELAY_MS=2000
> >
> > We switched all cluster resources to the 1st node to test the new
> > settings on the second node. We removed the 2nd node from the zoning
> > (we also tested shutting down the port, with the same result) and got
> > a different error, but still ended up with a kernel panic:
> >
> > Jun  4 16:41:05 defr1elcbtd02 kernel: o2net: no longer connected to node
> > defr1elcbtd01 (num 0) at 192.168.0.101:7777
> > Jun  4 16:41:27 defr1elcbtd02 kernel:  rport-0:0-0: blocked FC remote
> > port time out: removing target and saving binding
> > Jun  4 16:41:27 defr1elcbtd02 kernel:  rport-0:0-1: blocked FC remote
> > port time out: removing target and saving binding
> > Jun  4 16:41:27 defr1elcbtd02 kernel: sd 0:0:1:1: SCSI error: return
> > code = 0x00010000
> > Jun  4 16:41:27 defr1elcbtd02 kernel: end_request: I/O error, dev sdd,
> > sector 1672
> > Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath: Failing
> > path 8:48.
> > Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath: Failing
> > path 8:16.
> > Jun  4 16:41:27 defr1elcbtd02 kernel: scsi 0:0:1:1: rejecting I/O to
> > device being removed
> > Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc: long
> > trespass command will be send
> > Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc:
> > honor reservation bit will not be set (default)
> > Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: table: 253:7:
> > multipath: error getting device
> > Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: ioctl: error adding
> > target to table
> > Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc: long
> > trespass command will be send
> > Jun  4 16:41:27 defr1elcbtd02 kernel: device-mapper: multipath emc:
> > honor reservation bit will not be set (default)
> > Jun  4 16:41:29 defr1elcbtd02 kernel: device-mapper: multipath emc:
> > emc_pg_init: sending switch-over command
> > Jun  4 16:42:01 defr1elcbtd02 kernel:
> > (10751,1):dlm_send_remote_convert_request:395 ERROR: status = -107
> > Jun  4 16:42:01 defr1elcbtd02 kernel:
> > (10751,1):dlm_wait_for_node_death:374 5EE89BC01EFC405E9197C198DEEAE678:
> > waiting 5000ms for notification of death of node 0
> > Jun  4 16:42:07 defr1elcbtd02 kernel:
> > (10751,1):dlm_send_remote_convert_request:395 ERROR: status = -107
> > Jun  4 16:42:07 defr1elcbtd02 kernel:
> > (10751,1):dlm_wait_for_node_death:374 5EE89BC01EFC405E9197C198DEEAE678:
> > waiting 5000ms for notification of death of node 0
> > [...]
> > After 60 seconds:
> >
> > (8,0): o2quo_make_decision:143 ERROR: fencing this node because it is
> > connected to a half-quorum of 1 out of 2 nodes which doesn't include
> > the lowest active node 0
> >
> >
> > multipath -ll changed to:
> > stosan01_lun070 (3600601603ac511001c7c92fec775dd11) dm-7 DGC,RAID 5
> > [size=133G][features=0][hwhandler=1 emc]
> > \_ round-robin 0 [prio=1][active]
> >  \_ 0:0:1:1 sdd 8:48  [active][ready]
> > \_ round-robin 0 [prio=0][enabled]
> >  \_ 0:0:0:1 sdb 8:16  [active][ready]
> >
> > The ocfs2 filesystem is still mounted and writable. Even if I re-enable
> > the zoning (or the FC port) within the 60 seconds, ocfs2 does not
> > reconnect to node 1 and panics the kernel after 60 seconds, while
> > multipath -ll shows both paths again.
> >
> > I do not understand at all what the Ethernet heartbeat connection of
> > ocfs2 has to do with the SAN connection.
> >
> > The strangest thing of all is that this does not happen every time.
> > After some reboots the system keeps running stable even if I shut down
> > an FC port and enable it again many times. There is no consistent
> > behaviour... It happens most of the time, but in about 10% of the
> > attempts it does not, and everything works as intended.
> >
> > Any explanations or ideas what causes this behaviour?
> >
> > I will test this on Debian lenny to see if the Debian version makes a
> > difference.
> >
> > Best regards,
> > Florian
> >
> > _______________________________________________
> > Ocfs2-users mailing list
> > Ocfs2-users at oss.oracle.com
> > http://oss.oracle.com/mailman/listinfo/ocfs2-users
> >
