[Ocfs2-users] OCFS2
Andrew.MORLEY at sungard.com
Thu Apr 24 09:06:53 PDT 2014
Hi,
I have an issue with OCFS2 and I am not quite sure where the problem lies; I would be grateful for any feedback. It looks like a multipath issue, but I have redundant links, so I am not sure why OCFS2 would fail and bring the server down.
I have a set of production servers that have started showing the same error.
I am not aware of any changes within the infrastructure.
The setup is:
4 x EqualLogic PS6100X arrays.
Numerous Dell R610 servers, all with multiple iSCSI interfaces.
This has happened on 3 different servers in the last week, causing the servers to hang.
I have checked all switches and logs and can see no flapping interfaces. I can see the iSCSI initiator making logout and login requests during this time period.
I see the following in the logs:
Apr 22 15:53:09 servername multipathd: eql-0-8a0906-2d6a4c605-13244eee0b250b79_a: Entering recovery mode: max_retries=5
Apr 22 15:53:09 servername multipathd: 8:176: mark as failed
Apr 22 15:53:09 servername multipathd: 8:16: mark as failed
Apr 22 15:53:09 servername multipathd: 8:48: mark as failed
Apr 22 15:53:09 servername multipathd: 8:64: mark as failed
Apr 22 15:53:09 servername multipathd: 8:128: mark as failed
Apr 22 15:53:09 servername multipathd: 8:160: mark as failed
Apr 22 15:53:09 servername multipathd: eql-0-8a0906-2d6a4c605-13244eee0b250b79_a: Entering recovery mode: max_retries=5
Apr 22 15:53:09 servername multipathd: 8:176: mark as failed
Apr 22 15:53:09 servername multipathd: 8:16: mark as failed
Apr 22 15:53:09 servername multipathd: 8:48: mark as failed
Apr 22 15:53:09 servername multipathd: 8:64: mark as failed
Apr 22 15:53:09 servername multipathd: 8:128: mark as failed
Apr 22 15:53:09 servername multipathd: 8:160: mark as failed
Apr 22 15:53:11 servername kernel: (kmpathd/6,2888,6):o2hb_bio_end_io:241 ERROR: IO Error -5
Apr 22 15:53:11 servername kernel: Buffer I/O error on device dm-7, logical block 480
Apr 22 15:53:11 servername kernel: lost page write due to I/O error on dm-7
Apr 22 15:53:11 servername kernel: scsi 114:0:0:0: rejecting I/O to dead device
Apr 22 15:53:11 servername kernel: device-mapper: multipath: Failing path 8:176.
Apr 22 15:53:11 servername kernel: (o2hb-1B3B9BEE63,4754,7):o2hb_do_disk_heartbeat:772 ERROR: status = -5
Apr 22 15:53:11 servername multipathd: dm-4: add map (uevent)
Apr 22 15:53:11 servername kernel: scsi 115:0:0:0: rejecting I/O to dead device
Apr 22 15:53:11 servername kernel: device-mapper: multipath: Failing path 8:16.
Apr 22 15:53:11 servername multipathd: dm-4: devmap already registered
Apr 22 15:53:11 servername multipathd: dm-4: add map (uevent)
Apr 22 15:53:11 servername multipathd: dm-4: devmap already registered
Apr 22 15:53:11 servername multipathd: dm-3: add map (uevent)
Apr 22 15:53:11 servername kernel: scsi 110:0:0:0: rejecting I/O to dead device
Apr 22 15:53:11 servername kernel: device-mapper: multipath: Failing path 8:48.
Apr 22 15:53:17 servername multipathd: asvolume: load table [0 629145600 multipath 0 0 1 1 round-robin 0 6 1 8:32 10 8:80 10 8:96 10 8:112 10 8:144 10 8:16 10]
Apr 22 15:53:17 servername multipathd: dm-2: add map (uevent)
Apr 22 15:53:17 servername multipathd: dm-2: devmap already registered
Apr 22 15:53:17 servername multipathd: dm-8: add map (uevent)
Apr 22 15:53:17 servername iscsid: Connection117:0 to [target: iqn.2001-05.com.equallogic:0-8a0906-2d6a4c605-13244eee0b250b79-as14volumeocfs2, portal: 192.168.5.100,3260] through [iface: eql.eth2_2] is operational now
Apr 22 15:53:22 servername multipathd: dm-3: add map (uevent)
Apr 22 15:53:22 servername multipathd: dm-3: devmap already registered
Apr 22 15:53:22 servername multipathd: dm-4: add map (uevent)
Apr 22 15:53:22 servername multipathd: dm-4: devmap already registered
Apr 22 15:53:22 servername multipathd: dm-5: add map (uevent)
Apr 22 15:53:22 servername multipathd: dm-5: devmap already registered
Apr 22 15:53:22 servername multipathd: dm-9: add map (uevent)
Apr 22 15:53:22 servername multipathd: dm-9: devmap already registered
Apr 22 15:53:22 servername kernel: get_page_tbl ctx=0xffff810623d041c0 (253:6): bits=2, mask=0x3, num=20480, max=20480
Then OCFS2 runs into trouble:
Apr 22 15:53:23 servername kernel: (ocfs2cmt,4773,6):ocfs2_commit_cache:191 ERROR: status = -5
Apr 22 15:53:23 servername kernel: (ocfs2cmt,4773,6):ocfs2_commit_thread:1799 ERROR: status = -5
Apr 22 15:53:23 servername kernel: (ocfs2cmt,4773,6):ocfs2_commit_cache:191 ERROR: status = -5
followed by:
Apr 22 15:53:23 servername kernel: s2cmt,4773,6):ocfs2<3>(ocfs2c<3>(ocfs2cmt,4773,6):ocfs2_commit_cache:191 ERROR: status = -5
Apr 22 15:53:23 servername kernel: (ocfs2cmt,4773,6):ocfs2_commi<3>(ocfs2cm<<3>(<3>(ocfs2cmt,4773,6):ocfs2_commit_cache:191 ERROR: status = -5
Apr 22 15:53:23 servername kernel: (ocfs2<3>(ocfs2cmt,47<3>(ocf<3>(ocfs2cmt,47<3>(ocfs<3>(ocfs2cmt,4<3>(ocf<3>(ocfs2cm<3>(o<3>(ocfs2cm ... [interleaved printk output from ocfs2cmt trimmed, many more lines of the same]
This repeats thousands of times, bringing the server to a halt.
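To confirm that the path failures always precede the o2hb heartbeat errors (rather than the other way around), a small script can correlate the two event streams. This is only a sketch: the log format is taken from the excerpt above, and in practice you would read /var/log/messages instead of the embedded sample.

```python
import re
from datetime import datetime

# Sample lines taken from the excerpt above; in practice read /var/log/messages.
LOG = """\
Apr 22 15:53:09 servername multipathd: 8:176: mark as failed
Apr 22 15:53:09 servername multipathd: 8:16: mark as failed
Apr 22 15:53:11 servername kernel: (o2hb-1B3B9BEE63,4754,7):o2hb_do_disk_heartbeat:772 ERROR: status = -5
Apr 22 15:53:23 servername kernel: (ocfs2cmt,4773,6):ocfs2_commit_cache:191 ERROR: status = -5
"""

STAMP = "%b %d %H:%M:%S"  # syslog timestamp, first 15 characters of each line

def events(text):
    """Collect timestamps of path failures and o2hb heartbeat errors."""
    failed, hb_errors = [], []
    for line in text.splitlines():
        ts = datetime.strptime(line[:15], STAMP)
        if "mark as failed" in line:
            failed.append(ts)
        elif "o2hb_do_disk_heartbeat" in line and "ERROR" in line:
            hb_errors.append(ts)
    return failed, hb_errors

failed, hb = events(LOG)
for err in hb:
    preceding = [t for t in failed if t <= err]
    if preceding:
        delta = (err - max(preceding)).total_seconds()
        print("heartbeat error at %s, last path failure %.0f s earlier"
              % (err.strftime("%H:%M:%S"), delta))
```

On the sample above this reports the heartbeat error at 15:53:11 arriving 2 seconds after the last path was marked failed, which matches the sequence in the full log.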
cat /etc/multipath.conf
blacklist {
    devnode "^sd[a]$"
}

## Use user friendly names, instead of using WWIDs as names.
defaults {
    user_friendly_names yes
}

multipaths {
    multipath {
        wwid 36090a058604c6a2d790b250bee4exxxx
        alias asvolume
        path_grouping_policy multibus
        #path_checker readsector0
        path_selector "round-robin 0"
        failback immediate
        rr_weight priorities
        rr_min_io 10
        no_path_retry 5
    }
}

devices {
    device {
        vendor "EQLOGIC"
        product "100E-00"
        path_grouping_policy multibus
        getuid_callout "/sbin/scsi_id -g -u -s /block/%n"
        #features "1 queue_if_no_path"
        path_checker readsector0
        path_selector "round-robin 0"
        failback immediate
        rr_min_io 10
        rr_weight priorities
    }
}
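One observation on my own config, in case it is relevant (a sketch only, not a confirmed fix): with no_path_retry 5, once all paths are down multipathd queues I/O for a limited number of checker intervals and then fails everything with -EIO, which is exactly the status -5 the o2hb thread reports. If the EqualLogic logout/login storms last longer than that window, a larger retry budget might ride them out, something like the hypothetical value below. (queue_if_no_path would queue indefinitely, but that risks hanging the heartbeat instead of erroring it, so I have left it commented out.)

```
multipaths {
    multipath {
        wwid 36090a058604c6a2d790b250bee4exxxx
        alias asvolume
        ...
        # Hypothetical value: queue for ~12 checker intervals
        # (about 60 s with the default 5 s polling_interval)
        # instead of 5 before failing I/O with -EIO.
        no_path_retry 12
    }
}
```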
cat /etc/ocfs2/cluster.conf
node:
    ip_port = 8888
    ip_address = x.x.x.x
    number = 9
    name = servername
    cluster = ocfs

node:
    ip_port = 8888
    ip_address = x.x.x.x
    number = 109
    name = servername1
    cluster = ocfs

[... more nodes here ...]

cluster:
    node_count = 22
    name = ocfs
Cluster consists of 14 nodes.
/etc/init.d/o2cb status
Driver for "configfs": Loaded
Filesystem "configfs": Mounted
Driver for "ocfs2_dlmfs": Loaded
Filesystem "ocfs2_dlmfs": Mounted
Checking O2CB cluster ocfs: Online
Heartbeat dead threshold = 61
Network idle timeout: 30000
Network keepalive delay: 2000
Network reconnect delay: 2000
Checking O2CB heartbeat: Active
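As a sanity check on the timing budget (back-of-envelope only; the 5 s multipathd polling_interval is the documented default and is assumed here since it is not set in my multipath.conf): O2CB declares a node dead after (threshold - 1) * 2 seconds of missed disk heartbeats, while no_path_retry 5 fails queued I/O after roughly 5 checker intervals. So an all-paths outage longer than ~25 s surfaces as -EIO to o2hb well inside the 120 s heartbeat window:

```python
# O2CB: dead threshold counts 2-second heartbeat iterations
heartbeat_dead_threshold = 61
heartbeat_window = (heartbeat_dead_threshold - 1) * 2   # seconds

# multipath: no_path_retry 5, assuming the default polling_interval of 5 s
no_path_retry = 5
polling_interval = 5                                    # assumed default, not set in multipath.conf
queue_window = no_path_retry * polling_interval         # seconds

print("o2hb allows %d s of missed heartbeats before fencing" % heartbeat_window)
print("multipath queues I/O for only ~%d s before returning -EIO" % queue_window)
```

In other words, the node is not fenced for losing the disk briefly; it is the -EIO after ~25 s of queueing that the heartbeat and journal threads choke on.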
Server and package information:
cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.10 (Tikanga)
rpm -qa | grep multipath
device-mapper-multipath-0.4.7-59.el5
rpm -qa | grep ocfs2
ocfs2-2.6.18-371.3.1.el5-1.4.10-1.el5
ocfs2-tools-1.4.4-1.el5
ocfs2console-1.4.4-1.el5
rpm -qa | grep kernel
kernel-2.6.18-371.3.1.el5
modinfo ocfs2
filename: /lib/modules/2.6.18-371.3.1.el5/kernel/fs/ocfs2/ocfs2.ko
license: GPL
author: Oracle
version: 1.4.10
description: OCFS2 1.4.10 Thu Dec 5 16:38:36 PST 2013 (build b703e5e0906b370c876b657dabe8d4c8)
srcversion: 41115DB9EFDAA5735C18810
depends: ocfs2_dlm,jbd,ocfs2_nodemanager
vermagic: 2.6.18-371.3.1.el5 SMP mod_unload gcc-4.1