[rds-devel] QP error event with RDS
Cristian Dittamo
c.dittamo at list-group.com
Wed May 11 05:59:11 PDT 2011
Thank you, Venkat.
I tried to locate the problem at the fabric level, so I ran the perfquery and ibqueryerrors tools.
Their outputs follow. It looks like there are cable connection problems (e.g. LinkDowned = 1).
I will try plugging the cables into different switch ports.
[root@host1 RDSINFO]# perfquery
# Port counters: Lid 5 port 1
PortSelect:......................1
CounterSelect:...................0x1400
SymbolErrors:....................0
LinkRecovers:....................0
LinkDowned:......................1
RcvErrors:.......................125
RcvRemotePhysErrors:.............0
RcvSwRelayErrors:................0
XmtDiscards:.....................10
XmtConstraintErrors:.............0
RcvConstraintErrors:.............0
CounterSelect2:..................0x00
LinkIntegrityErrors:.............0
ExcBufOverrunErrors:.............0
VL15Dropped:.....................0
XmtData:.........................4294967295
RcvData:.........................4294967295
XmtPkts:.........................51939068
RcvPkts:.........................75491204
XmtWait:.........................2601824
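To read counters like these more systematically, here is a minimal Python sketch that parses perfquery-style `Name:....value` lines and flags the nonzero error counters. The set of counters treated as "errors" and the helper names are illustrative assumptions, not something perfquery defines. It also flags data counters stuck at 4294967295 (0xFFFFFFFF): the basic 32-bit PortCounters fields stop at their maximum instead of wrapping, so XmtData and RcvData above carry no usable information (perfquery -x reads the 64-bit extended counters instead).

```python
import re

# Counters treated as "errors" here: an illustrative assumption, not a list
# defined by perfquery itself.
ERROR_COUNTERS = {
    "SymbolErrors", "LinkRecovers", "LinkDowned", "RcvErrors",
    "RcvRemotePhysErrors", "RcvSwRelayErrors", "XmtDiscards",
    "XmtConstraintErrors", "RcvConstraintErrors", "LinkIntegrityErrors",
    "ExcBufOverrunErrors", "VL15Dropped",
}
# Basic 32-bit PortCounters fields saturate at this value instead of wrapping.
SATURATED_32 = 0xFFFFFFFF
DATA_COUNTERS = {"XmtData", "RcvData", "XmtPkts", "RcvPkts"}

# Matches lines such as "LinkDowned:......................1" or "CounterSelect:...0x1400".
LINE_RE = re.compile(r"^(\w+):\.+(\S+)$")


def parse_perfquery(text):
    """Return a dict mapping counter name -> int (decimal or 0x-hex)."""
    counters = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            counters[m.group(1)] = int(m.group(2), 0)
    return counters


def summarize(counters):
    """Split counters into (nonzero error counters, saturated data counters)."""
    errors = {k: v for k, v in counters.items() if k in ERROR_COUNTERS and v}
    saturated = sorted(k for k, v in counters.items()
                       if k in DATA_COUNTERS and v == SATURATED_32)
    return errors, saturated


# A few lines copied from the perfquery output above.
SAMPLE = """\
LinkDowned:......................1
RcvErrors:.......................125
XmtDiscards:.....................10
SymbolErrors:....................0
XmtData:.........................4294967295
RcvData:.........................4294967295
XmtPkts:.........................51939068
"""

if __name__ == "__main__":
    errors, saturated = summarize(parse_perfquery(SAMPLE))
    print("nonzero error counters:", errors)
    print("saturated 32-bit counters:", saturated)
```

Keeping the error list explicit makes it easy to adjust which counters matter for a given fabric.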
[root@host1 conf]# ibqueryerrors -r
Suppressing:
Errors for 0x1e8c0000dc93fb "host3 HCA-1"
GUID 0x1e8c0000dc93fb port 1: [XmtDiscards == 1] [XmtWait == 4969894]
Link info: 4 1[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x000b8cffff004944 2 12[ ] "MT47396 Infiniscale-III Mellanox Technologies" ( )
Errors for 0x1fc6000004d976 "host4 HCA-1"
GUID 0x1fc6000004d976 port 1: [XmtDiscards == 1] [XmtWait == 5821946]
Link info: 3 1[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x000b8cffff004944 2 11[ ] "MT47396 Infiniscale-III Mellanox Technologies" ( )
Errors for 0x1fc600000587b6 "host2 HCA-1"
GUID 0x1fc600000587b6 port 1: [LinkDowned == 1] [RcvErrors == 125] [XmtDiscards == 10] [XmtWait == 2601824]
Link info: 5 1[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x000b8cffff004944 2 2[ ] "MT47396 Infiniscale-III Mellanox Technologies" ( )
Errors for 0xb8cffff004944 "MT47396 Infiniscale-III Mellanox Technologies"
GUID 0xb8cffff004944 port ALL: [LinkRecovers == 13] [LinkDowned == 6] [RcvSwRelayErrors == 492] [XmtDiscards == 1]
GUID 0xb8cffff004944 port 1: [RcvSwRelayErrors == 139]
Link info: 2 1[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x001e8c0000dc942f 1 1[ ] "host1 HCA-1" ( )
GUID 0xb8cffff004944 port 2: [LinkRecovers == 13] [LinkDowned == 6] [RcvSwRelayErrors == 155]
Link info: 2 2[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x001fc600000587b6 5 1[ ] "host2 HCA-1" ( )
GUID 0xb8cffff004944 port 11: [RcvSwRelayErrors == 97] [XmtDiscards == 1]
Link info: 2 11[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x001fc6000004d976 3 1[ ] "host4 HCA-1" ( )
GUID 0xb8cffff004944 port 12: [RcvSwRelayErrors == 101]
Link info: 2 12[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x001e8c0000dc93fb 4 1[ ] "host3 HCA-1" ( )
Errors for 0x1e8c0000dc942f "host1 HCA-1"
GUID 0x1e8c0000dc942f port 1: [XmtDiscards == 1] [XmtWait == 6270721]
Link info: 1 1[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x000b8cffff004944 2 1[ ] "MT47396 Infiniscale-III Mellanox Technologies" ( )
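The ibqueryerrors report is easier to compare once the bracketed `[Name == value]` fields are pulled into a table. Below is a minimal Python sketch, assuming only the line format shown above (`GUID <guid> port <n>:` followed by bracketed counters); it is not an official parser, and the function names are hypothetical. Applied to these lines, it shows that the link-level counters (LinkDowned, LinkRecovers) appear only on host2's HCA port and on switch port 2, which the Link info line connects to host2 HCA-1.

```python
import re

# Matches e.g. "GUID 0xb8cffff004944 port 2: [LinkRecovers == 13] [LinkDowned == 6]"
PORT_RE = re.compile(r"^GUID (0x[0-9a-fA-F]+) port (\w+):(.*)$")
FIELD_RE = re.compile(r"\[(\w+) == (\d+)\]")


def parse_ibqueryerrors(text):
    """Return {(guid, port): {counter: value}}; 'Link info' lines are skipped."""
    result = {}
    for line in text.splitlines():
        m = PORT_RE.match(line.strip())
        if not m:
            continue
        guid, port, rest = m.groups()
        result[(guid, port)] = {k: int(v) for k, v in FIELD_RE.findall(rest)}
    return result


def link_level_suspects(ports):
    """Ports (excluding the 'ALL' aggregate) with nonzero LinkDowned or LinkRecovers."""
    return sorted(
        key for key, c in ports.items()
        if key[1] != "ALL" and (c.get("LinkDowned", 0) or c.get("LinkRecovers", 0))
    )


# Lines copied from the ibqueryerrors output above.
SAMPLE = """\
GUID 0x1fc600000587b6 port 1: [LinkDowned == 1] [RcvErrors == 125] [XmtDiscards == 10] [XmtWait == 2601824]
GUID 0xb8cffff004944 port ALL: [LinkRecovers == 13] [LinkDowned == 6] [RcvSwRelayErrors == 492] [XmtDiscards == 1]
GUID 0xb8cffff004944 port 1: [RcvSwRelayErrors == 139]
GUID 0xb8cffff004944 port 2: [LinkRecovers == 13] [LinkDowned == 6] [RcvSwRelayErrors == 155]
GUID 0xb8cffff004944 port 11: [RcvSwRelayErrors == 97] [XmtDiscards == 1]
"""

if __name__ == "__main__":
    for guid, port in link_level_suspects(parse_ibqueryerrors(SAMPLE)):
        print(f"check link at GUID {guid} port {port}")
```

Both suspects are the two ends of the same cable (host2 HCA-1 port 1 and switch port 2), which is consistent with the plan to re-seat or move that cable.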
From: Venkat Venkatsubra [mailto:venkat.x.venkatsubra at oracle.com]
Sent: Wednesday, May 11, 2011 2:50 PM
To: c.dittamo at list-group.com
Cc: rds-devel at oss.oracle.com
Subject: Re: [rds-devel] QP error event with RDS
Hello,
Status 12 is IB_WC_RETRY_EXC_ERR (include/rdma/ib_verbs.h).
The IB transport layer gave up after exhausting its configured number of transmit retries.
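Since the RDS log prints only the numeric status, a small lookup table makes it readable. This is a minimal Python sketch, assuming the declaration order of the first fourteen members of `enum ib_wc_status` in include/rdma/ib_verbs.h (values are implicit, starting at 0; the real enum continues past these). The table and the helper `wc_status_name` are hypothetical names for illustration.

```python
# Declaration order of the first entries of enum ib_wc_status in
# include/rdma/ib_verbs.h; the list index equals the numeric status.
IB_WC_STATUS = [
    "IB_WC_SUCCESS",            # 0
    "IB_WC_LOC_LEN_ERR",        # 1
    "IB_WC_LOC_QP_OP_ERR",      # 2
    "IB_WC_LOC_EEC_OP_ERR",     # 3
    "IB_WC_LOC_PROT_ERR",       # 4
    "IB_WC_WR_FLUSH_ERR",       # 5
    "IB_WC_MW_BIND_ERR",        # 6
    "IB_WC_BAD_RESP_ERR",       # 7
    "IB_WC_LOC_ACCESS_ERR",     # 8
    "IB_WC_REM_INV_REQ_ERR",    # 9
    "IB_WC_REM_ACCESS_ERR",     # 10
    "IB_WC_REM_OP_ERR",         # 11
    "IB_WC_RETRY_EXC_ERR",      # 12
    "IB_WC_RNR_RETRY_EXC_ERR",  # 13
]


def wc_status_name(status):
    """Map a numeric work-completion status to its enum name, if known here."""
    if 0 <= status < len(IB_WC_STATUS):
        return IB_WC_STATUS[status]
    return f"unknown status {status}"


if __name__ == "__main__":
    print(wc_status_name(12))
```

For example, `wc_status_name(12)` returns `"IB_WC_RETRY_EXC_ERR"`, matching the status in the RDS message.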
Can you mail me the rds-info output from both the sending and the receiving side?
(a snapshot before and after the run)
Did the receiving side have buffers posted to receive?
Venkat
----- Original Message -----
From: c.dittamo at list-group.com
To: rds-devel at oss.oracle.com
Sent: Wednesday, May 11, 2011 5:38:42 AM GMT -06:00 US/Canada Central
Subject: [rds-devel] QP error event with RDS
Hi,
I am hitting the following QP error while running my application on Linux RHEL 5.5 (kernel 2.6.32.32):
RDS/IB: send completion on address had status 12, disconnecting and reconnecting
My application is a 4-node distributed client-server program that uses only the basic RDS features (sendmsg and recvmsg), i.e. no RDMA. I checked all the cable connections and they look fine, and all the IB (Mellanox) drivers are loaded.
Any ideas why RDS returns this error?
Thank you.