<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";
        mso-fareast-language:EN-US;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.MsoPlainText, li.MsoPlainText, div.MsoPlainText
        {mso-style-priority:99;
        mso-style-link:"Plain Text Char";
        margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";
        mso-fareast-language:EN-US;}
p
        {mso-style-priority:99;
        margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0in;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";}
p.MsoAcetate, li.MsoAcetate, div.MsoAcetate
        {mso-style-priority:99;
        mso-style-link:"Balloon Text Char";
        margin:0in;
        margin-bottom:.0001pt;
        font-size:8.0pt;
        font-family:"Tahoma","sans-serif";
        mso-fareast-language:EN-US;}
span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:"Courier New";
        mso-fareast-language:IT;}
span.EmailStyle20
        {mso-style-type:personal;
        font-family:"Calibri","sans-serif";
        color:windowtext;}
span.EmailStyle21
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
span.BalloonTextChar
        {mso-style-name:"Balloon Text Char";
        mso-style-priority:99;
        mso-style-link:"Balloon Text";
        font-family:"Tahoma","sans-serif";
        mso-fareast-language:EN-US;}
span.PlainTextChar
        {mso-style-name:"Plain Text Char";
        mso-style-priority:99;
        mso-style-link:"Plain Text";
        font-family:"Calibri","sans-serif";
        mso-fareast-language:EN-US;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=IT link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span lang=EN-US style='color:#1F497D'>Thank you, Venkat.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='color:#1F497D'>I tried to locate the problem at the fabric level, so I ran the perfquery and ibqueryerrors tools.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='color:#1F497D'>Their outputs follow. There seem to be cable connection problems (for example, LinkDowned = 1).<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='color:#1F497D'>I will try moving the cables to different switch ports.<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='color:#1F497D'><o:p> </o:p></span></p><p class=MsoPlainText><span lang=EN-US>[root@host1 RDSINFO]# perfquery<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US># Port counters: Lid 5 port 1<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>PortSelect:......................1<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>CounterSelect:...................0x1400<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>SymbolErrors:....................0<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>LinkRecovers:....................0<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>LinkDowned:......................1<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>RcvErrors:.......................125<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>RcvRemotePhysErrors:.............0<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>RcvSwRelayErrors:................0<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>XmtDiscards:.....................10<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>XmtConstraintErrors:.............0<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>RcvConstraintErrors:.............0<o:p></o:p></span></p><p 
class=MsoPlainText><span lang=EN-US>CounterSelect2:..................0x00<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>LinkIntegrityErrors:.............0<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>ExcBufOverrunErrors:.............0<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>VL15Dropped:.....................0<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>XmtData:.........................4294967295<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>RcvData:.........................4294967295<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>XmtPkts:.........................51939068<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>RcvPkts:.........................75491204<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>XmtWait:.........................2601824<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US><o:p> </o:p></span></p><p class=MsoPlainText><span lang=EN-US>[root@host1 conf]# ibqueryerrors -r<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>Suppressing:<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>Errors for 0x1e8c0000dc93fb "host3 HCA-1"<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> GUID 0x1e8c0000dc93fb port 1: [XmtDiscards == 1] [XmtWait == 4969894]<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> Link info: 4 1[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x000b8cffff004944 2 12[ ] "MT47396 Infiniscale-III Mellanox Technologies" ( )<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>Errors for 0x1fc6000004d976 "host4 HCA-1"<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> GUID 0x1fc6000004d976 port 1: [XmtDiscards == 1] [XmtWait == 5821946]<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> Link info: 3 1[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x000b8cffff004944 2 11[ ] "MT47396 Infiniscale-III Mellanox Technologies" ( )<o:p></o:p></span></p><p 
class=MsoPlainText><span lang=EN-US>Errors for 0x1fc600000587b6 "host2 HCA-1"<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> GUID 0x1fc600000587b6 port 1: [LinkDowned == 1] [RcvErrors == 125] [XmtDiscards == 10] [XmtWait == 2601824]<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> Link info: 5 1[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x000b8cffff004944 2 2[ ] "MT47396 Infiniscale-III Mellanox Technologies" ( )<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>Errors for 0xb8cffff004944 "MT47396 Infiniscale-III Mellanox Technologies"<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> GUID 0xb8cffff004944 port ALL: [LinkRecovers == 13] [LinkDowned == 6] [RcvSwRelayErrors == 492] [XmtDiscards == 1]<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> GUID 0xb8cffff004944 port 1: [RcvSwRelayErrors == 139]<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> Link info: 2 1[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x001e8c0000dc942f 1 1[ ] "host1 HCA-1" ( )<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> GUID 0xb8cffff004944 port 2: [LinkRecovers == 13] [LinkDowned == 6] [RcvSwRelayErrors == 155]<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> Link info: 2 2[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x001fc600000587b6 5 1[ ] "host2 HCA-1" ( )<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> GUID 0xb8cffff004944 port 11: [RcvSwRelayErrors == 97] [XmtDiscards == 1]<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> Link info: 2 11[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x001fc6000004d976 3 1[ ] "host4 HCA-1" ( )<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> GUID 0xb8cffff004944 port 12: [RcvSwRelayErrors == 101]<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> Link info: 2 12[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x001e8c0000dc93fb 4 1[ ] "host3 HCA-1" ( )<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US>Errors for 0x1e8c0000dc942f 
"host1 HCA-1"<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> GUID 0x1e8c0000dc942f port 1: [XmtDiscards == 1] [XmtWait == 6270721]<o:p></o:p></span></p><p class=MsoPlainText><span lang=EN-US> Link info: 1 1[ ] ==( 4X 5.0 Gbps Active/ LinkUp)==> 0x000b8cffff004944 2 1[ ] "MT47396 Infiniscale-III Mellanox Technologies" ( )<o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='color:#1F497D'> <o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='color:#1F497D'><o:p> </o:p></span></p><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif";mso-fareast-language:IT'>From:</span></b><span lang=EN-US style='font-size:10.0pt;font-family:"Tahoma","sans-serif";mso-fareast-language:IT'> Venkat Venkatsubra [mailto:venkat.x.venkatsubra@oracle.com] <br><b>Sent:</b> Wednesday, May 11, 2011 2:50 PM<br><b>To:</b> c.dittamo@list-group.com<br><b>Cc:</b> rds-devel@oss.oracle.com<br><b>Subject:</b> Re: [rds-devel] QP error event with RDS<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p> </o:p></p><div><p><span style='color:black'>Hello,<o:p></o:p></span></p><p><span style='color:black'> <o:p></o:p></span></p><p><span style='color:black'>Status 12 is IB_WC_RETRY_EXC_ERR (include/rdma/ib_verbs.h).<o:p></o:p></span></p><p><span style='color:black'>The IB transport layer gave up after retrying the transmission the configured number of times.<o:p></o:p></span></p><p><span style='color:black'> <o:p></o:p></span></p><p><span style='color:black'>Can you send me the rds-info output from both the sending and the receiving side?<o:p></o:p></span></p><p><span style='color:black'>(a snapshot before and after the run)<o:p></o:p></span></p><p><span style='color:black'> <o:p></o:p></span></p><p><span style='color:black'>Did the receiving side have receive buffers posted?<o:p></o:p></span></p><p><span style='color:black'> 
<o:p></o:p></span></p><p><span style='color:black'>Venkat<o:p></o:p></span></p><p style='margin-bottom:12.0pt'><span style='color:black'><br>----- Original Message -----<br>From: c.dittamo@list-group.com<br>To: rds-devel@oss.oracle.com<br>Sent: Wednesday, May 11, 2011 5:38:42 AM GMT -06:00 US/Canada Central<br>Subject: [rds-devel] QP error event with RDS<o:p></o:p></span></p><div><p class=MsoNormal><span lang=EN-US style='color:black'>Hi, </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='color:black'>I am hitting the following QP error: </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='font-size:10.0pt;font-family:"Courier New";color:black;mso-fareast-language:IT'>RDS/IB: send completion on address had status 12, disconnecting and reconnecting</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='color:black'>while running my application on Linux RHEL 5.5 (kernel 2.6.32.32). </span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='color:black'>My application is a 4-node distributed client-server program that uses only the basic RDS features (sendmsg and recvmsg), i.e. without RDMA. I checked all cable connections and they are fine. 
All IB (Mellanox) drivers were loaded.</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='color:black'>Any ideas why RDS returns this error?</span><span style='color:black'><o:p></o:p></span></p><p class=MsoNormal><span lang=EN-US style='color:black'>Thank you.</span><span style='color:black'><o:p></o:p></span></p></div></div></div></body></html>