[rds-devel] RDS hanging in send queue

Mike Heinz michael.heinz at qlogic.com
Wed May 27 10:07:56 PDT 2009


Hey, all - 

Got a report from one of our testers that rds-ping was failing between two machines. When I went to check them out, I found that messages are piling up in the send queue (see below) and that /var/log/messages is flooded with thousands of copies of this error:

May 27 10:23:18 st2031 kernel: RDS/IB: rdma_accept failed (-22)
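
For reference, the -22 in that log line looks like a negated kernel errno, and errno 22 on Linux is EINVAL ("Invalid argument"). Here is a minimal Python sketch for decoding such codes, assuming the number in parentheses really is a standard negated errno. The helper name decode_kernel_errno is made up for illustration and is not part of any RDS tool:

```python
import errno
import os


def decode_kernel_errno(code):
    """Return (symbolic name, message) for a negated kernel errno such as -22.

    Assumption: the value follows the usual kernel convention of
    returning -errno on failure.
    """
    number = abs(code)
    name = errno.errorcode.get(number, "UNKNOWN")
    return name, os.strerror(number)


name, message = decode_kernel_errno(-22)
print(name, "-", message)
```

This only names the error; it does not say which argument rdma_accept rejected.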

Restarting the rds module on both machines has no effect, and neither does having each machine ping itself.

Any suggestions? Below this line is a trimmed copy of the rds-info output from the same machine:

---------------------------------------------------------------------------

RDS IB Connections:
      LocalAddr      RemoteAddr                         LocalDev         RemoteDev
  172.26.137.51   172.26.137.49                               ::                ::

Counters:
              CounterName            Value
               conn_reset            20727
(trimmed lines where value was zero)
         send_queue_empty              201
(trimmed lines where value was zero)
              send_queued             6957
(trimmed lines where value was zero)
         ib_connect_raced               45
(trimmed lines where value was zero)
    ib_rdma_mr_pool_flush               40

RDS Sockets:
      BoundAddr BPort        ConnAddr CPort     SndBuf     RcvBuf    Inode
  172.26.137.51    74         0.0.0.0     0    8388608    8388608    16870
  172.26.137.51 51741         0.0.0.0     0    8388608    8388608    16872
  172.26.137.51 29215         0.0.0.0     0    8388608    8388608    16873
  172.26.137.51 19359         0.0.0.0     0    8388608    8388608    16874
  172.26.137.51  5841         0.0.0.0     0    8388608    8388608    16875
  172.26.137.51 13520         0.0.0.0     0    8388608    8388608    16876
  172.26.137.51 43209         0.0.0.0     0    8388608    8388608    16877
  172.26.137.51 46564         0.0.0.0     0    8388608    8388608    16878
        0.0.0.0     0         0.0.0.0     0    8388608    8388608    22533

RDS Connections:
      LocalAddr      RemoteAddr           NextTX           NextRX Flg
  172.26.137.51   172.26.137.49             6958                0 ---

Receive Message Queue:
      LocalAddr LPort      RemoteAddr RPort              Seq      Bytes

Send Message Queue:
      LocalAddr LPort      RemoteAddr RPort              Seq      Bytes
  172.26.137.51    74   172.26.137.49     0              146          0
  172.26.137.51 51741   172.26.137.49     0              147          0
  172.26.137.51 29215   172.26.137.49     0              148          0
  172.26.137.51 19359   172.26.137.49     0              149          0
  172.26.137.51  5841   172.26.137.49     0              150          0
  172.26.137.51 13520   172.26.137.49     0              151          0
  172.26.137.51 43209   172.26.137.49     0              152          0
  172.26.137.51 46564   172.26.137.49     0              153          0
  172.26.137.51    74   172.26.137.49     0              154          0
  172.26.137.51 51741   172.26.137.49     0              155          0
  172.26.137.51 29215   172.26.137.49     0              156          0
  172.26.137.51 19359   172.26.137.49     0              157          0
  172.26.137.51  5841   172.26.137.49     0              158          0
  172.26.137.51 13520   172.26.137.49     0              159          0
  172.26.137.51 43209   172.26.137.49     0              160          0
  172.26.137.51 46564   172.26.137.49     0              161          0
(trimmed remaining records; they only repeat the pattern shown above)

Retransmit Message Queue:
      LocalAddr LPort      RemoteAddr RPort              Seq      Bytes
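
The zero-valued counters above were trimmed by hand. A rough Python sketch of the same filtering follows, assuming the "Counters:" section has the two-column layout shown in this post (other rds-info versions may format it differently). The function name nonzero_counters is hypothetical, and the sample text is copied from the dump above:

```python
import re

# Sample copied from the Counters section of the dump in this post.
SAMPLE = """\
Counters:
              CounterName            Value
               conn_reset            20727
         send_queue_empty              201
              send_queued             6957
         ib_connect_raced               45
    ib_rdma_mr_pool_flush               40

RDS Sockets:
"""


def nonzero_counters(text):
    """Collect name -> value for non-zero rows of the 'Counters:' section."""
    counters = {}
    in_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "Counters:":
            in_section = True
            continue
        if not in_section:
            continue
        if not stripped:
            break  # a blank line ends the section
        # Rows look like "<name> <integer>"; the column header and
        # any free-text notes do not match and are skipped.
        match = re.fullmatch(r"(\w+)\s+(\d+)", stripped)
        if match and int(match.group(2)) != 0:
            counters[match.group(1)] = int(match.group(2))
    return counters


for name, value in nonzero_counters(SAMPLE).items():
    print(f"{name:>25} {value:>16}")
```

To use it on a live machine, the text of a saved rds-info run could be passed in place of SAMPLE.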



--
Michael Heinz
Principal Engineer, QLogic Corporation
King of Prussia, Pennsylvania



