[rds-devel] RDS hanging in send queue

Andy Grover andy.grover at oracle.com
Wed May 27 17:21:55 PDT 2009


Hi Mike,

-22 is EINVAL. Hmmm.

Do you have the time to help me diagnose this? I can't reproduce it.

In net/rds/ib_cm.c, rds_ib_cm_handle_connect():

...
err = rdma_accept(cm_id, &conn_param);
...

Either the cm_id is bad (doubtful, since it was passed in to us(?)) or
something in the conn_param struct is bad?

Sean, any ideas?

Thanks -- Regards -- Andy

Mike Heinz wrote:
> Hey, all -
> 
> Got a report from one of our testers that rds-ping was failing
> between two machines. When I went to check them out, I found that
> things are piling up in the send queue (see below) and
> /var/log/messages was flooded with thousands of copies of the error:
> 
> May 27 10:23:18 st2031 kernel: RDS/IB: rdma_accept failed (-22)
> 
> Restarting the rds module on both machines has no effect. Having the
> machines ping themselves has no effect.
> 
> Any suggestions? Below this line is a trimmed copy of rds-info from
> the same machine:
> 
> ---------------------------------------------------------------------------
> 
> 
> RDS IB Connections:
> LocalAddr        RemoteAddr       LocalDev  RemoteDev
> 172.26.137.51    172.26.137.49    ::        ::
> 
> Counters:
> CounterName              Value
> conn_reset               20727
> (trimmed lines where value was zero)
> send_queue_empty           201
> (trimmed lines where value was zero)
> send_queued               6957
> (trimmed lines where value was zero)
> ib_connect_raced            45
> (trimmed lines where value was zero)
> ib_rdma_mr_pool_flush       40
> 
> RDS Sockets:
> BoundAddr      BPort  ConnAddr  CPort   SndBuf   RcvBuf  Inode
> 172.26.137.51     74  0.0.0.0       0  8388608  8388608  16870
> 172.26.137.51  51741  0.0.0.0       0  8388608  8388608  16872
> 172.26.137.51  29215  0.0.0.0       0  8388608  8388608  16873
> 172.26.137.51  19359  0.0.0.0       0  8388608  8388608  16874
> 172.26.137.51   5841  0.0.0.0       0  8388608  8388608  16875
> 172.26.137.51  13520  0.0.0.0       0  8388608  8388608  16876
> 172.26.137.51  43209  0.0.0.0       0  8388608  8388608  16877
> 172.26.137.51  46564  0.0.0.0       0  8388608  8388608  16878
> 0.0.0.0            0  0.0.0.0       0  8388608  8388608  22533
> 
> RDS Connections:
> LocalAddr      RemoteAddr     NextTX  NextRX  Flg
> 172.26.137.51  172.26.137.49    6958       0  ---
> 
> Receive Message Queue:
> LocalAddr      LPort  RemoteAddr     RPort  Seq  Bytes
> 
> Send Message Queue:
> LocalAddr      LPort  RemoteAddr     RPort  Seq  Bytes
> 172.26.137.51     74  172.26.137.49      0  146      0
> 172.26.137.51  51741  172.26.137.49      0  147      0
> 172.26.137.51  29215  172.26.137.49      0  148      0
> 172.26.137.51  19359  172.26.137.49      0  149      0
> 172.26.137.51   5841  172.26.137.49      0  150      0
> 172.26.137.51  13520  172.26.137.49      0  151      0
> 172.26.137.51  43209  172.26.137.49      0  152      0
> 172.26.137.51  46564  172.26.137.49      0  153      0
> 172.26.137.51     74  172.26.137.49      0  154      0
> 172.26.137.51  51741  172.26.137.49      0  155      0
> 172.26.137.51  29215  172.26.137.49      0  156      0
> 172.26.137.51  19359  172.26.137.49      0  157      0
> 172.26.137.51   5841  172.26.137.49      0  158      0
> 172.26.137.51  13520  172.26.137.49      0  159      0
> 172.26.137.51  43209  172.26.137.49      0  160      0
> 172.26.137.51  46564  172.26.137.49      0  161      0
> . . . . (trimmed remaining records, they only repeat the pattern
> shown above) . . .
> 
> Retransmit Message Queue:
> LocalAddr      LPort  RemoteAddr     RPort  Seq  Bytes
> 
> 
> 
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
> 
> 
> _______________________________________________
> rds-devel mailing list
> rds-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/rds-devel



