[rds-devel] RDS hanging in send queue
Andy Grover
andy.grover at oracle.com
Wed May 27 17:21:55 PDT 2009
Hi Mike,
-22 is EINVAL. Hmmm.
Do you have the time to help me diagnose this? I can't reproduce it.
In net/rds/ib_cm.c, rds_ib_cm_handle_connect():
	...
	err = rdma_accept(cm_id, &conn_param);
	...
Either the cm_id is bad (doubtful, since it was passed in to us(?)) or
something in the conn_param struct is bad?
Sean, any ideas?
Thanks -- Regards -- Andy
Mike Heinz wrote:
> Hey, all -
>
> Got a report from one of our testers that rds-ping was failing
> between two machines. When I went to check them out, I found that
> things are piling up in the send queue (see below) and
> /var/log/messages was flooded with thousands of copies of the error:
>
> May 27 10:23:18 st2031 kernel: RDS/IB: rdma_accept failed (-22)
>
> Restarting the rds module on both machines has no effect. Having the
> machines ping themselves has no effect.
>
> Any suggestions? Below this line is a trimmed copy of rds-info from
> the same machine:
>
> ---------------------------------------------------------------------------
>
>
> RDS IB Connections:
> LocalAddr        RemoteAddr       LocalDev  RemoteDev
> 172.26.137.51    172.26.137.49    ::        ::
>
> Counters:
> CounterName              Value
> conn_reset               20727
> (trimmed lines where value was zero)
> send_queue_empty           201
> (trimmed lines where value was zero)
> send_queued               6957
> (trimmed lines where value was zero)
> ib_connect_raced            45
> (trimmed lines where value was zero)
> ib_rdma_mr_pool_flush       40
>
> RDS Sockets:
> BoundAddr        BPort  ConnAddr  CPort  SndBuf   RcvBuf   Inode
> 172.26.137.51       74  0.0.0.0       0  8388608  8388608  16870
> 172.26.137.51    51741  0.0.0.0       0  8388608  8388608  16872
> 172.26.137.51    29215  0.0.0.0       0  8388608  8388608  16873
> 172.26.137.51    19359  0.0.0.0       0  8388608  8388608  16874
> 172.26.137.51     5841  0.0.0.0       0  8388608  8388608  16875
> 172.26.137.51    13520  0.0.0.0       0  8388608  8388608  16876
> 172.26.137.51    43209  0.0.0.0       0  8388608  8388608  16877
> 172.26.137.51    46564  0.0.0.0       0  8388608  8388608  16878
> 0.0.0.0              0  0.0.0.0       0  8388608  8388608  22533
>
> RDS Connections:
> LocalAddr        RemoteAddr       NextTX  NextRX  Flg
> 172.26.137.51    172.26.137.49      6958       0  ---
>
> Receive Message Queue:
> LocalAddr        LPort  RemoteAddr       RPort  Seq  Bytes
>
> Send Message Queue:
> LocalAddr        LPort  RemoteAddr       RPort  Seq  Bytes
> 172.26.137.51       74  172.26.137.49        0  146      0
> 172.26.137.51    51741  172.26.137.49        0  147      0
> 172.26.137.51    29215  172.26.137.49        0  148      0
> 172.26.137.51    19359  172.26.137.49        0  149      0
> 172.26.137.51     5841  172.26.137.49        0  150      0
> 172.26.137.51    13520  172.26.137.49        0  151      0
> 172.26.137.51    43209  172.26.137.49        0  152      0
> 172.26.137.51    46564  172.26.137.49        0  153      0
> 172.26.137.51       74  172.26.137.49        0  154      0
> 172.26.137.51    51741  172.26.137.49        0  155      0
> 172.26.137.51    29215  172.26.137.49        0  156      0
> 172.26.137.51    19359  172.26.137.49        0  157      0
> 172.26.137.51     5841  172.26.137.49        0  158      0
> 172.26.137.51    13520  172.26.137.49        0  159      0
> 172.26.137.51    43209  172.26.137.49        0  160      0
> 172.26.137.51    46564  172.26.137.49        0  161      0
> . . . . (trimmed remaining records, they only repeat the pattern
> shown above) . . .
>
> Retransmit Message Queue:
> LocalAddr        LPort  RemoteAddr       RPort  Seq  Bytes
>
>
>
> --
> Michael Heinz
> Principal Engineer, Qlogic Corporation
> King of Prussia, Pennsylvania
>
>
> _______________________________________________
> rds-devel mailing list
> rds-devel at oss.oracle.com
> http://oss.oracle.com/mailman/listinfo/rds-devel