[rds-devel] QP error event with RDMA.
Olaf Kirch
olaf.kirch at oracle.com
Mon Apr 28 12:30:28 PDT 2008
Hi Pradeep,
On Monday 28 April 2008 20:58:48 Pradeep wrote:
> I'm hitting an unhandled QP event with RDMA test:
> rds-stress -r 10.0.0.54 -s 10.0.0.53 -p 4000 -t1 -d4 -D524288 -T20
>
> RDS/ib: unhandled QP event 3 on connection to 10.0.0.54
> RDS/IB: recv completion on 10.0.0.54 had status 5, disconnecting and reconnecting
>
> Remote side:
> RDS/IB: send completion on 10.0.0.53 had status 12, disconnecting and reconnecting
>
> Any idea why RDS is getting QP event 3 (IB_EVENT_QP_ACCESS_ERR)?
Yes, this usually means the R_Key was not valid. That happens if your
stack is buggy, or if the process that originally gave you the R_Key
died in the meantime.
I guess we should probably print a more intelligent message than "unhandled
QP event 3" here...
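
Until the kernel message improves, the numbers in these log lines can be
decoded by hand. Below is a minimal sketch of a log annotator, not anything
that ships with RDS. It assumes the numeric codes follow the declaration
order of enum ib_event_type and enum ib_wc_status in include/rdma/ib_verbs.h,
and the helper name annotate() is made up for this example. Only the first
few values of each enum are listed.

```python
import re

# Assumption: values follow the declaration order of enum ib_event_type
# in include/rdma/ib_verbs.h (only the first entries are listed here).
IB_EVENT_NAMES = {
    0: "IB_EVENT_CQ_ERR",
    1: "IB_EVENT_QP_FATAL",
    2: "IB_EVENT_QP_REQ_ERR",
    3: "IB_EVENT_QP_ACCESS_ERR",
    4: "IB_EVENT_COMM_EST",
    5: "IB_EVENT_SQ_DRAINED",
}

# Assumption: values follow the declaration order of enum ib_wc_status.
IB_WC_STATUS_NAMES = {
    0: "IB_WC_SUCCESS",
    1: "IB_WC_LOC_LEN_ERR",
    2: "IB_WC_LOC_QP_OP_ERR",
    3: "IB_WC_LOC_EEC_OP_ERR",
    4: "IB_WC_LOC_PROT_ERR",
    5: "IB_WC_WR_FLUSH_ERR",
    6: "IB_WC_MW_BIND_ERR",
    7: "IB_WC_BAD_RESP_ERR",
    8: "IB_WC_LOC_ACCESS_ERR",
    9: "IB_WC_REM_INV_REQ_ERR",
    10: "IB_WC_REM_ACCESS_ERR",
    11: "IB_WC_REM_OP_ERR",
    12: "IB_WC_RETRY_EXC_ERR",
    13: "IB_WC_RNR_RETRY_EXC_ERR",
}

# Patterns taken from the RDS messages quoted in this thread.
_EVENT_RE = re.compile(r"unhandled QP event (\d+)")
_STATUS_RE = re.compile(r"completion on \S+ had status (\d+)")


def _with_name(table, match):
    """Return the matched text followed by the symbolic name in parentheses."""
    code = int(match.group(1))
    return "%s (%s)" % (match.group(0), table.get(code, "unknown"))


def annotate(line):
    """Append symbolic names to QP event and completion status codes in a log line."""
    line = _EVENT_RE.sub(lambda m: _with_name(IB_EVENT_NAMES, m), line)
    line = _STATUS_RE.sub(lambda m: _with_name(IB_WC_STATUS_NAMES, m), line)
    return line


if __name__ == "__main__":
    print(annotate("RDS/ib: unhandled QP event 3 on connection to 10.0.0.54"))
```

Under those assumptions, the "status 5" on the receive side would read as
IB_WC_WR_FLUSH_ERR and the "status 12" on the remote send side as
IB_WC_RETRY_EXC_ERR. Codes outside the tables are tagged "unknown" rather
than guessed.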
Olaf
>
> Config:
> OFED:
> http://www.openfabrics.org/downloads/OFED/ofed-1.3-daily/OFED-1.3-20080408-0623.tgz
> Systems: Two Intel(R) Xeon systems.
> OS version: Red Hat Enterprise Linux AS release 4
> 2.6.9-42.0.10.ELsmp #1 SMP Fri Feb 16 17:17:21 EST 2007 i686 i686 i386
> GNU/Linux
> IB Hardware: ConnectX HCA (only one port connected).
>
> RDS config:
> max_send_wr 128
> max_recv_wr 1024
> max_unsignaled_bytes 8388608
> max_unacked_packets 16
>
> [root@iblp0053 tmp]# rds-info
>
> TCP Connections:
> LocalAddr LPort RemoteAddr RPort HdrRemain DataRemain SentNxt ExpectUna SeenUna
>
> Counters:
> CounterName Value
> conn_reset 4
> recv_drop_bad_checksum 0
> recv_drop_old_seq 0
> recv_drop_no_sock 2
> recv_drop_dead_sock 0
> recv_deliver_raced 0
> recv_delivered 38479
> recv_queued 38481
> recv_immediate_retry 0
> recv_delayed_retry 0
> recv_ack_required 9478
> recv_rdma_bytes 747110400
> send_queue_empty 121
> send_queue_full 0
> send_sem_contention 0
> send_sem_queue_raced 0
> send_immediate_retry 0
> send_delayed_retry 0
> send_drop_acked 0
> send_ack_required 9443
> send_rdma 19234
> send_rdma_bytes 747110400
> page_remainder_hit 32069
> page_remainder_miss 6412
> cong_update_queued 0
> cong_update_received 1
> cong_send_error 0
> cong_send_blocked 0
> ib_connect_raced 3
> ib_listen_closed_stale 0
> ib_tx_cq_call 52082
> ib_tx_cq_event 67005
> ib_tx_ring_full 0
> ib_tx_sg_mapping_failure 0
> ib_tx_stalled 0
> ib_rx_cq_call 38837
> ib_rx_cq_event 47764
> ib_rx_ring_empty 0
> ib_rx_refill_from_cq 38835
> ib_rx_refill_from_thread 2
> ib_rx_alloc_limit 0
> ib_ack_sent 9287
> ib_ack_send_failure 0
> ib_ack_send_delayed 389
> ib_ack_received 9279
> ib_rdma_mr_alloc 207
> ib_rdma_mr_free 0
> ib_rdma_mr_used 19237
> ib_rdma_mr_pool_flush 186
> ib_rdma_mr_pool_wait 0
> tcp_data_ready_calls 0
> tcp_write_space_calls 0
> tcp_sndbuf_full 0
> tcp_connect_raced 0
> tcp_listen_closed_stale 0
>
> RDS Sockets:
> BoundAddr BPort ConnAddr CPort SndBuf RcvBuf
> 0.0.0.0 0 0.0.0.0 0 55296 55296
>
> RDS Connections:
> LocalAddr RemoteAddr NextTX NextRX Flg
> 10.0.0.53 10.0.0.54 38484 38484 --C
>
> Receive Message Queue:
> LocalAddr LPort RemoteAddr RPort Seq Bytes
>
> Send Message Queue:
> LocalAddr LPort RemoteAddr RPort Seq Bytes
>
> Retransmit Message Queue:
> LocalAddr LPort RemoteAddr RPort Seq Bytes
>
> [root@iblp0054 ~]# rds-info
>
> TCP Connections:
> LocalAddr LPort RemoteAddr RPort HdrRemain DataRemain SentNxt ExpectUna SeenUna
>
> Counters:
> CounterName Value
> conn_reset 11
> recv_drop_bad_checksum 0
> recv_drop_old_seq 0
> recv_drop_no_sock 0
> recv_drop_dead_sock 0
> recv_deliver_raced 0
> recv_delivered 38483
> recv_queued 38483
> recv_immediate_retry 0
> recv_delayed_retry 0
> recv_ack_required 9443
> recv_rdma_bytes 747634688
> send_queue_empty 120
> send_queue_full 0
> send_sem_contention 0
> send_sem_queue_raced 0
> send_immediate_retry 0
> send_delayed_retry 0
> send_drop_acked 0
> send_ack_required 9479
> send_rdma 19237
> send_rdma_bytes 748158976
> page_remainder_hit 32072
> page_remainder_miss 6413
> cong_update_queued 0
> cong_update_received 2
> cong_send_error 0
> cong_send_blocked 0
> ib_connect_raced 2
> ib_listen_closed_stale 0
> ib_tx_cq_call 50886
> ib_tx_cq_event 67011
> ib_tx_ring_full 0
> ib_tx_sg_mapping_failure 0
> ib_tx_stalled 14
> ib_rx_cq_call 38579
> ib_rx_cq_event 47978
> ib_rx_ring_empty 2
> ib_rx_refill_from_cq 38579
> ib_rx_refill_from_thread 2
> ib_rx_alloc_limit 0
> ib_ack_sent 9279
> ib_ack_send_failure 0
> ib_ack_send_delayed 542
> ib_ack_received 9287
> ib_rdma_mr_alloc 207
> ib_rdma_mr_free 0
> ib_rdma_mr_used 19238
> ib_rdma_mr_pool_flush 188
> ib_rdma_mr_pool_wait 0
> tcp_data_ready_calls 0
> tcp_write_space_calls 0
> tcp_sndbuf_full 0
> tcp_connect_raced 0
> tcp_listen_closed_stale 0
>
> RDS Sockets:
> BoundAddr BPort ConnAddr CPort SndBuf RcvBuf
> 0.0.0.0 0 0.0.0.0 0 55296 55296
>
> RDS Connections:
> LocalAddr RemoteAddr NextTX NextRX Flg
> 10.0.0.54 10.0.0.53 38488 38484 --C
>
> Receive Message Queue:
> LocalAddr LPort RemoteAddr RPort Seq Bytes
>
> Send Message Queue:
> LocalAddr LPort RemoteAddr RPort Seq Bytes
>
> Retransmit Message Queue:
> LocalAddr LPort RemoteAddr RPort Seq Bytes
>
>
> Thanks,
> Pradeep
>
--
Olaf Kirch | --- o --- Nous sommes du soleil we love when we play
okir at lst.de | / | \ sol.dhoop.naytheet.ah kin.ir.samse.qurax