[rds-devel] Re: Looking at a proposal from folks to have use_once memory keys.

Richard Frank richard.frank at oracle.com
Fri Jan 4 06:11:58 PST 2008


 >>
So, one could possibly make a case for ignoring the RDMA op
on a retransmit.
<<

Yes - by design we chose to drop RDMA ops - they are not reliable - and 
both the RDMA op and the immediate data rm must be dropped together - 
they are an atomic unit.
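
To make the "atomic unit" rule concrete, here is a rough sketch in Python (not the actual RDS code). The names Message, rdma_op and split_for_retransmit are made up for illustration. The only thing it tries to show is that a message carrying an RDMA op is never split on retransmit: either the whole thing is resent, or the RDMA op and its immediate data are dropped together.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Message:
    # Hypothetical stand-in for an RDS message ("rm"); not the kernel struct.
    seq: int
    data: bytes                     # the immediate data part
    rdma_op: Optional[str] = None   # made-up descriptor for an attached RDMA op


def split_for_retransmit(unacked: List[Message]) -> Tuple[List[Message], List[Message]]:
    """Partition unacked messages into (resend, dropped).

    Assumption for this sketch: a message with an RDMA op is dropped as a
    whole, so its immediate data is never resent without the RDMA op.
    """
    resend: List[Message] = []
    dropped: List[Message] = []
    for m in unacked:
        if m.rdma_op is not None:
            dropped.append(m)   # RDMA op and immediate data go together
        else:
            resend.append(m)    # plain sends are retransmitted as usual
    return resend, dropped
```

With three unacked messages where only the second carries an RDMA op, the first and third end up in the resend list and the second is dropped whole.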

We did this specifically to simplify dealing with HCA failover - which 
is a rare event.

If the client app sends in a bad key - this may break the connection as 
you've outlined - but the bad RDMA op will get dropped and the connection 
will be re-established.

 >>
BTW: Can you explain what *exactly* goes wrong with HCA failover,
that makes us do all this ACK/retransmit business? Up to which point
can a WC get lost on the receiving host? And how does that relate
to RDMA ops?
<<

Here's my limited understanding...

The HCA issues the hardware ACK (puts it on the wire) for a send it received 
before 1) the send data (could be RDMA) is pushed to host memory and 2) 
the WC is queued. So it is possible for the sending-side HCA to get back 
an IB ACK and queue a local completion for the send (or RDMA). If the 
remote HCA fails at this point, neither the data nor the WC makes it to 
remote host memory. However, the local host thinks that they did, because 
of the ACK and the completion generated - so we lose the data.
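
The ordering above can be shown with a toy model in Python. It is a sketch of my understanding, not of real HCA behavior, and every name in it (ReceivingHca, fail_after_ack, send) is invented for the example. The receiving side puts the ACK on the wire first, and only afterwards writes the data to host memory and queues the WC; a failure between those steps leaves the sender with a completion for data the receiver never got.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReceivingHca:
    # Toy model of the remote HCA; all fields are illustrative assumptions.
    fail_after_ack: bool = False
    host_memory: List[bytes] = field(default_factory=list)
    completion_queue: List[int] = field(default_factory=list)

    def receive(self, seq: int, payload: bytes) -> bool:
        """Return True once the hardware ACK has gone out on the wire."""
        acked = True                       # the ACK is issued first
        if self.fail_after_ack:
            return acked                   # HCA dies here: no data, no WC
        self.host_memory.append(payload)   # 1) data pushed to host memory
        self.completion_queue.append(seq)  # 2) WC queued
        return acked


def send(hca: ReceivingHca, seq: int, payload: bytes,
         sender_completions: List[int]) -> None:
    """Sender queues a local completion as soon as the ACK comes back."""
    if hca.receive(seq, payload):
        sender_completions.append(seq)
```

Running send() against ReceivingHca(fail_after_ack=True) leaves the sender with a completion for the message while host_memory and completion_queue on the receiving side stay empty, which is the lost-data case.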



