[rds-devel] Re: Looking at a proposal from folks to have use_once
memory keys.
Richard Frank
richard.frank at oracle.com
Fri Jan 4 06:11:58 PST 2008
>>
So, one could possibly make a case for ignoring the RDMA op
on a retransmit.
<<
Yes - by design we chose to drop RDMA ops - they are not reliable - and
both the RDMA op and the immediate data rm must be dropped together -
they are an atomic unit.
We did this specifically to simplify dealing with HCA failover - which
is a rare event.
If the client app sends in a bad key - this may break the connection as
you've outlined - but the bad RDMA op will get dropped and the connection
will reform.
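
As a rough illustration of the drop rule described above, here is a minimal Python sketch. It is not the RDS implementation: the Message class, the rdma_key field and split_for_retransmit() are hypothetical names made up for this example. It only shows that a message carrying an RDMA op is dropped as a whole when the connection reforms, while plain sends stay eligible for retransmit.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Message:
    """Hypothetical stand-in for an RDS message (not the real struct)."""
    seq: int
    payload: bytes                   # the immediate data
    rdma_key: Optional[int] = None   # set when an RDMA op rides along


def split_for_retransmit(unacked: List[Message]) -> Tuple[List[Message], List[Message]]:
    """Partition unacknowledged messages after the connection reforms.

    A message that carries an RDMA op is dropped as one unit: the RDMA op
    and its immediate data are never separated. Plain messages are kept
    for retransmission.
    """
    resend: List[Message] = []
    dropped: List[Message] = []
    for msg in unacked:
        if msg.rdma_key is not None:
            dropped.append(msg)   # RDMA op + immediate data go together
        else:
            resend.append(msg)
    return resend, dropped


unacked = [
    Message(1, b"a"),
    Message(2, b"b", rdma_key=0x1234),  # assumed example key value
    Message(3, b"c"),
]
resend, dropped = split_for_retransmit(unacked)
print("resend:", [m.seq for m in resend])
print("dropped:", [m.seq for m in dropped])
```

Treating the pair as one record is what keeps the "atomic unit" property simple here: there is no code path that can resend the immediate data without its RDMA op.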
>>
BTW: Can you explain what *exactly* goes wrong with HCA failover,
that makes us do all this ACK/retransmit business? Up to which point
can a WC get lost on the receiving host? And how does that relate
to RDMA ops?
<<
Here's my limited understanding...
The HCA issues the hardware ack (put on the wire) for a send it received
before 1) the send data (could be RDMA) is pushed to host memory and 2)
the WC is queued. So it is possible for the sending side HCA to get back
an IB ack - and queue a local completion for the send (or RDMA). At this
point the remote HCA fails and neither the data nor the WC makes it to
remote host memory. However, the local host thinks that it did, due to
the ACK and completion generated - so we lose the data.
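
The sequence above can be walked through with a toy Python model. This is only a sketch under stated assumptions: ReceiverHCA, Sender and host_ack() are hypothetical names, and the host-level ack is my reading of the "ACK/retransmit business" mentioned in the question, not a description of the actual RDS code. The point it shows is that a local completion on the sender says nothing about whether the data reached remote host memory.

```python
class ReceiverHCA:
    """Toy model of the receiving HCA (hypothetical, for illustration)."""

    def __init__(self, fail_before_delivery=False):
        self.fail_before_delivery = fail_before_delivery
        self.host_memory = []   # payloads that actually reached host memory
        self.wc_queue = []      # work completions visible to the receiving host

    def receive(self, seq, payload):
        hw_ack = True           # the hardware ack goes on the wire first
        if not self.fail_before_delivery:
            self.host_memory.append(payload)   # 1) data pushed to host memory
            self.wc_queue.append(seq)          # 2) WC queued
        return hw_ack


class Sender:
    """Toy sender that keeps messages until the remote *host* acks them."""

    def __init__(self):
        self.completed = []     # sends with a local completion queued
        self.unacked = {}       # seq -> payload, held for possible retransmit

    def send(self, hca, seq, payload):
        self.unacked[seq] = payload
        if hca.receive(seq, payload):
            self.completed.append(seq)   # IB ack seen, completion generated

    def host_ack(self, seq):
        self.unacked.pop(seq, None)      # only now is it safe to forget


# Failing HCA: the sender sees a completion, but nothing arrived.
sender = Sender()
failed = ReceiverHCA(fail_before_delivery=True)
sender.send(failed, 1, b"data")
print(sender.completed, failed.host_memory, sender.unacked)

# After failover: retransmit from the unacked list to a healthy HCA.
healthy = ReceiverHCA()
for seq, payload in list(sender.unacked.items()):
    sender.send(healthy, seq, payload)
for seq in healthy.wc_queue:
    sender.host_ack(seq)                 # receiving host processed the WC
print(healthy.host_memory, sender.unacked)
```

In this model the data survives only because the sender ignores its own completion as proof of delivery and waits for an ack generated after the receiving host has seen the WC.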