[rds-devel] Re: RDS IB transport software flow control?
Or Gerlitz
ogerlitz at voltaire.com
Wed Nov 7 06:33:11 PST 2007
Richard Frank wrote:
> Just to be clear - RDS does not explicitly ack individual socket sends -
> it periodically (when requested) sends back an RC level high water mark
> ack indicating all sends that have arrived over an RC. The reason for
> the ack is to enable replay of sends when paths fail. In practice we
> issue the ack to 1) free send side resources 2) reduce replay in light
> of a path failure. The ack is requested by the send side when we cross
> a threshold of send side resource consumption.
Looking at the code (v3, OFED 1.3), I see the following comment:
> * When the remote host receives our ack they'll free the sent message from
> * their send queue. To decrease the latency of this we always send an ack
> * immediately after we've received messages.
> *
> * For simplicity, we only have one ack in flight at a time. This puts
> * pressure on senders to have deep enough send queues to absorb the latency of
> * a single ack frame being in flight. This might not be good enough.
which says that an "ack is sent immediately after receiving messages"?
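
For reference, here is how I read the high water mark scheme, as a rough
sketch only. This is not the RDS code: the names (send_queue, sendq_ack,
SENDQ_DEPTH) and the half-queue threshold are made up for illustration.
A single cumulative ack frees every send up to the acked sequence number,
and the sender asks for an ack once its unacked entries cross a threshold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and sizes; not taken from the RDS sources. */
#define SENDQ_DEPTH 8

struct send_queue {
    uint64_t next_seq;   /* sequence number of the next send (starts at 1) */
    uint64_t acked_seq;  /* high water mark: highest sequence acked so far */
};

static void sendq_init(struct send_queue *q)
{
    assert(q != NULL);
    q->next_seq = 1;
    q->acked_seq = 0;
}

/* Sent messages still held on the send side for a possible replay. */
static size_t sendq_unacked(const struct send_queue *q)
{
    return (size_t)(q->next_seq - 1 - q->acked_seq);
}

/* Queue one message; returns its sequence number, or 0 if the queue is full. */
static uint64_t sendq_send(struct send_queue *q)
{
    if (sendq_unacked(q) >= SENDQ_DEPTH)
        return 0;
    return q->next_seq++;
}

/* The sender requests an ack once half of the queue is unacked (assumed threshold). */
static int sendq_wants_ack(const struct send_queue *q)
{
    return sendq_unacked(q) >= SENDQ_DEPTH / 2;
}

/*
 * Apply a cumulative ack: everything up to and including ack_seq is freed.
 * Returns the number of entries freed; stale or out-of-range acks free nothing.
 */
static size_t sendq_ack(struct send_queue *q, uint64_t ack_seq)
{
    size_t freed;

    if (ack_seq <= q->acked_seq || ack_seq >= q->next_seq)
        return 0;
    freed = (size_t)(ack_seq - q->acked_seq);
    q->acked_seq = ack_seq;
    return freed;
}
```

With this model, four sends followed by sendq_ack(&q, 3) free three
entries at once and leave one message to replay if the path fails.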
> Our design principle for RDS has been to keep it simple - a minimalist
> approach - the idea being - that less code is good from a
> maintainability perspective.
Still, I am not convinced that a credit management protocol, from which
you can deduce which messages are acked (thanks to the in-order property
of IB RC), would be any more complex than this acking, say LOC-wise.
> We do leverage IB hardware flow control for RDS - which seems to work
> well 1) in practice (real application load) we do not see RNRs - and
> therefore RNRs are not swamping the network. 2) When RNRs do occur (test
> driven loads) the hardware flow control coupled with the driver
> reposting of recv buffers is very efficient .
AFAIK, iWARP does not have HW flow control, so relying on RNR NAKs
narrows the scope of RDS to IB only.
> A couple of additional optimizations for our existing flow control would
> be to 1) add srq support - this will reduce the possibility of RNRs 2)
> use an rdma write - vs - message for the ack to remove the requirement
> of having a recv buffer posted to handle the ack.
Can you elaborate a little on RDMA ACKs vs SEND ACKs? I guess you don't
mean rdma-write-with-immediate, since this also consumes a WR at the
receiver side. If it is just rdma-write, how would the sender be notified
of the ACK reception? Would it poll on memory?
> Perhaps we will need recv side flow control - if/when we find that the
> IB hardware flow control becomes an issue - maybe that's just around the
> corner. We'd like to see some data showing this is a real problem.
My feeling is that this scenario is waiting for you around the corner, as
you say, but I can't prove it at this point given my limited knowledge of
and hands-on experience with RDS. We will see.
Or.