[rds-devel] IB/iWARP code separation

Or Gerlitz or.gerlitz at gmail.com
Tue Nov 4 14:08:02 PST 2008


Richard Frank <richard.frank at oracle.com> wrote:

> An RDS client can depend on:
> 1) rdma completion indicates buffer ownership is returned to client.
> 2) rdma has been processed by remote HCA - and is pending delivery to remote host
> memory.
> 3) rdma completion indicates status of the rdma operation itself - not just that it was
> submitted.

OK, makes sense.

> With IB, the send completion is used to determine the success of the rdma operation.
> With iWARP, the send completion does not provide the same semantics. The completion
> only indicates that the operation has been accepted by the local driver / NIC. iWARP
> errors are delivered asynchronously - out of band - raising at least two issues:

Let me check that I am with you: if the completion only indicates that
the operation has been accepted for processing, then what can serve
the rdma initiator as an indication that it can reclaim the buffer?
For example, surely the iWARP designers wanted it to support
transactional protocols which go like

A  ---- request   ---> B
A <--- rdma      ----  B
A <--- response ---- B

so in that case you claim that B can't send the response unless it is
notified in some magic way that the remote HCA is done processing
the rdma?! Also, if there's an error processing the rdma at
the remote side, the error is delivered not via the completion status
of this rdma, but rather how? By an async event on the qp? But these
events are not affiliated with any specific transaction, so how can one
build a correlation scheme?

I'd like to think this is not the case. Steve, Jon, can you comment
here? In an earlier thread
http://lists.openfabrics.org/pipermail/general/2008-April/049015.html
on RDS iWARP porting I see a comment by Steve saying "write and send
completions for iWARP only indicate the buffer for the IO operation
can be reused". Further, on that thread I asked if any dramatic
changes were needed to port the open-mpi code that uses verbs/rdma-cm
over IB to work also over iWARP
http://lists.openfabrics.org/pipermail/general/2008-April/048897.html
and the answer wasn't that Definitely more or less the same code does
the job. I also suggested putting together a short paper based on
the experience gained through the open-mpi porting process; I don't
remember such a paper ever being distributed...

> 1) The RDS driver must compensate for these differences by holding off returning a
> completion until the rdma operation is actually processed !
> 2) To force processing of the submitted rdma operation requires submitting a subsequent
> rdma op - perhaps a 0bread (0 byte read) - to flush out the prior rdma - thereby triggering
> potential async error notifications, etc.
> 3) We need to correlate an iWARP async error with the operation that triggered the error -
> which has already recv'd a transport ack.


I am quite confident that the completion of a 0bread submitted
following an rdma-write can be used to make sure that the write data has
been delivered to the remote host memory, and that this applies also
to IB, so you may apply it as the usual practice, not specially for
iWARP (BTW a 0bwrite can be used for a simple and robust keep-alive
with IB). So the two open issues we have here are what can be used to
have the local side reclaim its buffer under rdma write, and how
"remote rdma errors" are delivered. I hope that Steve/Jon will
elaborate on that.
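
A minimal sketch of the "hold the completion until a trailing 0bread
finishes" idea discussed above, written as a plain C simulation with no
verbs calls. All names here (flush_queue, fq_post, fq_complete) are
hypothetical and are not taken from the RDS code. It assumes that
completions on one send queue arrive in posting order and that, as
suggested above, a completed 0-byte read implies the writes posted before
it have been placed in remote memory. Error delivery is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FQ_MAX_PENDING 16

enum fq_kind { FQ_RDMA_WRITE, FQ_ZERO_BYTE_READ };

struct fq_op {
    uint64_t wr_id;      /* caller-chosen id, echoed back in the completion */
    enum fq_kind kind;
    bool local_done;     /* local completion seen: buffer may be reused */
};

struct flush_queue {
    struct fq_op ops[FQ_MAX_PENDING];  /* kept in posting order */
    size_t count;
};

/* Record a newly posted operation. Returns false if the queue is full. */
static bool fq_post(struct flush_queue *q, uint64_t wr_id, enum fq_kind kind)
{
    assert(q != NULL);
    if (q->count == FQ_MAX_PENDING)
        return false;
    q->ops[q->count++] = (struct fq_op){ wr_id, kind, false };
    return true;
}

/*
 * Handle a successful local completion for wr_id.
 * A write completion only marks the buffer as reusable. A zero-byte read
 * completion releases every write posted before it: their wr_ids are
 * copied to released[] and the number of released writes is returned.
 */
static size_t fq_complete(struct flush_queue *q, uint64_t wr_id,
                          uint64_t released[FQ_MAX_PENDING])
{
    size_t i, n = 0, kept = 0;

    assert(q != NULL);
    for (i = 0; i < q->count; i++)
        if (q->ops[i].wr_id == wr_id)
            break;
    if (i == q->count)
        return 0;                      /* unknown id: ignore */

    q->ops[i].local_done = true;
    if (q->ops[i].kind == FQ_RDMA_WRITE)
        return 0;                      /* hold the client completion */

    /* Zero-byte read done: report earlier writes, keep later entries. */
    for (size_t j = 0; j < q->count; j++) {
        if (j < i && q->ops[j].kind == FQ_RDMA_WRITE)
            released[n++] = q->ops[j].wr_id;
        else if (j > i)
            q->ops[kept++] = q->ops[j];
    }
    q->count = kept;
    return n;
}
```

In this sketch the driver would call fq_post() for each rdma-write and for
the 0bread it posts behind them, and would hand a completion to the RDS
client only for the wr_ids that fq_complete() returns in released[].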

> The issues with fastreg are:

Let's take that up once the other issues are better understood.

> Another related issue - is that we are planning to move to SWI (send with invalidate)
> operations - which I believe are not supported by iWARP NICs... this will be another piece
> of transport independent support.

The converse holds: under iWARP, supporting send-with-invalidate is
mandatory, whereas under IB it is optional and defined under the "base
memory management extensions" (BMME) section of the spec.

> Adding up all of the above - plus a) no native loop back support for bcopy or rdma with at
> least the current set of iWARP NICs, b) the fact that we are shipping product with RDS on
> IB today, c) RDS is designed to support modular transports (ok still a work in progress) - it
> is prudent to handle the iWARP specific support in an iWARP specific module.

Let me see if I understand: the plan is to maintain the IB transport as
production ready and in parallel develop the iWARP transport as
experimental. I think that in a few places you have mentioned that some
API changes would be needed in the RDS consumer app, so not only would
RDS be very much aware of which rdma transport is used, the app would
be too? Can't this be avoided?

Or.


