[rds-devel] IB/iWARP code separation

Steve Wise swise at opengridcomputing.com
Thu Nov 6 11:01:49 PST 2008


Or Gerlitz wrote:
> Richard Frank <richard.frank at oracle.com> wrote:
>
>   
>> An RDS client can depend on:
>> 1) rdma completion indicates buffer ownership is returned to client.
>> 2) rdma has been processed by remote HCA - and is pending delivery to remote host
>> memory.
>> 3) rdma completion indicates status of the rdma operation itself - not just that it was
>> submitted.
>>     
>
> OK, makes sense.
>
>   
>> With IB, the send completion is used to determine the success of the rdma operation.
>> With iWARP, the send completion does not provide the same semantics. The completion
>> only indicates that the operation has been accepted by the local driver / NIC. iWARP
>> errors are delivered asynchronously - out of band - raising at least two issues:
>>     
>
> Let me see if I am with you: if the completion only indicates that
> the operation has been accepted for processing, then what can serve
> the rdma initiator as an indication that they can reclaim the buffer?
>   

Completion of any WR means the associated buffers can be reclaimed.
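
A minimal sketch of that ownership rule, written as a toy Python model
rather than real verbs code. The names `ToySendQueue`, `post`, and
`on_completion` are hypothetical, and the status string stands in for a
real completion code. The only point it illustrates is that any
completion, successful or not, hands the buffer back to the caller.

```python
class ToySendQueue:
    """Toy model: the queue owns a buffer from post() until its completion."""

    def __init__(self):
        self._next_wr_id = 0
        self.in_flight = {}  # wr_id -> buffer still owned by the transport

    def post(self, buf):
        # The caller must not touch `buf` again until on_completion() returns it.
        wr_id = self._next_wr_id
        self._next_wr_id += 1
        self.in_flight[wr_id] = buf
        return wr_id

    def on_completion(self, wr_id, status):
        # Any completion for this WR, success or error, releases the buffer.
        buf = self.in_flight.pop(wr_id)
        return buf, status


queue = ToySendQueue()
payload = bytearray(b"rdma payload")
wr_id = queue.post(payload)
reclaimed, status = queue.on_completion(wr_id, "flushed-in-error")
```

The model says nothing about whether the data reached the remote side;
it only tracks who owns the buffer.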

> for example, for sure the iWARP designers wanted it to support
> transactional protocols which go like
>
> A  ---- request   ---> B
> A <--- rdma      ----  B
> A <--- response ---- B
>
> so in that case you claim that B can't send the response unless it's
> being notified in some magic way that the remote HCA is done
> processing the rdma?! also, if there's an error processing the rdma at
> the remote side, the error is delivered not via the completion status
> of this rdma, but rather how? by async event on the qp? But these
> events are not affiliated with any specific transaction, so how can one
> build a correlation scheme?
>
> I'd like to think this is not the case, Steve, Jon can you comment
> here? in an earlier thread
> http://lists.openfabrics.org/pipermail/general/2008-April/049015.html
> on RDS iWARP porting I see a comment by Steve saying "write and send
> completions for  iWARP only indicate the buffer for the IO operation
> can be reused". Further, on that thread I asked if any dramatic
> changes were needed to port the open-mpi code that uses verbs/rdma-cm
> over IB to work also over iWARP
> http://lists.openfabrics.org/pipermail/general/2008-April/048897.html
> and the answer wasn't a definite "more or less the same code does
> the job". I also suggested having a short paper in place based on
> the experience gained through the open-mpi porting process; I don't
> remember such a paper ever being distributed...
>
>   
>> 1) The RDS driver must compensate for these differences by holding off returning a
>> completion until the rdma operation is actually processed!
>> 2) To force processing of the submitted rdma operation requires submitting a subsequent
>> rdma op - perhaps a 0bread (0 byte read) - to flush out the prior rdma - thereby triggering
>> potential async error notifications, etc.
>> 3) We need to correlate an iWARP async error with the operation that triggered the error -
>> which has already recv'd a transport ack.
>>     
>
>
> I am quite confident that completion of a 0bread submitted
> following an rdma-write can be used to make sure that the write data has
> been delivered to the remote host memory. This applies also to IB, so
> you may apply it as the usual habit, not specially for iWARP (BTW
> 0bwrite can be used for a simple and robust keep-alive with IB). So the
> two open issues we have here are what can be used to have the local side
> reclaim its buffer under rdma write, and how "remote rdma errors" are
> delivered. I hope that Steve/Jon will elaborate on that.
>
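
The ordering argument behind the 0bread suggestion can be sketched with
a toy model. This is not verbs code and has not been checked against
hardware. `ToyResponder` and `flush_with_zero_byte_read` are
hypothetical names, and the model simply assumes the remote side handles
requests strictly in arrival order, which is the property the suggestion
relies on.

```python
from collections import deque


class ToyResponder:
    """Hypothetical remote side that handles requests strictly in arrival order."""

    def __init__(self):
        self.memory = {}  # addr -> data that has been placed at the remote host

    def handle(self, request):
        kind = request[0]
        if kind == "write":
            _, addr, data = request
            self.memory[addr] = data
            return None  # in this model a write returns nothing to wait on
        if kind == "read0":
            return "read0-done"  # the 0-byte read always produces a response
        raise ValueError(f"unknown request kind: {kind}")


def flush_with_zero_byte_read(pending, responder):
    """Queue a 0-byte read behind `pending` and process until its response returns."""
    pending.append(("read0",))
    while pending:
        response = responder.handle(pending.popleft())
        if response == "read0-done":
            return True  # everything queued ahead of the read was handled first
    return False


pending = deque([("write", 0x10, b"abc"), ("write", 0x20, b"def")])
responder = ToyResponder()
flushed = flush_with_zero_byte_read(pending, responder)
```

Because the read sits behind both writes in the queue, its response in
this model cannot come back before the writes have been placed.
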
>   
>> The issues with fastreg are:
>>     
>
> let's take that up once the other issues are better understood.
>
>   
>> Another related issue - is that we are planning to move to SWI (send with invalidate)
>> operations - which I believe are not supported by iWARP NICs... this will be another piece
>> of transport independent support.
>>     
>
> The converse holds: under iWARP, supporting send-with-invalidate is
> mandatory, while under IB it is optional and defined under the "base
> memory management extensions" (BMME) section of the spec.
>
>   
>> Adding up all of the above - plus a) no native loop back support for bcopy or rdma with at
>> least the current set of iWARP NICs, b) the fact that we are shipping product with RDS on
>> IB today, c) RDS is designed to support modular transports (ok still a work in progress) - it
>> is prudent to handle the iWARP specific support in an iWARP specific module.
>>     
>
> let me see if I understand: the plan is to maintain the IB transport as
> production ready and in parallel develop the iWARP transport as
> experimental. I think that in a few places you have mentioned that some
> API changes would be needed to the RDS consumer app, so not only would
> RDS be very much aware of which rdma transport is used, but the app
> would be too? can't this be avoided?
>
> Or.
>   



